
Semi-supervised Domain Adaptive Retrieval via

Discriminative Hashing Learning

Haifeng Xia† , Taotao Jing† , Chen Chen♯ , Zhengming Ding†


† Department of Computer Science, Tulane University, USA
♯ Center for Research in Computer Vision, University of Central Florida, USA
{hxia,tjing,zding1}@tulane.edu
chen.chen@crcv.ucf.edu
ABSTRACT
Domain adaptive image retrieval (DAR) aims to train a model with a well-labeled source domain and unlabeled target images in order to retrieve source instances given query target samples from the identical category space. However, in practical scenarios the huge labeling cost makes it infeasible to manually annotate all retrieved images. Motivated by this realistic demand, we first define the semi-supervised domain adaptive retrieval (SDAR) problem, which assumes the database includes a small proportion of annotated source images and abundant unlabeled ones. To overcome the challenging SDAR problem, this paper proposes a novel method named Discriminative Hashing learning (DHLing), which mainly includes two modules, i.e., domain-specific optimization and domain-invariant memory bank. Specifically, the first component explores the structural knowledge of samples to predict pseudo labels for the unlabeled images and achieve hash coding consistency, while the second one constructs a domain-invariant memory bank to guide feature generation and achieve cross-domain alignment. Experimental results on several popular cross-domain retrieval benchmarks illustrate the effectiveness of our proposed DHLing on both the conventional DAR and the new SDAR scenarios by comparing with state-of-the-art retrieval methods.

CCS CONCEPTS
• Information systems → Image search; • Theory of computation → Semi-supervised learning.

KEYWORDS
Semi-supervised Learning, Cross-domain Retrieval

ACM Reference Format:
Haifeng Xia, Taotao Jing, Chen Chen, Zhengming Ding. 2021. Semi-supervised Domain Adaptive Retrieval via Discriminative Hashing Learning. In Proceedings of the 29th ACM International Conference on Multimedia (MM '21), October 20–24, 2021, Virtual Event, China. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3474085.3475526

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MM '21, October 20–24, 2021, Virtual Event, China
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8651-7/21/10...$15.00
https://doi.org/10.1145/3474085.3475526

Figure 1: Problem definition of (a) DAR and (b) SDAR. DAR assumes access to a well-labeled retrieval source domain and unlabeled target images sampled from the same distribution as the query samples during the training stage, while SDAR supposes the source domain includes a small proportion of labeled source instances and a great number of unlabeled ones.

1 INTRODUCTION
Image retrieval has recently attracted increasing attention in the multimedia community [11, 47]. In the online shopping scenario, customers upload images of interest and query the corresponding product from the platform by matching them with the images in the retrieval database. Existing researches [7, 52] typically assume the query images from the customer follow the identical distribution with the retrieved images. However, environmental factors such as illumination and background easily result in significant differences (domain shift) among images with the same semantic information [19, 44]. Under such a practical application, the image retrieval problem becomes challenging for strategies [10, 36] that rely on the identical or similar distribution assumption to accurately provide clients the product image of interest. To overcome this challenge, [13] attempts to eliminate the domain discrepancy by combining attribute information with visual semantic similarity. Although this scheme effectively improves the retrieval precision, the adoption of attributes introduces additional computational cost.

To overcome the bottleneck, [12] introduces domain adaptive retrieval (DAR) by considering the well-labeled retrieval database as the source domain. In addition, DAR has access to additional unlabeled images of the target domain in the training stage, as Figure 1 (a) shows. It is worth noting that the target images and the unseen queries are sampled from the same distribution. Under this condition, DAR exploits the supervision of source annotations to train a model with more discriminative ability and then adapts the model to the target domain by mitigating the cross-domain discrepancy.
Finally, DAR matches the given queries with the available source images, resulting in state-of-the-art performance in cross-domain image retrieval. However, annotating the whole retrieval database becomes time-consuming and expensive for realistic retrieval applications. Moreover, to accelerate the retrieval speed and reduce the storage space, many methods [8, 23, 26, 31] explore hash coding to transform the continuous real-valued feature into a binary code and calculate the Hamming distance between arbitrary two instances as their similarity [5, 24, 33, 40, 49]. Since the hash operation removes abundant semantic information from the original representation, the binary code mechanism only slightly reduces the negative influence of domain shift and promotes the retrieval ability of the model.

In this paper, we consider a novel application scenario named Semi-supervised Domain Adaptive Retrieval (SDAR), illustrated in Figure 1 (b), where insufficient source annotation heavily increases the difficulty of learning discriminative features and achieving domain adaptation. Consequently, to address the SDAR challenge, we develop discriminative hashing learning (DHLing) consisting of two important modules, i.e., domain-specific optimization and domain-invariant memory bank. The first component of DHLing mainly remedies the effect of absent annotations and facilitates the model to learn discriminative representations. Concretely, we employ the labeled source images to learn the initial class centers and adaptively update them to form domain-specific centers for unlabeled source and target instances. According to spherical k-means [2], the unlabeled training samples are automatically assigned their labels by measuring the distance to the class centers. Benefiting from the pseudo labels, DHLing constructs the semantic similarity and learns consistent binary codes for semantically similar samples by encouraging agreement between the semantic similarity and the inner product of hash codes. However, the hash code easily neglects meaningful semantics, so it becomes sensitive to distortion of the original hidden feature. Thus, the second module aims to achieve cross-domain feature alignment early, before the hashing operation, to improve the robustness of the model. With the latent features, we first adopt dictionary learning [34, 50] to construct the domain-specific memory banks and then fuse them to seek a domain-invariant memory bank, which guides the next feature generation to eliminate the cross-domain discrepancy. The main contributions of our work are summarized as three folds:

• From the practical application demand, we introduce a novel image retrieval problem named Semi-supervised Domain Adaptive Retrieval (SDAR), and propose a novel discriminative hashing learning (DHLing) method with two modules for effective cross-domain image retrieval.
• To mitigate the label insufficiency, the first domain-specific optimization module explores the structural knowledge of samples to automatically annotate the unlabeled instances and further achieve coding consistency. To reduce the domain shift, the second module, based on dictionary learning, seeks the domain-invariant memory bank to align cross-domain feature distributions.
• Extensive experimental results on several cross-domain benchmarks fully verify the effectiveness of our DHLing on solving the DAR and SDAR problems. The quantitative and qualitative analysis explicitly illustrates the contribution of each module to improving the retrieval ability of our DHLing model.

2 RELATED WORK
2.1 Single-domain Retrieval
Single-domain image retrieval refers to content-based visual information matching and assumes all images belong to the identical distribution [1, 43, 47]. Given the query images, the system searches the semantically similar ones from the database by calculating the similarity between the query and each retrieved image [25, 32, 35, 39]. To improve the retrieval efficiency, methods [15, 31, 46] typically adopt a hash algorithm to obtain the binary code from the continuous real-valued feature. Although they have achieved comparable performance on standard benchmarks, their application easily suffers from retrieval precision degradation in the practical scenario where there exists a significant difference between the query images and the retrieved ones due to various capturing conditions (illumination, background). This realistic demand naturally stimulates the exploration of domain adaptive retrieval.

2.2 Domain Adaptive Retrieval
The earlier explorations on cross-domain retrieval [13, 15, 17, 22, 51] constrain the application scenario to online shopping and organize the training set with paired user-product images. In addition, they also need additional attribute information of instances to promote the retrieval precision. However, it is expensive to collect cross-domain paired images in reality, so such an assumption heavily impedes the application of these methods. Thus, [12] further modifies cross-domain retrieval into domain adaptive retrieval (DAR), where one can train the model with a well-labeled source domain and unlabeled target images in order to retrieve source instances given the query target samples. These two domains share the identical category space but belong to different distributions [45, 48]. Although [12] has achieved state-of-the-art performance, it manually annotates abundant source images to train the model, which becomes time-consuming and not affordable.

Furthermore, we are the first to consider semi-supervised domain adaptive retrieval (SDAR), where the source domain includes a small proportion of annotated images and other unlabeled ones. Due to the absence of many source annotations, SDAR significantly increases the difficulty of learning discriminative features and eliminating domain shift.

3 THE PROPOSED METHOD
3.1 Motivation and Problem Definition
The typical domain adaptive retrieval (DAR) [12] aims to train the system with a well-labeled retrieval database (source domain) and an unlabeled target domain. Due to the domain shift caused by varying light conditions, occlusions, and devices, these two domains belong to different distributions. For the application, the trained model selects instances from the source database according to the unseen queries following the same distribution as the target domain. However, the collection of annotated source samples generally becomes time-consuming and expensive.
Figure 2: Overview of the proposed method (DHLing), including two modules: 1) domain-specific optimization, which adopts spherical k-means clustering to annotate the unlabeled training samples and further build the consistent coding constraint; and 2) the domain-invariant memory bank, which first learns the two domain-specific dictionaries D_{s/t} from f_i and transmits the knowledge into the domain-invariant memory bank W_g.
To meet the more realistic demand, we propose a novel problem named Semi-supervised Domain Adaptive Retrieval (SDAR), where only a small portion of source samples are provided with the corresponding category information. The main challenges of SDAR are two-fold. Parallel to conventional DAR, the first important challenge of SDAR is how to mitigate the negative influence of the cross-domain discrepancy on learning semantic-wise representations. On the other hand, as the comparison between DAR and SDAR shows, how to learn discriminative features with insufficient source label knowledge is another difficulty of the new problem setting.

Formally, SDAR assumes that the retrieved database consists of two parts: a small portion of well-labeled source instances D_s^l = \{(x_{s_i}^l, y_{s_i}) \mid x_{s_i}^l \in \mathbb{R}^d, y_{s_i} \in \mathbb{R}^c\}_{i=1}^{n_s^l} and a large amount of unlabeled ones D_s^u = \{x_{s_i}^u \mid x_{s_i}^u \in \mathbb{R}^d\}_{i=1}^{n_s^u}, where c denotes the number of categories and y_{s_i} is the annotation corresponding to x_{s_i}^l. In addition, the training stage has access to the target data samples D_t = \{x_{t_i} \mid x_{t_i} \in \mathbb{R}^d\}_{i=1}^{n_t}, drawn from the same distribution as the unseen queries. In terms of the final application, we explore the retrieval system to search samples from D_s^l and D_s^u with semantic information similar to the given queries.

3.2 Framework Overview
Considering the retrieval speed requirement, we propose a simple yet effective approach named Discriminative Hashing Learning (DHLing) to overcome the above challenges. As shown in Figure 2, our DHLing includes two modules, i.e., domain-specific optimization and domain-invariant memory bank construction.

First of all, our method follows [12] to extract high-level semantic representations x_{s_i}^{l/u} and x_{t_i} from the raw images via fixed backbones such as VGGNet [38] and AlexNet [20]. Differently, our work develops the domain-specific optimization module to not only explore the intrinsic category knowledge of all unlabeled instances but also achieve coding consistency. Moreover, inspired by the success of memory banks, DHLing adopts a dictionary structure to construct the memory bank, which facilitates domain-invariant feature learning. The specific operations of each component of DHLing are elaborated as follows.

3.3 Domain-Specific Optimization
Before introducing the details of each module, we briefly illustrate the network architecture of DHLing. For simplicity, the domain and class symbols are omitted. Given an instance x_i flowing through the network, it is straightforward to obtain the hidden features of each layer as follows:

    f_i = \sigma(W_f^\top x_i + \bar{b}), \quad g_i = \sigma(W_g^\top f_i), \quad h_i = W_h g_i, \quad \hat{y}_i = W_c^\top h_i,    (1)

where W_f, W_g, W_h, W_c and \bar{b} are the learnable parameters of the network, \sigma(\cdot) denotes the activation function, and \hat{y}_i is the prediction used only for labeled source instances. Moreover, the binary hash representation is directly transformed from h_i, i.e., b_i = \mathrm{sign}(h_i) \in \mathbb{R}^K, where K is the length of the hash code.
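To make Eq. (1) concrete, the following NumPy sketch traces one backbone feature through the four learnable projections and the sign binarization. The layer widths, the tanh activation, and the random initialization are illustrative assumptions; the paper only fixes the overall structure f → g → h → (ŷ, b).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: backbone feature d, widths of f and g, hash length K, number of classes c.
d, d_f, d_g, K, c = 4096, 1024, 512, 64, 65

# Learnable parameters theta = {W_f, W_g, W_h, W_c, b_bar} of Eq. (1).
W_f = rng.normal(scale=0.01, size=(d, d_f)); b_bar = np.zeros(d_f)
W_g = rng.normal(scale=0.01, size=(d_f, d_g))     # later also read as the memory bank
W_h = rng.normal(scale=0.01, size=(K, d_g))
W_c = rng.normal(scale=0.01, size=(K, c))

def forward(x, sigma=np.tanh):
    """Hidden features, prediction, and hash code for one sample, following Eq. (1)."""
    f = sigma(W_f.T @ x + b_bar)        # first fully-connected layer
    g = sigma(W_g.T @ f)                # projection onto the memory-bank atoms (no bias)
    h = W_h @ g                         # continuous pre-hash representation
    y_hat = W_c.T @ h                   # prediction used only for labeled source samples
    b = np.where(h >= 0, 1.0, -1.0)     # b_i = sign(h_i), entries in {-1, +1}
    return f, g, h, y_hat, b

x = rng.normal(size=d)                  # a 4096-d VGG/AlexNet feature stands in here
f, g, h, y_hat, b = forward(x)
```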
Consistent Coding Optimization. The primary goal of this module is to achieve coding consistency for semantically similar samples via the exploration of sample-wise associations. To explicitly describe the relationship between samples, we first define the semantic similarity s_{ij} between x_i and x_j, where s_{ij} = 1 means the two samples share similar semantic knowledge and s_{ij} = 0 otherwise. Intuitively, two instances being annotated with the same label is equivalent to s_{ij} = 1.

From another perspective, SDAR mainly attempts to learn unified binary codes for source and target samples. For hash coding, the Hamming distance, defined as d(x_i, x_j) = \frac{1}{2}(K - \langle b_i, b_j \rangle), is usually used to measure the distance between two instances, where \langle b_i, b_j \rangle is the inner product. Based on the above definition, the similarity of two samples can also be evaluated by the inner product of their hash codes, i.e., \langle b_i, b_j \rangle. Therefore, it is clear that a larger inner product of two binary codes indicates that they carry similar semantic information with a higher probability, formulated as:

    p(s_{ij} \mid b_i, b_j) = \begin{cases} \delta(\phi_{ij}), & s_{ij} = 1 \\ 1 - \delta(\phi_{ij}), & s_{ij} = 0 \end{cases}    (2)

where \phi_{ij} = \frac{1}{2}\langle b_i, b_j \rangle and \delta(\phi_{ij}) = \frac{1}{1 + \exp(-\phi_{ij})}. With the supervision of the semantic similarity s_{ij}, we control the consistency of the binary codes by adjusting their inner product. Thus, the loss function of coding consistency is written as:

    \min \mathcal{L}_{cc} = -s_{ij} \log(\delta(\phi_{ij})) - (1 - s_{ij}) \log(1 - \delta(\phi_{ij})).    (3)

Along with the optimization of Eq. (3), the system gradually promotes the consistency between \delta(\phi_{ij}) and s_{ij}. However, the insufficient labeled source instances in SDAR heavily limit the employment of the coding consistency loss. Thereby, our model further develops domain-specific clustering to mitigate this contradiction.
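As a reference point, a minimal sketch of the Hamming-distance identity and the pairwise loss of Eqs. (2)-(3) is given below. Whether the input matrix holds the discrete codes or a continuous surrogate such as the pre-hash features h is left to the caller; that choice is not prescribed by the paper.

```python
import numpy as np

def hamming_distance(b_i, b_j):
    """d(x_i, x_j) = (K - <b_i, b_j>) / 2 for codes with entries in {-1, +1}."""
    K = b_i.shape[0]
    return 0.5 * (K - b_i @ b_j)

def coding_consistency_loss(B, S):
    """Eq. (3): -s_ij log(sigma(phi_ij)) - (1 - s_ij) log(1 - sigma(phi_ij)),
    where phi_ij = 0.5 * <b_i, b_j> and sigma is the logistic function of Eq. (2).
    B: (n, K) codes or their continuous surrogates; S: (n, n) binary similarity matrix."""
    phi = 0.5 * (B @ B.T)                          # inner products phi_ij
    log_sig = -np.logaddexp(0.0, -phi)             # log sigma(phi), numerically stable
    log_one_minus_sig = -np.logaddexp(0.0, phi)    # log (1 - sigma(phi))
    loss = -(S * log_sig + (1.0 - S) * log_one_minus_sig)
    off_diag = ~np.eye(B.shape[0], dtype=bool)     # ignore the trivial i == j pairs
    return loss[off_diag].mean()
```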
Domain-Specific Clustering. An intuitive strategy is to generate pseudo labels from the hidden features of the unlabeled source/target samples to compensate for the absent annotations. However, the annotations derived from the latent layer are in turn employed to optimize the feature learning. Considering this coupled relationship, we conduct these two operations in an iterative fashion. For each epoch, the fixed network takes all available samples as input to synthesize the hidden features h_i, which are used to infer labels with spherical k-means [2]. Then the instances with ground-truth or pseudo labels take part in the optimization of the model.

In detail, the generation of annotations consists of three steps. First, the features h_{s_i}^l of the labeled source instances are used to compute each class center:

    O_c^{sl} = \frac{1}{N_c} \sum_{i=1}^{n_s^l} \mathbb{I}(y_{s_i} = c) \frac{h_{s_i}^l}{\|h_{s_i}^l\|},    (4)

where O_c^{sl} and N_c denote the feature center and the number of samples of the c-th category, respectively, and \mathbb{I}(\cdot) is the indicator function. Second, we assign annotations to the unlabeled source samples:

    \hat{y}_{s_i}^u = \arg\min_{c} \hat{d}(h_{s_i}^u, O_c^{sl}) = \frac{1}{2}\left(1 - \frac{\langle h_{s_i}^u, O_c^{sl} \rangle}{\|h_{s_i}^u\| \, \|O_c^{sl}\|}\right).    (5)

To improve the annotation accuracy, we update the initial class centers via the following formulation and reassign the labels of x_{s_i}^u:

    O_c^{sl} = \frac{1}{N_c} \sum_{i=1}^{n_s^l} \mathbb{I}(y_{s_i} = c) \frac{h_{s_i}^l}{\|h_{s_i}^l\|} + \frac{1}{N_c^u} \sum_{i=1}^{n_s^u} \mathbb{I}(\hat{y}_{s_i}^u = c) \frac{h_{s_i}^u}{\|h_{s_i}^u\|}.    (6)

Similarly, in the third step, the target centers are initialized with O_c^{sl}, and we then iteratively allocate labels to target samples and update the class centers with:

    O_c^t = \frac{1}{N_c^t} \sum_{i=1}^{n_t} \mathbb{I}(\hat{y}_{t_i} = c) \frac{h_{t_i}}{\|h_{t_i}\|}.    (7)
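The three-step clustering above can be sketched as the NumPy routine below. The helper names, the number of refinement iterations, and the guards for empty clusters are our own assumptions; the paper only specifies the steps of Eqs. (4)-(7) and assumes every class has at least one labeled source sample.

```python
import numpy as np

def _normalize(H):
    """Project features onto the unit sphere; spherical k-means works on directions."""
    return H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)

def assign_labels(H, centers):
    """Eq. (5): label = argmin_c 0.5 * (1 - cos(h, O_c)), i.e., the most similar center."""
    sim = _normalize(H) @ _normalize(centers).T
    return np.argmax(sim, axis=1)

def pseudo_label(H_l, y_l, H_u, H_t, num_classes, iters=5):
    """Domain-specific clustering producing pseudo labels for unlabeled source (y_u)
    and target (y_t) samples, following Eqs. (4)-(7)."""
    Hn_l, Hn_u, Hn_t = _normalize(H_l), _normalize(H_u), _normalize(H_t)
    # Step 1 (Eq. (4)): class centers from the labeled source features.
    O = np.stack([Hn_l[y_l == c].mean(axis=0) for c in range(num_classes)])
    # Step 2 (Eqs. (5)-(6)): assign unlabeled source samples, then refine the centers.
    for _ in range(iters):
        y_u = assign_labels(H_u, O)
        O = np.stack([
            Hn_l[y_l == c].mean(axis=0)
            + (Hn_u[y_u == c].mean(axis=0) if np.any(y_u == c) else 0.0)
            for c in range(num_classes)])
    # Step 3 (Eq. (7)): initialize target centers from the source ones, then iterate.
    O_t = O.copy()
    for _ in range(iters):
        y_t = assign_labels(H_t, O_t)
        O_t = np.stack([
            Hn_t[y_t == c].mean(axis=0) if np.any(y_t == c) else O_t[c]
            for c in range(num_classes)])
    return y_u, y_t
```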
To this end, we can easily determine the semantic similarity according to the assigned category information. Thanks to the pseudo labels, the cross-domain coding consistency is formulated as:

    \min \mathcal{L}_{cc}^{st} = -\hat{s}_{ij} \log(\delta(\phi_{ij}^{st})) - (1 - \hat{s}_{ij}) \log(1 - \delta(\phi_{ij}^{st})),    (8)

where \hat{s}_{ij} = 1 when x_{s_i}^{l/u} and x_{t_j} belong to the same category, otherwise \hat{s}_{ij} = 0, and \phi_{ij}^{st} = \frac{1}{2}\langle b_{s_i}, b_{t_j} \rangle. On the other hand, Eq. (3) is also applied to the exploration of the intra-domain relationship. Thus, the loss function for the target domain is rewritten as:

    \min \mathcal{L}_{cc}^{t} = -s_{ij}^{t} \log(\delta(\phi_{ij}^{t})) - (1 - s_{ij}^{t}) \log(1 - \delta(\phi_{ij}^{t})).    (9)

This constraint further improves the discriminative ability of the binary codes by learning a more compact category subspace. Thus, we combine these constraints to form the final coding consistency objective function:

    \min \mathcal{L}_{cc} = \mathcal{L}_{cc}^{st} + \mathcal{L}_{cc}^{s} + \mathcal{L}_{cc}^{t}.    (10)

3.4 Domain-Invariant Memory Bank
Considering hash retrieval, the domain-specific optimization module directly accesses the semantic similarity and employs it to learn consistent binary codes. However, the hash code b_i is sensitive to distortion of h_i, i.e., b_{ij} = 0\,(h_{ij} < 0) \rightarrow b_{ij} = 1\,(h_{ij} + \triangle > 0, |\triangle| < \epsilon), where \epsilon is a small positive scalar to avoid a trivial solution.

To improve the robustness of the system, our method DHLing alternatively aims to learn domain-invariant representations over the hidden features before the hash codes. Focusing on the network design, the second fully-connected layer g_i = \sigma(W_g^\top f_i) omits the bias term. When regarding each column vector of W_g as an individual atom, another way of understanding it is that g_i is the projection of f_i onto the basis vectors of the space. Thus, we also define W_g as the memory bank storing the essential attributes of all instances.

However, the cross-domain discrepancy easily results in a significant difference between the source and target feature distributions. Learning the projection with the same memory bank becomes unreasonable and unfair for f_{s_i}^{l/u} and f_{t_i}. To avoid this problem, the source and target domains independently learn their individual memory banks and transmit them into W_g to form the domain-invariant atoms. Concretely, the first important point is to discover the basis vectors from the source and target domains, respectively. Motivated by dictionary learning [34, 50], where each data point can be represented by a sparse combination of atoms, DHLing gradually learns the domain-specific memory bank from the original hidden features f_i. Taking the target subspace as an example, the learning process minimizes the following formulation:

    \mathcal{L}_d^t = \sum_{i=1}^{n_t} \|f_{t_i} - D_t \alpha_i^t\|_2^2 + \|\alpha_i^t\|_1,    (11)

where D_t is set to the same size as W_g and \|\cdot\|_1 represents the L1-norm regularization, which uses as few atoms as possible to reconstruct the original feature representation f_{t_i}. Similarly, the source memory bank is induced by minimizing the following objective with labeled and unlabeled source instances:

    \mathcal{L}_d^s = \sum_{i=1}^{n_s^l} \|f_{s_i}^l - D_s \alpha_i^{sl}\|_2^2 + \|\alpha_i^{sl}\|_1 + \sum_{i=1}^{n_s^u} \|f_{s_i}^u - D_s \alpha_i^{su}\|_2^2 + \|\alpha_i^{su}\|_1.    (12)

With the fixed source and target dictionaries, the domain-invariant memory bank derives from their fusion by minimizing:

    \mathcal{L}_m = \|W_g - D_s\|_F^2 + \|W_g - D_t\|_F^2,    (13)
where \|\cdot\|_F is the Frobenius norm. Since W_g seeks the balance between the source and target memory banks, the domain shift is effectively eliminated via such a transformation from g_i to h_i.
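A compact sparse-coding sketch of Eqs. (11)-(13) follows. The paper relies on online dictionary learning [34]; the ISTA-plus-least-squares alternation below is a simpler stand-in with assumed step counts and regularization weight, and the closed-form fusion is the minimizer of Eq. (13) only because both dictionaries share the shape of W_g.

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of the L1 norm used for the sparse codes alpha."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_codes(F, D, lam=0.1, steps=50):
    """ISTA for the reconstruction term of Eq. (11)/(12); F holds features column-wise."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-8            # Lipschitz constant of the smooth part
    A = np.zeros((D.shape[1], F.shape[1]))
    for _ in range(steps):
        grad = D.T @ (D @ A - F)
        A = soft_threshold(A - grad / L, lam / L)
    return A

def learn_dictionary(F, num_atoms, lam=0.1, outer=10, seed=0):
    """Alternate sparse coding and least-squares atom updates (a stand-in for [34])."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(F.shape[0], num_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(outer):
        A = sparse_codes(F, D, lam)
        D = F @ np.linalg.pinv(A)                   # least-squares update of the atoms
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-8
    return D

def fuse_memory_banks(D_s, D_t):
    """Eq. (13): the W_g minimizing ||W_g - D_s||_F^2 + ||W_g - D_t||_F^2 is their average."""
    return 0.5 * (D_s + D_t)
```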
3.5 Overall Objective Function
Although the retrieval system based on hash coding accelerates the matching speed, it abandons abundant meaningful semantic knowledge compared to the original continuous real-valued hidden feature. To offset this gap, DHLing additionally minimizes the difference between h_i and b_i via the quantization loss:

    \mathcal{L}_q = \sum_{i=1}^{n_s^l} \|b_{s_i}^l - h_{s_i}^l\|^2 + \sum_{i=1}^{n_s^u} \|b_{s_i}^u - h_{s_i}^u\|^2 + \sum_{i=1}^{n_t} \|b_{t_i} - h_{t_i}\|^2.    (14)

Similar to existing supervised retrieval methods [12, 37], we also explore the classification loss over labeled source samples to learn more discriminative features:

    \mathcal{L}_c = -\sum_{i=1}^{n_s^l} y_{s_i} \log(\hat{y}_{s_i}^l).    (15)
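The two remaining terms are straightforward to sketch; the softmax inside the cross-entropy below is an assumption, since Eq. (15) only writes the prediction ŷ.

```python
import numpy as np

def quantization_loss(H):
    """Eq. (14): sum of ||b_i - h_i||^2 with b_i = sign(h_i), pushing h toward binary values."""
    B = np.where(H >= 0, 1.0, -1.0)
    return np.sum((B - H) ** 2)

def classification_loss(Y_hat, Y):
    """Eq. (15): cross-entropy over labeled source samples; Y is one-hot, softmax assumed."""
    Z = Y_hat - Y_hat.max(axis=1, keepdims=True)           # numerically stable softmax
    log_p = Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(Y * log_p, axis=1))
```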
To sum up, the overall objective function for DHLing is formulated as follows:

    \begin{cases} \min_{\theta} \; \mathcal{L}_c + \lambda \mathcal{L}_{cc} + \beta \mathcal{L}_q + \gamma \mathcal{L}_m \\ \min_{D_s, D_t, \alpha^{*}} \; \mathcal{L}_d^s + \mathcal{L}_d^t \end{cases}

where \theta = \{W_f, W_g, W_h, W_c, \bar{b}\}, * \in \{t, sl, su\}, and \lambda, \beta, \gamma are trade-off parameters. We iteratively optimize the two sub-objective functions until the model reaches convergence. First, with the network parameters \theta fixed for each epoch, the hidden features are employed to assign annotations to the unlabeled instances and to learn the domain-specific memory banks via \min_{D_s, D_t, \alpha^{*}} \mathcal{L}_d^s + \mathcal{L}_d^t. Due to the sparse regularization, we adopt an iterative manner to gradually update the dictionary and the corresponding coefficients. Second, within each epoch, we update the whole network by minimizing \mathcal{L}_c + \lambda \mathcal{L}_{cc} + \beta \mathcal{L}_q + \gamma \mathcal{L}_m, with the memory banks (D_s, D_t) and pseudo labels fixed.
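The alternation can be outlined as below. Everything here is hypothetical glue: `network`, `data`, their attributes, and the method names are assumed wrappers around the pieces sketched in the previous snippets, not an API defined by the paper.

```python
def train_dhling(network, data, num_classes, epochs, lam=0.1, beta=0.01, gamma=0.1):
    """Alternating optimization of the two sub-objectives (a schematic sketch)."""
    for epoch in range(epochs):
        # Step 1: freeze theta, read out hidden features, then (a) pseudo-label the
        # unlabeled source/target samples and (b) fit the domain-specific dictionaries.
        feats = network.extract_features(data)        # hypothetical: returns f- and h-level features
        y_u, y_t = pseudo_label(feats.h_source_labeled, data.y_source_labeled,
                                feats.h_source_unlabeled, feats.h_target, num_classes)
        D_s = learn_dictionary(feats.f_source.T, num_atoms=network.W_g.shape[1])
        D_t = learn_dictionary(feats.f_target.T, num_atoms=network.W_g.shape[1])
        bank = fuse_memory_banks(D_s, D_t)            # fusion target for L_m, Eq. (13)

        # Step 2: freeze pseudo labels and dictionaries, update theta on mini-batches of
        # L_c + lam * L_cc + beta * L_q + gamma * L_m.
        for batch in data.minibatches(pseudo_source=y_u, pseudo_target=y_t):
            network.update(batch, bank, lam=lam, beta=beta, gamma=gamma)
```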
4 EXPERIMENTS
4.1 Experimental Setting
Datasets. MNIST [21] and USPS [14] are popular benchmarks for handwritten digit recognition with ten categories from 0 to 9. We follow [30] to resize each original image into 16×16 and separately consider MNIST and USPS as the source and target domains to perform the cross-domain retrieval tasks.

VLCS [41] collects images from four sets: Caltech101, LabelMe, Pascal VOC2007 and SUN09, sharing an identical label space with five classes. Following the protocol of [12], we also adopt the classical network architecture [6] to extract 4,096-dimensional features per raw image. In the experiment, VOC2007 with 3,376 images is regarded as the retrieved database, while Caltech101, which contains 1,415 images, serves as the target domain.

The Cross-dataset TestBed [4] includes three domains: Caltech256 (3,847 images), ImageNet (4,000 images) and SUN (2,626 images), which are captured from different environments; the images of each domain cover 40 categories. Similar to the operation on VLCS, each image is transformed into a 4096-dimensional feature vector by using the CNN-based network [6]. In the retrieval experiments, the Caltech256 dataset is considered as the source domain, while ImageNet is adopted as the target domain.

The Office-Home dataset is a more challenging benchmark for the cross-domain retrieval problem and consists of 15,500 images from four domains: Art (A), Clipart (C), Product (P) and RealWorld (R). There are 65 categories in each domain, increasing the difficulty of image classification. In addition, the considerable domain discrepancy heavily obstructs adaptive retrieval. For example, the artistic paintings of A are significantly different from the real images of R captured by regular cameras. According to [12], we convert each original image into a 4096-dimensional feature by using the VGG-16 network [38].

Baselines. Due to the application of binary codes in our DHLing, we evaluate the performance of DHLing by comparing with the state-of-the-art hashing methods including PWCF [12], KSH [29], SDH [37], ITQ [9], LapITQ+ [53], SGH [16], DSH [18], ITQ+ [53], GTH [28], OCH [27], LSH [3], SH [42], and NoTL [12]. Specifically, for the two supervised methods (i.e., SDH, KSH), only the labeled source instances of D_s^l are used as training samples, while for the remaining unsupervised methods and PWCF, all available source and target instances are used in the training stage.

Implementation details. To verify the effectiveness of DHLing for cross-domain retrieval, this paper designs two experimental manners. First, for the conventional DAR setting, we provide all source samples with their corresponding annotations for model training. Second, for semi-supervised DAR, the instances of the source domain are divided into two subsets: one consists of samples with category information and the other is without any label. In addition, we set three different label ratios (10%, 30% and 60%), which denote the proportion of labeled source samples over the whole retrieved database.

With respect to the organization of training and test sets, we follow the protocol of [12] to randomly select 500 images from the target domain as queries (test set) and consider the remaining ones together with all source instances as the training set. For the quantitative evaluation, we adopt the widely-used mean average precision (MAP) to make fair comparisons. For the qualitative measurement, precision and recall curves and feature visualization are used to illustrate the performance of the approaches. In terms of the trade-off parameters of our method, we set them as \lambda = 0.1, \beta = 0.01, \gamma = 0.1 by default for all experiments. In fact, our model performance is not sensitive to these parameters.
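For reference, MAP over Hamming-ranked retrieval can be computed as below. Single-label ground truth and codes in {-1, +1} are assumed, and ties in distance are broken by the stable sort rather than by any protocol specified in the paper.

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """MAP: rank the database by Hamming distance to each query and average the
    precision at every rank where a relevant (same-label) item occurs."""
    aps = []
    for q, yq in zip(query_codes, query_labels):
        dist = 0.5 * (db_codes.shape[1] - db_codes @ q)   # Hamming distance via inner product
        order = np.argsort(dist, kind="stable")
        relevant = (db_labels[order] == yq).astype(float)
        if relevant.sum() == 0:
            continue
        cum_hits = np.cumsum(relevant)
        precision_at_k = cum_hits / np.arange(1, len(relevant) + 1)
        aps.append((precision_at_k * relevant).sum() / relevant.sum())
    return float(np.mean(aps))
```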
Table 1: Comparisons of MAP under DAR for MNIST&USPS, VOC2007&Caltech, and Caltech256&ImageNet benchmarks with
the varying coding bit from 16 to 128. The best MAP with the identical code length is highlighted with bold type.
MNIST&USPS VOC2007&Caltech101 Caltech256&ImageNet
Bit 16 32 48 64 96 128 16 32 48 64 96 128 16 32 48 64 96 128
SH 15.71 13.85 12.05 11.78 11.38 11.78 29.94 30.26 32.51 33.76 32.59 33.03 10.37 11.67 12.17 11.88 12.67 12.89
LSH 16.25 16.99 23.23 20.38 19.70 26.98 33.40 33.99 34.03 32.89 34.12 34.50 5.36 6.72 10.39 12.71 15.6 17.08
OCH 18.94 25.73 26.73 26.34 27.88 29.22 71.50 72.27 72.65 72.71 69.17 68.91 11.56 15.36 17.49 20.18 22.00 22.90
GTH 19.10 24.17 24.27 24.38 23.64 29.36 36.70 38.95 37.23 37.87 37.7 38.36 11.56 14.79 16.97 19.53 20.88 22.38
ITQ+ 20.27 20.53 16.77 15.87 17.79 14.90 35.35 34.48 34.33 34.42 34.05 34.74 - - - - - -
DSH 21.15 27.53 29.71 26.13 26.6 28.94 40.97 42.03 43.06 45.81 43.78 42.86 8.27 9.60 11.55 12.34 13.56 15.64
SGH 24.83 24.78 25.85 27.78 28.26 29.35 35.77 34.06 33.6 33.11 32.75 32.41 12.49 17.23 20.34 21.75 24.46 25.42
LapITQ+ 26.38 26.31 24.91 24.61 22.04 21.33 38.95 38.43 39.64 39.35 39.33 38.76 - - - - - -
ITQ 27.38 30.92 31.44 32.25 33.12 33.44 40.13 39.63 39.45 39.98 39.27 39.89 16.94 22.00 24.44 26.21 27.96 28.89
NoTL 28.13 30.05 28.24 30.34 31.76 31.72 35.95 37.86 38.28 38.49 38.67 38.97 15.10 19.77 22.8 24.39 26.07 27.28
SDH 29.98 43.02 42.57 46.56 42.4 48.12 67.6 65.75 68.58 65.06 65.66 67.03 18.05 25.71 26.23 26.38 26.77 26.29
KSH 43.75 46.91 50.02 47.43 45.25 46.81 74.74 76.05 76.71 76.70 76.22 73.14 20.34 12.07 26.77 32.83 35.28 34.49
PWCF 47.47 51.99 51.44 51.75 50.89 53.95 79.38 80.42 79.24 79.31 78.15 78.87 22.46 30.58 35.29 35.24 38.92 40.32
Ours 49.24 54.90 56.30 58.28 58.8 59.14 86.08 87.82 92.75 95.92 96.96 95.14 30.10 43.46 47.41 51.50 53.52 55.90

Figure 3: Feature visualization for (a) PWCF (P→R), (b) Ours (P→R), (c) PWCF (MNIST&USPS), and (d) Ours (MNIST&USPS). The continuous real-valued features before the hash operation are used to draw the feature distribution. Red and blue points represent the two domains.
Table 2: Comparisons of mean average precision (MAP) of domain adaptive retrieval (DAR) on Office-Home with 64-bit binary code. The best MAP is highlighted with bold type.
Method P→R R→P C→R R→C A→R R→A Avg
NoTL 25.02 29.09 16.98 14.73 27.38 21.05 22.38
SH 15.01 16.82 10.38 8.32 14.52 12.41 12.91
LSH 10.74 14.78 9.85 8.37 12.24 10.02 11.00
OCH 20.06 20.23 10.97 10.61 19.35 15.32 16.09
GTH 19.40 22.80 12.27 12.03 20.98 16.47 17.33
ITQ+ 17.94 - 10.61 - - 15.16 14.57
DSH 8.86 7.72 6.44 6.15 10.07 9.11 8.06
SGH 24.12 26.47 16.10 14.14 22.82 20.41 20.68
LapITQ+ 15.94 - 11.72 - - 13.52 13.83
ITQ 26.32 29.13 17.60 15.88 26.86 21.99 22.96
SDH 25.75 27.90 15.97 16.72 32.06 22.79 23.53
KSH 32.02 34.42 21.56 18.51 25.87 20.04 25.40
PWCF 34.03 34.44 24.22 18.42 34.57 28.95 29.11
Ours 48.47 45.24 30.81 25.15 43.30 38.68 38.61

4.2 Quantitative Results
For the traditional DAR setting, Tables 1 and 2 report the performance comparisons between our DHLing and other hashing methods, from which we draw four conclusions. First, our DHLing outperforms the other baselines by a large margin in most cases. Specifically, with 64-bit binary codes, DHLing surpasses the second best method PWCF by 6.5% in terms of MAP on the MNIST&USPS task, which illustrates that our method effectively solves cross-domain retrieval by mitigating the significant domain discrepancy. Moreover, although PWCF adopts a triplet loss constraint to capture the relationship of cross-domain instances, DHLing further explores the intra-domain knowledge to intensify the feature representation with powerful discriminative ability. Second, with the coding bit varying from 16 to 64, DHLing extracts more meaningful knowledge from longer hash codes to improve the precision and obtains a stable retrieval ability. For example, with the increasing length of the hash code, LapITQ+ did not obtain higher precision on MNIST&USPS, whereas our method stably improves its performance from 49.24% (16 bits) to 59.14% (128 bits). The main reason lies in that our DHLing focuses on learning consistent hidden features before the binary code to improve the robustness of the system, which helps our method avoid abandoning abundant semantic information when transforming the continuous real-valued representation into the hash code. Third, compared with the others, DHLing is more suitable for practical retrieval tasks on large-scale datasets. As discussed above, the Office-Home dataset is a challenging benchmark for cross-domain retrieval since it involves abundant images and categories. Interestingly, our DHLing still achieves the highest MAP on all six tasks. In terms of the average result, DHLing beats PWCF, the second best performer, by 9.5%, which demonstrates that our method employs the consistent coding optimization and the domain-invariant memory bank to eliminate the negative influence of domain shift on the cross-domain retrieval problem. Finally, the supervision of annotations significantly enhances the identification ability of the model. Concretely, supervised approaches such as PWCF and SDH achieve comparable or even better precision when contrasted with the unsupervised methods (OCH, GTH). This explicitly emphasises the importance of annotations for learning discriminative features.
Figure 4: Examples of top-10 retrieval results given query samples from the task R→A on the Office-Home dataset. Each query sample is from Art and the retrieved images are from RealWorld. For the query Fan, Ours returns 7 correct images versus 5 for PWCF; for Computer, 6 versus 4; for Pencil, 6 versus 5; for TV, 7 versus 5.

However, practical applications can hardly access an abundant, well-labeled retrieval database during the training stage, which also motivates us to propose the challenging yet realistic semi-supervised domain adaptive retrieval (SDAR).

Table 3 shows the results of the baselines and our method under the SDAR setting on two retrieval tasks; the three competitors are supervised methods. Comparing Tables 1 and 3, it is easy to observe that all supervised methods suffer from considerable performance degradation. Although our method also encounters such a drop, DHLing effectively mitigates the effect of insufficient annotation in several cases. On the Caltech256&ImageNet task, when the proportion of labeled source samples is changed from 100% to 10%, our method always beats the others. The success of DHLing mainly results from the design of domain-specific clustering. This module explores the structural knowledge of samples to assign pseudo labels, making the consistent coding optimization more effective for aligning the source and target domains. From the above comparison and analysis of the mean average precision, our method achieves better performance than the others under both the DAR and SDAR settings.

4.3 Empirical Analysis
To provide an intuitive comparison between our DHLing and the others, we additionally report the feature embedding, the relationship between recall/precision and the number of retrieved instances, and an analysis of the retrieved images.
Table 3: Comparisons of MAP under SDAR for MNIST&USPS and Caltech256&ImageNet benchmarks with the varying coding
bit from 16 to 128 and various ratios (10%, 30% and 60%) of labeled source samples over the size of the whole retrieved database.
The best MAP for each situation is highlighted with bold type.
MNIST&USPS Caltech256&ImageNet
Ratio Bit 16 32 48 64 96 128 16 32 48 64 96 128
10% SDH 8.84 10.38 17.70 21.82 19.74 26.86 7.10 9.64 10.40 11.87 13.06 13.44
KSH 12.93 13.56 23.29 22.11 23.74 24.34 7.35 8.47 10.85 14.29 13.74 15.15
PWCF 16.38 18.88 25.94 28.34 31.05 32.83 8.69 12.38 13.35 15.47 17.81 19.16
Ours 35.29 45.67 45.29 47.07 46.17 46.50 10.01 15.32 17.06 16.16 19.55 21.22
30% SDH 15.55 24.64 32.88 38.31 36.86 40.63 7.43 11.06 11.29 12.73 14.51 15.82
KSH 25.67 29.75 40.29 38.93 40.03 39.17 7.96 8.66 12.59 15.95 16.31 16.28
PWCF 29.91 36.10 42.89 44.76 46.6 46.37 9.59 13.30 15.18 16.47 19.18 20.72
Ours 38.64 47.06 48.00 50.85 49.80 54.38 12.82 19.15 20.71 24.67 23.58 27.60
60% SDH 19.05 29.64 36.08 39.23 40.16 44.23 8.43 13.40 13.23 15.56 17.31 17.32
KSH 28.67 33.95 44.09 40.23 43.63 43.17 10.56 8.97 15.50 18.44 18.40 19.12
PWCF 33.31 39.93 47.15 45.56 49.13 49.97 12.66 16.90 18.28 18.94 22.66 23.55
Ours 41.58 48.46 53.30 55.48 55.31 55.80 21.25 30.98 33.79 35.02 38.94 41.29

Figure 5: Comparison of retrieval ability in terms of (a) Recall and (b) Precision under different sizes of retrieved samples for task P→R of the Office-Home dataset with 64 bits.

Figure 6: Comparison of our two variants and DHLing with different code lengths.
Feature visualization. Under the DAR scenario, we first carry out the experiment on task P→R of the Office-Home dataset with 64 bits and extract the hidden features h_i before the hash process to draw the feature embedding of the training instances from the source and target domains, shown in Figure 3 (b). Following the same operation, Figure 3 (a) shows the feature embedding obtained by PWCF. Through their comparison, we notice that our DHLing achieves better alignment across the two domains than PWCF, which benefits from the consistent coding optimization narrowing the distance between arbitrary two samples of the identical class and expanding the distance between samples of different categories. Besides, for the SDAR setting, Figure 3 (c) and (d) show the visualization of features from PWCF and our DHLing on the MNIST&USPS task with 64 bits and 30% labeled source instances. Similarly, our method learns a more explicit classification boundary with insufficient labeled source samples.

Retrieved images. From the experiment on task R→A of the Office-Home dataset, we select the top-10 retrieved images for the given query instances. As Figure 4 shows, our DHLing retrieves more correct instances than PWCF, e.g., 7 (ours) vs. 5 with the query 'Fan' image. Moreover, when our method does retrieve incorrect instances, they are relatively more reasonable than those of PWCF. For instance, given the query 'Pencil' image, the incorrect samples retrieved by DHLing are pens, while PWCF matches it with unrelated flowers and flip-flops.

Precision&Recall. To delicately reflect the retrieval ability of the models, we explore the recall and precision with an increasing number of retrieved instances. The specific experiment is performed on task P→R with 64 bits under the DAR setting. Figure 5 demonstrates that our method accurately selects instances from the retrieved database that are semantically similar to the queries when compared with the other baselines.

Ablation Study. To clearly understand the influence of each module on the model performance, we design two model variants (i.e., DHLing-CC and DHLing-M) by removing the consistent coding constraint L_cc and the domain-invariant memory bank loss L_m, respectively. Under the DAR scenario, we perform experiments on MNIST&USPS and report the corresponding results in Figure 6. Accordingly, we observe that the consistent coding constraint makes the larger contribution to model training. Moreover, without the design of the domain-invariant memory bank, the performance of the model suffers an obvious reduction. Thus, both modules contribute to the improvements of DHLing.

5 CONCLUSIONS
Conventional domain adaptive retrieval assumes that the model is trained with a well-labeled source domain and target images, and retrieves source images given a query target image. Differently, in this paper we introduce the challenging yet more practical setting, SDAR, assuming the source domain includes a small portion of annotated source images and abundant unlabeled ones. We propose a novel Discriminative Hashing Learning (DHLing) model that simultaneously explores structural knowledge to assign annotations to unlabeled samples for more discriminative hash codes, and seeks the domain-invariant memory bank to supervise effective cross-domain feature learning. Comprehensive experimental results on several benchmarks demonstrate the superiority of our approach over state-of-the-art retrieval methods under both the conventional and semi-supervised cross-domain retrieval settings.
REFERENCES on pattern analysis and machine intelligence 41, 4 (2018), 941–955.
[1] Cong Bai, Ling Huang, Xiang Pan, Jianwei Zheng, and Shengyong Chen. 2018. [28] Ji Liu and Lei Zhang. 2019. Optimal projection guided transfer hashing for image
Optimization of deep convolutional neural network for large scale image retrieval. retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33.
Neurocomputing 303 (2018), 60–67. 8754–8761.
[2] Christian Buchta, Martin Kober, Ingo Feinerer, and Kurt Hornik. 2012. Spherical [29] Wei Liu, Jun Wang, Rongrong Ji, Yu-Gang Jiang, and Shih-Fu Chang. 2012. Su-
k-means clustering. Journal of statistical software 50, 10 (2012), 1–22. pervised hashing with kernels. In 2012 IEEE Conference on Computer Vision and
[3] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni. 2004. Locality- Pattern Recognition. IEEE, 2074–2081.
sensitive hashing scheme based on p-stable distributions. In Proceedings of the [30] Mingsheng Long, Jianmin Wang, Guiguang Ding, Jiaguang Sun, and Philip S Yu.
twentieth annual symposium on Computational geometry. 253–262. 2013. Transfer feature learning with joint distribution adaptation. In Proceedings
[4] Hal Daumé III. 2009. Frustratingly easy domain adaptation. ACL (2009). of the IEEE international conference on computer vision. 2200–2207.
[5] Kun Ding, Bin Fan, Chunlei Huo, Shiming Xiang, and Chunhong Pan. 2016. [31] Xiaoqiang Lu, Yaxiong Chen, and Xuelong Li. 2017. Hierarchical recurrent
Cross-modal hashing via rank-order preserving. IEEE Transactions on Multimedia neural hashing for image retrieval with hierarchical convolutional features. IEEE
19, 3 (2016), 571–585. transactions on image processing 27, 1 (2017), 106–120.
[6] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, [32] Mathias Lux. 2011. Content based image retrieval with lire. In Proceedings of the
and Trevor Darrell. 2014. Decaf: A deep convolutional activation feature for 19th ACM international conference on Multimedia. 735–738.
generic visual recognition. In International conference on machine learning. PMLR, [33] Yueming Lv, Wing WY Ng, Ziqian Zeng, Daniel S Yeung, and Patrick PK Chan.
647–655. 2015. Asymmetric cyclical hashing for large scale image retrieval. IEEE Transac-
[7] Yan Feng, Bin Chen, Tao Dai, and Shu-Tao Xia. 2020. Adversarial attack on deep tions on Multimedia 17, 8 (2015), 1225–1235.
product quantization network for image retrieval. In Proceedings of the AAAI [34] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. 2009. Online dic-
Conference on Artificial Intelligence, Vol. 34. 10786–10793. tionary learning for sparse coding. In Proceedings of the 26th annual international
[8] Bojana Gajic and Ramon Baldrich. 2018. Cross-domain fashion image retrieval. conference on machine learning. 689–696.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition [35] Adnan Qayyum, Syed Muhammad Anwar, Muhammad Awais, and Muhammad
Workshops. 1869–1871. Majid. 2017. Medical image retrieval using deep convolutional neural network.
[9] Yunchao Gong, Svetlana Lazebnik, Albert Gordo, and Florent Perronnin. 2012. Neurocomputing 266 (2017), 8–20.
Iterative quantization: A procrustean approach to learning binary codes for [36] Jerome Revaud, Jon Almazán, Rafael S Rezende, and Cesar Roberto de Souza.
large-scale image retrieval. IEEE transactions on pattern analysis and machine 2019. Learning with average precision: Training image retrieval with a listwise
intelligence 35, 12 (2012), 2916–2929. loss. In Proceedings of the IEEE/CVF International Conference on Computer Vision.
[10] Albert Gordo, Jon Almazan, Jerome Revaud, and Diane Larlus. 2017. End-to-end 5107–5116.
learning of deep visual representations for image retrieval. International Journal [37] Fumin Shen, Chunhua Shen, Wei Liu, and Heng Tao Shen. 2015. Supervised
of Computer Vision 124, 2 (2017), 237–254. discrete hashing. In Proceedings of the IEEE conference on computer vision and
[11] Mehrdad Hosseinzadeh and Yang Wang. 2020. Composed query image retrieval pattern recognition. 37–45.
using locally bounded features. In Proceedings of the IEEE/CVF Conference on [38] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks
Computer Vision and Pattern Recognition. 3596–3605. for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[12] Fuxiang Huang, Lei Zhang, Yang Yang, and Xichuan Zhou. 2020. Probability [39] Kai Song, Yonghong Tian, Wen Gao, and Tiejun Huang. 2006. Diversifying the
weighted compact feature for domain adaptive retrieval. In Proceedings of the image retrieval results. In Proceedings of the 14th ACM international conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9582–9591. on Multimedia. 707–710.
[13] Junshi Huang, Rogerio S Feris, Qiang Chen, and Shuicheng Yan. 2015. Cross- [40] Jun Tang, Ke Wang, and Ling Shao. 2016. Supervised matrix factorization hashing
domain image retrieval with a dual attribute-aware ranking network. In Proceed- for cross-modal retrieval. IEEE Transactions on Image Processing 25, 7 (2016),
ings of the IEEE international conference on computer vision. 1062–1070. 3157–3166.
[14] Jonathan J. Hull. 1994. A database for handwritten text recognition research. IEEE [41] Antonio Torralba and Alexei A Efros. 2011. Unbiased look at dataset bias. In
Transactions on pattern analysis and machine intelligence 16, 5 (1994), 550–554. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[15] Xin Ji, Wei Wang, Meihui Zhang, and Yang Yang. 2017. Cross-domain image 1521–1528.
retrieval with attention modeling. In Proceedings of the 25th ACM international [42] Yair Weiss, Antonio Torralba, Robert Fergus, et al. 2008. Spectral hashing.. In
conference on Multimedia. 1654–1662. Advances in neural information processing systems, Vol. 1. 4.
[16] Qing-Yuan Jiang and Wu-Jun Li. 2015. Scalable graph hashing with feature [43] Dayan Wu, Zheng Lin, Bo Li, Mingzhen Ye, and Weiping Wang. 2017. Deep
transformation.. In IJCAI, Vol. 15. 2248–2254. supervised hashing for multi-label and large-scale image retrieval. In Proceedings
[17] Shuhui Jiang, Yue Wu, and Yun Fu. 2016. Deep bi-directional cross-triplet em- of the 2017 ACM on International Conference on Multimedia Retrieval. 150–158.
bedding for cross-domain clothing retrieval. In Proceedings of the 24th ACM [44] Haifeng Xia and Zhengming Ding. 2020. Hgnet: Hybrid generative network
international conference on Multimedia. 52–56. for zero-shot domain adaptation. In Computer Vision–ECCV 2020: 16th European
[18] Zhongming Jin, Cheng Li, Yue Lin, and Deng Cai. 2013. Density sensitive hashing. Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVII 16. Springer,
IEEE transactions on cybernetics 44, 8 (2013), 1362–1371. 55–70.
[19] Taotao Jing, Haifeng Xia, and Zhengming Ding. 2020. Adaptively-accumulated [45] Haifeng Xia and Zhengming Ding. 2020. Structure preserving generative cross-
knowledge transfer for partial domain adaptation. In Proceedings of the 28th ACM domain learning. In Proceedings of the IEEE/CVF Conference on Computer Vision
International Conference on Multimedia. 1606–1614. and Pattern Recognition. 4364–4373.
[20] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classifi- [46] Liang Xie, Jialie Shen, and Lei Zhu. 2016. Online cross-modal hashing for web
cation with deep convolutional neural networks. Advances in neural information image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence,
processing systems 25 (2012), 1097–1105. Vol. 30.
[21] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient- [47] Chenggang Yan, Biao Gong, Yuxuan Wei, and Yue Gao. 2020. Deep multi-view
based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278– enhancement hashing for image retrieval. IEEE Transactions on Pattern Analysis
2324. and Machine Intelligence (2020).
[22] Xiaodan Liang, Liang Lin, Wei Yang, Ping Luo, Junshi Huang, and Shuicheng [48] Guanglei Yang, Haifeng Xia, Mingli Ding, and Zhengming Ding. 2020. Bi-
Yan. 2016. Clothes co-parsing via joint image segmentation and labeling with directional generation for unsupervised domain adaptation. In Proceedings of the
application to clothing retrieval. IEEE Transactions on Multimedia 18, 6 (2016), AAAI Conference on Artificial Intelligence, Vol. 34. 6615–6622.
1175–1186. [49] Huei-Fang Yang, Kevin Lin, and Chu-Song Chen. 2017. Supervised learning
[23] Jie Lin, Zechao Li, and Jinhui Tang. 2017. Discriminative Deep Hashing for of semantics-preserving hash via deep convolutional neural networks. IEEE
Scalable Face Image Retrieval.. In IJCAI. 2266–2272. transactions on pattern analysis and machine intelligence 40, 2 (2017), 437–451.
[24] Kevin Lin, Huei-Fang Yang, Jen-Hao Hsiao, and Chu-Song Chen. 2015. Deep [50] Meng Yang, Lei Zhang, Xiangchu Feng, and David Zhang. 2011. Fisher dis-
learning of binary hash codes for fast image retrieval. In Proceedings of the IEEE crimination dictionary learning for sparse representation. In 2011 International
conference on computer vision and pattern recognition workshops. 27–35. Conference on Computer Vision. IEEE, 543–550.
[25] Yen-Yu Lin, Tyng-Luh Liu, and Hwann-Tzong Chen. 2005. Semantic manifold [51] Tao Yao, Gang Wang, Lianshan Yan, Xiangwei Kong, Qingtang Su, Caiming
learning for image retrieval. In Proceedings of the 13th annual ACM international Zhang, and Qi Tian. 2019. Online latent semantic hashing for cross-media
conference on Multimedia. 249–258. retrieval. Pattern Recognition 89 (2019), 1–11.
[26] Chao Liu, Jingjing Ma, Xu Tang, Fang Liu, Xiangrong Zhang, and Licheng Jiao. [52] Xiangtao Zheng, Yichao Zhang, and Xiaoqiang Lu. 2020. Deep balanced discrete
2020. Deep hash learning for remote sensing image retrieval. IEEE Transactions hashing for image retrieval. Neurocomputing 403 (2020), 224–236.
on Geoscience and Remote Sensing (2020). [53] Joey Tianyi Zhou, Heng Zhao, Xi Peng, Meng Fang, Zheng Qin, and Rick
[27] Hong Liu, Rongrong Ji, Jingdong Wang, and Chunhua Shen. 2018. Ordinal con- Siow Mong Goh. 2018. Transfer hashing: From shallow to deep. IEEE transactions
straint binary coding for approximate nearest neighbor search. IEEE transactions on neural networks and learning systems 29, 12 (2018), 6191–6201.
