
GNR650: Paper Review 2

Harsh Chaurasia
October 2023

1 Introduction
TITLE: Universal Domain Adaptation Through Self-Supervision
AUTHORS: K. Saito, D. Kim, S. Sclaroff
Domain Adaptation (DA) is one of the most challenging and interesting problems in Computer Vision right now. Most current unsupervised approaches assume that all source categories are present in the target domain, which often proves impractical. The authors propose DANCE (Domain Adaptive Neighborhood Clustering via Entropy optimisation), a novel method that enforces soft feature alignment paired with clustering in order to achieve universal domain adaptation (UniDA).
Recent DA techniques make strong assumptions about the category overlap between domains. With Ls and Lt denoting the label sets of the source and target domains, the authors handle all possible category shifts: OPEN-SET DA (Ls ⊂ Lt), CLOSED-SET DA (Ls = Lt), PARTIAL DA (Lt ⊂ Ls), and a mix of open-set and partial (OPDA). Since the target domain is unlabelled in a practical setting, we may not know which of these settings is at hand, which can lead to catastrophic misalignment. Moreover, methods often rely too heavily on source supervision, making it difficult to obtain discriminative features on the target; this leads to failure in discriminating "unknown" categories from the known ones. Earlier self-supervision methods do not exploit the cluster structure of the target domain, which this paper addresses: DANCE harnesses the target's cluster structure through self-supervision, while preserving useful source features via distribution alignment with batch normalisation and an entropy separation loss. DANCE is shown to improve over the source-only model in every setting.
Classical approaches for closed-set DA (CDA) measure the distance between the source and target feature distributions and train a model to minimise it. However, this fails in ODA, PDA, and OPDA, defeating the purpose of UniDA. Self-supervised methods have also been tried, but applying them directly is difficult because they require the number of clusters in the target domain, which is not always available.

2 DANCE: Domain Adaptive Neighborhood Clustering via Entropy Optimisation
DANCE aims to label each target sample with either one of the labels in Ls or the "unknown" label. For this, the model is trained on Ds ∪ Dt and evaluated on Dt. The authors take care not to force complete alignment between the entire source and target distributions; instead, they perform a relaxed alignment designed to extract well-clustered target features. The proposed "neighborhood clustering" uses self-supervision to cluster target samples: each target point is aligned either to a known class prototype in the source or to its neighbour in the target, which learns a discriminative metric mapping a point to a semantically close match. The second proposed component is an entropy separation loss, which either aligns a target point with a source prototype or rejects it as "unknown". The network uses an L2-normalisation layer before the last linear layer.
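As a rough sketch of this classifier head (a numpy illustration; the temperature value and all names here are our assumptions, not from the paper):

```python
import numpy as np

def l2_normalized_logits(features, weights, temperature=0.05):
    """Classifier head with an L2-normalisation layer before the final
    linear layer, plus temperature scaling (illustrative sketch)."""
    # L2-normalise each feature vector to unit length
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    features = features / np.clip(norms, 1e-12, None)
    # Final linear layer; dividing by a small temperature sharpens the
    # resulting softmax distribution
    return features @ weights.T / temperature
```

Because of the normalisation, the logits depend only on the direction of the feature vector, not its magnitude, so the classifier effectively compares cosine similarities with the class weight vectors.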
The model must learn well-clustered features in order to extract discriminative ones. Crucially, the number of clusters in the target domain is NOT needed for neighborhood clustering. Similarity is calculated between all target samples and the prototypes for each mini-batch, and a memory bank is updated so that it stores features with momentum over previous epochs.
L_nc = −(1/|B_t|) Σ_{i∈B_t} Σ_{j=1, j≠i}^{N_t+K} p_{i,j} log p_{i,j}

This is minimised to align each target sample to either a target neighbor or a prototype.
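A minimal sketch of computing L_nc for one mini-batch, assuming features are already L2-normalised and the memory bank stores one feature per target sample (the function and variable names, the temperature, and the self-masking detail are our assumptions for illustration):

```python
import numpy as np

def neighborhood_clustering_loss(batch_feats, batch_idx, memory, prototypes,
                                 temp=0.05):
    """Entropy of each target sample's similarity distribution over all
    N_t stored target features and K class prototypes, excluding the
    sample's own memory slot (the j != i condition in L_nc)."""
    bank = np.concatenate([memory, prototypes], axis=0)  # (N_t + K, d)
    sims = batch_feats @ bank.T / temp                   # (|B_t|, N_t + K)
    # Mask out each sample's similarity to its own stored feature
    for row, j in enumerate(batch_idx):
        sims[row, j] = -1e9
    # Softmax over neighbours and prototypes -> p_{i,j}
    e = np.exp(sims - sims.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    # L_nc: mean entropy over the mini-batch
    return -np.mean(np.sum(p * np.log(p + 1e-12), axis=1))
```

Minimising this entropy makes each sample's similarity distribution peaky, i.e. the sample commits to one close neighbour or prototype, without ever specifying how many target clusters exist.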
Some target samples still need to be aligned with known source categories. Unknown target samples are likely to have larger entropy in the source classifier's output than known target samples, since they do not share common features with any source class. The entropy separation loss exploits this:
L_es = (1/|B_t|) Σ_{i∈B_t} L_es(ρ_i)

The final objective is obtained as: L = Lcls + λ(Lnc + Les )
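One plausible form of the per-sample term, consistent with the description above: push the classifier entropy away from a threshold ρ once it already sits more than a margin m away, so samples become either confidently known (low entropy) or confidently "unknown" (high entropy). The threshold ρ, the margin m, and all names here are our assumptions for illustration:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of each probability vector (last axis)."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def entropy_separation_loss(probs, rho, margin):
    """Per sample: if |H(p) - rho| > margin, minimise -|H(p) - rho|,
    i.e. push the entropy further from the threshold; otherwise the
    sample is ambiguous and contributes nothing (a sketch)."""
    gap = np.abs(entropy(probs) - rho)
    per_sample = np.where(gap > margin, -gap, 0.0)
    return per_sample.mean()
```

The final objective then adds both terms, weighted by λ, to the standard classification loss on labelled source samples, matching L = Lcls + λ(Lnc + Les) above.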


DANCE has been compared with baselines on the Office and OfficeHome datasets, which have 3 and 4 domains respectively, using a ResNet50 pre-trained on ImageNet as the feature extractor. DANCE is the only method that improves over the source-only model across settings, and it achieves accuracy comparable to ETN. It also gives the best performance in ODA (compared with STA) and in OPDA, and it outperforms the baselines on VisDA. Extensive ablation studies were also performed to gauge the effect of each loss component and alignment approach.
In the end, I believe that DANCE, if paired with meta-learning, could prove highly beneficial for few-shot learning under domain shift.
