Control Engineering Practice 133 (2023) 105438

Towards generalization on real domain for single image dehazing via meta-learning

Wenqi Ren a, Qiyu Sun a,b, Chaoqiang Zhao a, Yang Tang a,∗

a Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
b Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, China

ARTICLE INFO

Keywords: Image dehazing, Domain generalization, Meta-learning

ABSTRACT

Learning-based image dehazing methods are essential to assist autonomous systems in enhancing reliability. Due to the domain gap between synthetic and real domains, the internal information learned from synthesized images is usually sub-optimal for real domains, so dehazing models trained on synthetic images often suffer severe performance drops on real-world samples. Driven by its ability to explore internal information from a few unseen-domain samples, meta-learning is commonly adopted to address this issue via test-time training, which is hyperparameter-sensitive and time-consuming. In contrast, we present a domain generalization framework based on meta-learning that digs out representative and discriminative internal properties of real hazy domains without test-time training. To obtain representative domain-specific information, we attach two entities, termed the adaptation network and the distance-aware aggregator, to our dehazing network. The adaptation network assists in distilling domain-relevant information from a few hazy samples and caching it into a collection of features. The distance-aware aggregator strives to summarize the generated features and filter out misleading information, yielding more representative internal properties. To enhance the discrimination of the distilled internal information, we present a novel loss function called domain-relevant contrastive regularization, which encourages similarity between internal features generated from the same domain and distinction between features from diverse domains. The generated representative and discriminative features are regarded as external variables of our dehazing network and are used to regress a particular function for a given domain. Extensive experiments on real hazy datasets validate that our proposed method has superior generalization ability compared with state-of-the-art competitors.

1. Introduction

Machine learning-based visual perception is critical to the environment understanding, control, and decision-making tasks of autonomous systems (Gróf et al., 2022; Ifqir et al., 2022; Tang et al., 2021), in which clear images play a significant role (Wu et al., 2022; Zhao et al., 2022a, 2022b). However, when affected by haze, images captured outdoors usually suffer from low contrast and poor visibility, resulting in severe performance degradation of perception models, as shown in Fig. 1. Thus, it is important to design an algorithm that recovers clear images from hazy counterparts so that autonomous systems, such as self-driving systems, can operate reliably in bad weather.

Recently, a great number of learning-based image dehazing methods have been proposed, which provide a solution to promote the robustness of autonomous systems in foggy scenes (Tang et al., 2022; Xu et al., 2019; Zhang et al., 2020b). Nevertheless, the optimization of these dehazing models requires a large quantity of paired hazy/clear images, which are expensive to collect in practice. To address this issue, numerous synthetic datasets (Li et al., 2019b, 2020; Sakaridis et al., 2018; Zheng et al., 2021) have been spawned, where hazy images are synthesized from clear counterparts. However, these synthesized hazy images are biased depictions of real scenarios, leading to a domain gap between synthetic and real samples. Thus, dehazing models learned from such synthetic samples (Li et al., 2019b, 2020; Sakaridis et al., 2018; Zheng et al., 2021) frequently suffer from severe performance drops on real images, as the internal information learned from synthetic domains is usually sub-optimal for real domains. Although domain adaptation-based approaches (Chen et al., 2021; Li et al., 2019a; Shao et al., 2020) can obtain performance gains on real hazy images, a great deal of real hazy data is additionally required to implement these methods.

Meta-learning is capable of digging out internal information from a few samples of target domains, so that a model trained only on source domains can quickly adapt to target domains (Sun et al., 2022; Zhang et al., 2020b).

∗ Corresponding author.
E-mail addresses: wenqiren9801@163.com (W. Ren), yangtang@ecust.edu.cn (Y. Tang).

https://doi.org/10.1016/j.conengprac.2023.105438
Received 23 October 2022; Received in revised form 26 December 2022; Accepted 13 January 2023
Available online 24 January 2023
0967-0661/© 2023 Elsevier Ltd. All rights reserved.

Therefore, applying meta-learning to single image dehazing is a promising way to improve the generalization ability on real domains. Meta-learning can be categorized into metric-based, optimization-based, and model-based approaches (Chen et al., 2022; Huisman et al., 2021), where optimization-based techniques (Finn et al., 2017; Liu et al., 2019a; Sun et al., 2020) are usually employed in image restoration (Chi et al., 2021; Soh et al., 2020) to explore the internal information of new scenarios. Nonetheless, these methods (Chi et al., 2021; Soh et al., 2020) depend heavily on the test-time-training strategy. Therefore, they require carefully designed hyperparameters (e.g., iteration steps) (Gao et al., 2022) and additional computational consumption (Huisman et al., 2021) at test time, which hinders the real-time applications of autonomous systems (van Dooren et al., 2022; Kaleli, 2020; Zhang et al., 2022). Different from these studies (Chi et al., 2021; Soh et al., 2020), we seek to deal with domain generalization via model-based meta-learning methods (Chen et al., 2022; Garnelo et al., 2018a; Huisman et al., 2021; Ye & Yao, 2022; Zhang et al., 2021). Specifically, adaptive risk minimization (ARM) (Zhang et al., 2021) is employed to avoid test-time training and enable internal learning (Chi et al., 2021; Zhang et al., 2019) on real domains.

In particular, we add two entities, termed the adaptation network and the aggregator, to the dehazing network. The adaptation network assists in distilling internal information from a few samples and caching it into a collection of features. The aggregator strives to summarize the generated features to grasp domain-specific internal properties. The summarized features are regarded as external variables of the dehazing network and are used to regress a particular function for the given domain. Intuitively, the more representative and discriminative the extracted domain-specific information is, the more capable the regressed dehazing function is of coping with samples in the given domain. However, directly transferring the settings of ARM (Zhang et al., 2021) to image dehazing causes some limitations. Firstly, the adaptation network is made of vanilla convolution layers, which may lead to unreliable internal information, as the inputs in image dehazing are generally covered by haze. Secondly, the aggregator treats each sample equally and makes the external variables vulnerable to outliers, since the features encoded from outliers usually fail to grasp representative domain-specific information. Thirdly, the adaptation network and the aggregator only focus on intra-domain information and leave inter-domain information unused, so discriminative domain-specific information fails to be captured sufficiently.

Targeting the first issue, we embed context-gated convolution (CG-Conv) layers (Lin et al., 2020) into the adaptation network, which enhances the reliability of the features by fusing context information from entire images. Aiming at the second challenge, we notice that the features of outliers are usually located far away from those of normal samples. Thus, we propose a non-parametric distance-aware aggregator to suppress the misleading information of outliers. The proposed aggregator reweights the internal features of diverse samples according to their relative distances. In this way, the misleading information from outliers is weakened and the representative information of normal samples is enhanced. To address the third dilemma, we attempt to make the internal information distilled from the same domain more similar and that from diverse domains more distinct. In particular, intra- and inter-domain information is incorporated, and a domain-relevant contrastive regularization is presented to grasp the intra-domain homogeneity and inter-domain heterogeneity, which further enhances the representativeness and discrimination of the external variables. Comprehensive experiments are conducted to demonstrate the generalization ability of our dehazing model on real domains and its effectiveness in boosting the reliability of vision-based autonomous systems.

Fig. 1. An example of potential application scenarios of our dehazing model in self-driving systems. Source: Adapted from Lee et al. (2022).

In summary, the main contributions are listed as follows:

• A meta-learning-based domain generalization framework is proposed for single image dehazing, which enables internal learning on real domains without test-time training.
• A distance-aware aggregator is presented to capture representative internal information of specific domains and suppress misleading information.
• A domain-relevant contrastive regularization is presented, which facilitates the regressed dehazing function to capture discriminative internal information of specific domains.
• Experiments on real hazy images demonstrate that our proposed framework is superior to state-of-the-art learning-based dehazing methods (Guo et al., 2022; Yang et al., 2022).

The rest of this paper is organized as follows. Section 2 reviews some leading-edge studies related to our work. Section 3 gives a detailed overview of our dehazing framework. Section 4 illustrates the implementation details and experimental results. Section 5 summarizes this paper and discusses future work.

2. Related work

2.1. Single image dehazing

2.1.1. Atmospheric scattering model
To describe the formation of a hazy image, the atmospheric scattering model (McCartney, 1976; Narasimhan & Nayar, 2000, 2002) has been extensively used, which can be defined as

$X = Jt + A(1 - t), \tag{1}$

where $X$ and $J$ denote the hazy image and the haze-free image, respectively, $A$ is the global atmospheric light, and $t$ is the transmission map. $t$ can be modeled as $t = e^{-\beta d}$, where $\beta$ and $d$ are the scattering coefficient of the atmosphere and the distance between objects and the camera, respectively.
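To make the formation model concrete, the following is a minimal sketch of how a hazy image can be synthesized from a clear image and a depth map under Eq. (1); the particular `beta` and `atmospheric_light` values are illustrative assumptions, not values prescribed by the paper.

```python
import numpy as np

def synthesize_haze(clear, depth, beta=1.0, atmospheric_light=0.9):
    """Render a hazy image X = J*t + A*(1 - t) with t = exp(-beta * d).

    clear:  haze-free image J, float array in [0, 1], shape (H, W, 3)
    depth:  scene depth d in arbitrary units, shape (H, W)
    beta:   scattering coefficient of the atmosphere
    atmospheric_light: global atmospheric light A
    """
    t = np.exp(-beta * depth)[..., None]              # transmission map, (H, W, 1)
    hazy = clear * t + atmospheric_light * (1.0 - t)  # Eq. (1)
    return hazy.clip(0.0, 1.0)

# Example: a depth ramp produces haze that thickens with distance.
J = np.random.rand(240, 240, 3)
d = np.linspace(0.0, 3.0, 240)[None, :].repeat(240, axis=0)
X = synthesize_haze(J, d, beta=1.2)
```

Synthetic subsets such as ITS and OTS are built in essentially this way, by varying the scattering coefficient and atmospheric light over clear images with known or estimated depth (cf. Section 4.1.1).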
2.1.2. Learning-based approaches
The past decade has witnessed the emergence of a large number of learning-based image dehazing algorithms. Li et al. (2017) predict transmission maps and atmospheric lights jointly in a unified convolutional neural network (CNN) and then generate haze-free images. However, estimating intermediate variables (e.g., transmission maps) may give rise to cumulative errors (Cai et al., 2016; Lee et al., 2020; Ren et al., 2016; Zhang & Patel, 2018). Therefore, Liu et al. (2019b) abandon the estimation of transmission maps and design an end-to-end CNN to conduct dehazing. To balance performance and computational costs, Wu et al. (2021) devise a compact network architecture and introduce contrastive learning to suppress unexpected predictions. Driven by the ability of Transformers to model long-range feature dependencies, Guo et al. (2022) integrate a Transformer and a CNN for single image dehazing.

Fig. 2. The overview of the proposed framework.

Learning-based approaches require a great deal of paired data to optimize their models, which has spawned various synthetic datasets (Li et al., 2019b, 2020; Sakaridis et al., 2018; Zheng et al., 2021). Nevertheless, since synthetic hazy images fail to depict real-world hazy scenes reliably, there is a domain gap between synthetic and real-world hazy images. Consequently, a model trained on synthetic datasets (Li et al., 2019b, 2020; Sakaridis et al., 2018; Zheng et al., 2021) frequently suffers a performance drop on real hazy images, because the features learned from synthetic domains are sub-optimal for real domains. Although both our work and existing work (Guo et al., 2022; Lee et al., 2020; Wu et al., 2021; Zhang & Patel, 2018) rely on synthetic data, the motivations are different. The existing methods (Guo et al., 2022; Lee et al., 2020; Wu et al., 2021; Zhang & Patel, 2018) aim at exploring the internal information of synthetic domains. Instead, our meta-learning-based model attempts to learn how to extract internal information from diverse synthetic domains. Thus, when applied to real images, our model can quickly distill domain-specific information of real domains to improve the performance on real samples.

Table 1
The nomenclature of the symbols involved in this paper.

Symbol     Meaning
𝑆hazy      Domain set of hazy samples
𝑆^𝑖        The 𝑖th domain
𝐷^𝑖        The 𝑖th task
𝑥          Hazy images
𝑦          Haze-free images
𝜙^𝑖        Task-specific parameter of 𝐷^𝑖
𝜙^{𝑖∗}     Optimal domain-specific parameter of 𝑆^𝑖
𝜑          Preliminary parameter
𝐸          Adaptation network
𝐹          Dehazing network
𝜔1         Neural weights of the adaptation network
𝜔2         Neural weights of the dehazing network
𝑀          Number of sample pairs in each domain
𝑁          Number of sampled tasks
𝐼          Number of domains in 𝑆hazy
𝐾          Number of preliminary parameters of a task

2.1.3. Domain adaptation-based approaches
To narrow the domain gap, domain adaptation-based approaches attempt to adopt both synthetic and real hazy images (Chen et al., 2021; Li et al., 2019a; Shao et al., 2020). These approaches employ real hazy images for training (Li et al., 2019a; Shao et al., 2020) or fine-tuning (Chen et al., 2021) to capture the internal information of real domains. Although all these approaches (Chen et al., 2021; Li et al., 2019a; Shao et al., 2020) have achieved performance gains on real hazy images, they additionally require a large number of real hazy images. In contrast, our meta-learning-based domain generalization framework can enable internal learning and address the domain shift by merely exploiting synthetic samples, which is significant for the practical application of the model.

2.2. Meta-learning in image restoration

Meta-learning, also known as learning to learn, targets adapting to a new scenario rapidly from a limited number of samples. It has been applied to diverse computer vision tasks and has made significant breakthroughs in recent years. Meta-learning can be categorized into metric-based, model-based, and optimization-based techniques (Huisman et al., 2021). Among them, optimization-based algorithms, especially model-agnostic meta-learning (MAML) (Finn et al., 2017) and its variants (Liu et al., 2019a; Sun et al., 2020), are widely employed in image restoration. Soh et al. (2020) apply MAML to image super-resolution to obtain an optimal model initialization, based on which the model can adapt to unseen samples within several test-time training steps. Chi et al. (2021) adopt an auxiliary reconstruction task to optimize the model indirectly to deal with blurred images caused by unseen kernels. To tackle multi-domain learning in image dehazing, Liu et al. (2022) conduct test-time training to enable adaptation to specific domains.

However, test-time training depends heavily on manually designed hyperparameters (e.g., iteration steps and learning rates), which leads to under-fitting on unseen target images (Gao et al., 2022). In addition, test-time training increases the computational costs and runtime of the model (Huisman et al., 2021; Liu et al., 2022), which results in low efficiency in practical applications. To tackle these issues, this paper resorts to model-based meta-learning approaches (Garnelo et al., 2018a, 2018b; Zhang et al., 2021), especially ARM (Zhang et al., 2021), to adapt the model without sensitive and time-consuming test-time training. Compared with ARM (Zhang et al., 2021), we further present a distance-aware aggregator and a domain-relevant contrastive regularization, which encourage the model to extract more representative and discriminative internal information of the given domain.

3. Methodology

3.1. Framework overview

In this paper, we present a novel framework for single image dehazing, which can deal with out-of-distribution domain generalization and enable internal learning on real domains without test-time training. As shown in Fig. 2, our proposed framework includes an adaptation network 𝐸_{𝜔1}(⋅) and a dehazing network 𝐹_{𝜔2}(⋅, 𝜙), where 𝜔1 and 𝜔2 stand for the neural parameters of 𝐸_{𝜔1}(⋅) and 𝐹_{𝜔2}(⋅, 𝜙), respectively. 𝜙 denotes the external variables and serves as an additional input of 𝐹_{𝜔2}(⋅, 𝜙). Among them, 𝜔1 and 𝜔2 are fixed after meta-training and are shared across domains, whereas 𝜙 varies with the domain properties of the input hazy images and is specified to a particular domain.


Fig. 3. The process of obtaining task-specific parameters.

Fig. 5. Comparison between the average operation and our distance-aware aggregation. The blue ellipse represents the distribution of {𝜑^𝑖_𝑘}^𝑀_{𝑘=1} estimated by 𝑀 samples in 𝑆^𝑖.

Assume that there are 𝐼 domains 𝑆hazy = {𝑆^𝑖}^𝐼_{𝑖=1}, each containing 𝑀 hazy and haze-free pairs {(𝑥_𝑘, 𝑦_𝑘)}^𝑀_{𝑘=1}, and that the optimal external variables of the 𝐼 domains are represented as domain-specific parameters {𝜙^{𝑖∗}}^𝐼_{𝑖=1}. Intuitively, {𝜙^{𝑖∗}}^𝐼_{𝑖=1} embodies the most representative and discriminative internal information of the corresponding domains. Given a specific domain 𝑆^𝑖 ∈ 𝑆hazy and a task 𝐷^𝑖 = {(𝑥^𝑖_𝑘, 𝑦^𝑖_𝑘)}^𝐾_{𝑘=1} ∈ 𝑆^𝑖, our aim is to estimate a task-specific parameter 𝜙^𝑖 from 𝐷^𝑖 that approximates 𝜙^{𝑖∗}, so that the regressed dehazing function 𝐹_{𝜔2}(⋅, 𝜙^𝑖) is capable of handling the 𝑀 samples in 𝑆^𝑖 commendably. We adopt 𝐸_{𝜔1}(⋅) to dig out internal information related to 𝑆^𝑖 from each sample in 𝐷^𝑖. The distilled information is cached into a series of features {𝜑^𝑖_𝑘}^𝐾_{𝑘=1}, which we call preliminary parameters in this paper. These preliminary parameters are then aggregated into 𝜙^𝑖 via a permutation-invariant operation (Garnelo et al., 2018a, 2018b; Ye & Yao, 2022; Zhang et al., 2021). In this way, given {𝐷^𝑖}^𝑁_{𝑖=1} randomly sampled from various domains in 𝑆hazy, 𝑁 distinct and powerful dehazing functions {𝐹_{𝜔2}(⋅, 𝜙^𝑖)}^𝑁_{𝑖=1} can be obtained without test-time training. The nomenclature of the symbols is listed in Table 1.
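The overall inference path can be summarized in a few lines. The sketch below is a toy, self-contained PyTorch rendition under stated assumptions: `TinyAdapter` and `TinyDehazer` are illustrative stand-ins for 𝐸_{𝜔1} and 𝐹_{𝜔2} (not the paper's architectures), the mean aggregation is a placeholder that Section 3.3 replaces with the distance-aware version, and injecting 𝜙 as an additive feature modulation is one plausible way of conditioning 𝐹 on the external variables.

```python
import torch
import torch.nn as nn

class TinyAdapter(nn.Module):
    """Stand-in for E_{w1}: maps each image to one preliminary parameter."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1))
    def forward(self, x):                       # (K, 3, H, W) -> (K, dim)
        return self.net(x).flatten(1)

class TinyDehazer(nn.Module):
    """Stand-in for F(x, phi): phi modulates an intermediate feature map."""
    def __init__(self, dim=16):
        super().__init__()
        self.head = nn.Conv2d(3, dim, 3, padding=1)
        self.tail = nn.Conv2d(dim, 3, 3, padding=1)
    def forward(self, x, phi):                  # phi: (dim,)
        h = torch.relu(self.head(x) + phi.view(1, -1, 1, 1))
        return self.tail(h)

E, F_net = TinyAdapter(), TinyDehazer()
support = torch.rand(4, 3, 64, 64)              # K = 4 hazy samples of one domain
query = torch.rand(1, 3, 64, 64)
phi = E(support).mean(dim=0)                    # placeholder mean aggregation
restored = F_net(query, phi)                    # domain-conditioned dehazing
```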
3.2. Adaptation network

As exhibited in Fig. 2, 𝜙^𝑖 of 𝐷^𝑖 is obtained through our adaptation network 𝐸_{𝜔1}(⋅) together with our aggregator. 𝐸_{𝜔1}(⋅) explores the domain properties of the samples in 𝐷^𝑖 and stores them in a series of preliminary parameters {𝜑^𝑖_𝑘}^𝐾_{𝑘=1}. The aggregator summarizes the internal information hidden in {𝜑^𝑖_𝑘}^𝐾_{𝑘=1} to obtain 𝜙^𝑖. In this section, we focus on our designed 𝐸_{𝜔1}(⋅) and discuss our aggregator in the next section.

The architecture of 𝐸_{𝜔1}(⋅) is displayed in Fig. 3. Since the internal information of the input images is generally covered by haze, directly employing vanilla convolution layers, which adopt a local perspective to extract domain information, may lead to unreliable {𝜑^𝑖_𝑘}^𝐾_{𝑘=1} with poor internal information. In this work, CG-Conv (Lin et al., 2020) is introduced to compose the principal part of 𝐸_{𝜔1}(⋅). By adopting CG-Conv layers, the extracted internal information is capable of integrating context information from entire images, so as to access more robust {𝜑^𝑖_𝑘}^𝐾_{𝑘=1} with richer domain properties. In particular, 𝐸_{𝜔1}(⋅) consists of two CG-Conv layers. Each layer is followed by batch normalization (BN) and a ReLU activation function, where the BN layer is employed to accelerate the convergence of the network. Moreover, a conventional convolution layer is connected at the end to output the preliminary parameter 𝜑^𝑖_𝑘 for each input sample 𝑥^𝑖_𝑘.

Fig. 4. Uniform manifold approximation and projection (UMAP) feature visualization of 12000 images randomly sampled from the Outdoor Training Set (OTS), Indoor Training Set (ITS), and Real Task-driven Testing Set (RTTS) datasets (Li et al., 2019b), denoted as label 0, label 1, and label 2, respectively. The features are generated by a pre-trained ResNet101 (He et al., 2016). Some outliers of the OTS dataset are highlighted by the green dotted circles. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
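A structural sketch of this adaptation network is given below; since reproducing context-gated convolution is out of scope here, plain `nn.Conv2d` layers stand in for CG-Conv, and the channel widths, the parameter dimension, and the global pooling that flattens each 𝜑^𝑖_𝑘 into a vector are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class AdaptationNetwork(nn.Module):
    """E_{w1}: two (CG-)conv + BN + ReLU blocks, then a conv head that
    emits one preliminary parameter phi_k per input sample x_k.
    Plain Conv2d stands in for CG-Conv; widths are illustrative."""
    def __init__(self, width=32, param_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1),       # CG-Conv in the paper
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1),   # CG-Conv in the paper
            nn.BatchNorm2d(width), nn.ReLU(inplace=True))
        self.head = nn.Conv2d(width, param_dim, 3, padding=1)

    def forward(self, x):                            # x: (K, 3, H, W)
        h = self.head(self.features(x))              # (K, param_dim, H, W)
        return h.mean(dim=(2, 3))                    # pool to (K, param_dim)

phis = AdaptationNetwork()(torch.rand(4, 3, 64, 64))  # K = 4 preliminary params
```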

3.3. Distance-aware aggregation

A commonly-used method of aggregating {𝜑^𝑖_𝑘}^𝐾_{𝑘=1} into 𝜙^𝑖 is to treat the internal properties extracted from each sample equally and employ an average operation (Garnelo et al., 2018a, 2018b; Ye & Yao, 2022; Zhang et al., 2021):

$\phi^i = \frac{1}{K}\sum_{k=1}^{K}\varphi_k^i. \tag{2}$

However, apart from normal samples, there are some outliers in hazy domains. The features generated by outliers are usually located far away from those of normal samples, as shown by the green dotted circles in Fig. 4, and fail to capture representative domain-specific internal properties. When encountering outliers, the average aggregator may lead 𝜙^𝑖 to deviate from 𝜙^{𝑖∗}, as depicted in Fig. 5(a) and (b). Thus, the aggregator must suppress the misleading information of outliers.

Considering the feature distribution of outliers and normal samples, we propose a non-parametric distance-aware aggregation operation to alleviate the adverse effects caused by outliers. Specifically, we aim to reduce the contribution of outliers to 𝜙^𝑖 and heighten that of normal samples. We first calculate the average distance 𝑑^𝑖_𝑘 between the 𝑘th preliminary parameter 𝜑^𝑖_𝑘 and the remaining ones of 𝐷^𝑖:

$d_k^i = \frac{1}{K-1}\sum_{s=1,\,s\neq k}^{K}\left\|\varphi_k^i - \varphi_s^i\right\|_1, \tag{3}$

where ‖⋅‖₁ denotes the L1 norm. Thus, we obtain a set of distance values {𝑑^𝑖_𝑘}^𝐾_{𝑘=1} for the 𝐾 samples in 𝐷^𝑖. The larger the value of 𝑑^𝑖_𝑘, the higher the probability that 𝑥^𝑖_𝑘 is an outlier, and the more the weight coefficient of 𝜑^𝑖_𝑘 needs to be reduced. Then, we reset the weights of {𝜑^𝑖_𝑘}^𝐾_{𝑘=1} according to the computed {𝑑^𝑖_𝑘}^𝐾_{𝑘=1} and obtain 𝜙^𝑖 by summing the reweighted preliminary parameters:

$\phi^i = \sum_{k=1}^{K}\frac{\exp(-d_k^i)}{\sum_{s=1}^{K}\exp(-d_s^i)}\,\varphi_k^i. \tag{4}$

In this way, 𝜙^𝑖 captures more representative internal information from normal samples and less misleading information from outliers, which encourages 𝜙^𝑖 to be located closer to 𝜙^{𝑖∗}, as illustrated in Fig. 5(c).
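As a sanity check, Eqs. (3)–(4) amount to a softmax over negated average L1 distances. Below is a minimal sketch, assuming each preliminary parameter has already been flattened to a vector:

```python
import torch

def distance_aware_aggregate(phis: torch.Tensor) -> torch.Tensor:
    """Implements Eqs. (3)-(4): phis has shape (K, D), one preliminary
    parameter per sample; returns the task-specific parameter phi, (D,)."""
    K = phis.shape[0]
    # d_k: mean L1 distance from phi_k to the other K-1 preliminary params.
    pairwise = torch.cdist(phis, phis, p=1)        # (K, K) L1 distances
    d = pairwise.sum(dim=1) / (K - 1)              # diagonal is zero
    # Outliers (large d_k) are exponentially down-weighted, Eq. (4).
    w = torch.softmax(-d, dim=0)                   # exp(-d_k) / sum_s exp(-d_s)
    return (w.unsqueeze(1) * phis).sum(dim=0)

phi = distance_aware_aggregate(torch.randn(4, 64))
```

The operation has no learnable parameters, which matches the non-parametric design choice in the text.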

Fig. 6. An example to illustrate the domain-relevant contrastive regularization, where 𝜙^𝑖 and 𝜙^𝑝 are from 𝑆^𝑖 and 𝜙^𝑗 is from 𝑆^𝑗.

3.4. Domain-relevant contrastive regularization

To regress a more discriminative and representative task-specific parameter, in this paper, we incorporate inter-domain information with intra-domain properties. We note that each task-specific parameter in {𝜙^𝑖}^𝑁_{𝑖=1} is a linear combination of the corresponding preliminary parameters and can be regarded as a collection of internal features. It is reasonable to assume that task-specific parameters generated from the same domain have more similar internal features, while those from different domains have more distinct features. Based on this assumption, we introduce contrastive learning (Chen et al., 2020; He et al., 2020) and propose a novel domain-relevant contrastive regularization to capture the intra-domain homogeneity and inter-domain heterogeneity. Fig. 6 provides an example to depict our domain-relevant contrastive regularization.

For the sake of measuring the feature similarities of task-specific parameters generated from non-aligned tasks, we draw lessons from some studies in image-to-image translation (Zhan et al., 2021; Zhang et al., 2020c) and employ the contextual loss 𝐿_{cx}:

$L_{cx}(\phi^i, \phi^j) = -\log_e\Big(\sum_{u}\max_{v} A_{uv}(\phi^{i,u}, \phi^{j,v})\Big), \tag{5}$

where 𝑢 and 𝑣 are the indexes of the feature maps in 𝜙^𝑖 of 𝐷^𝑖 and 𝜙^𝑗 of 𝐷^𝑗, respectively, 𝐴_{𝑢𝑣} is the contextual similarity measurement commonly adopted in image-to-image translation (Mechrez et al., 2018), and 𝑒 is the Euler number. The remaining question is how to choose ‘‘positive’’ and ‘‘negative’’ pairs.
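A compact sketch of this similarity measure is given below; the bandwidth `h` and the cosine-distance construction of 𝐴_{𝑢𝑣} follow the common formulation of Mechrez et al. (2018) and are assumptions here, while the final aggregation mirrors Eq. (5).

```python
import torch

def contextual_loss(phi_i, phi_j, h=0.5, eps=1e-5):
    """Sketch of Eq. (5) with A_uv built as in Mechrez et al. (2018):
    phi_i and phi_j are (U, D) and (V, D) collections of feature vectors."""
    xi = phi_i / (phi_i.norm(dim=1, keepdim=True) + eps)
    xj = phi_j / (phi_j.norm(dim=1, keepdim=True) + eps)
    d = 1.0 - xi @ xj.t()                              # cosine distances, (U, V)
    d_norm = d / (d.min(dim=1, keepdim=True).values + eps)
    w = torch.exp((1.0 - d_norm) / h)
    A = w / w.sum(dim=1, keepdim=True)                 # contextual similarity A_uv
    return -torch.log(A.max(dim=1).values.sum() + eps)

L_cx = contextual_loss(torch.randn(64, 32), torch.randn(64, 32))
```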
Suppose that merely 𝜙^𝑖 and 𝜙^𝑝 in {𝜙^𝑖}^𝑁_{𝑖=1} are from the same domain 𝑆^𝑖, while the others are from diverse domains other than 𝑆^𝑖. Our idea for the ‘‘positive’’ pair is to pick the more representative one of 𝜙^𝑖 and 𝜙^𝑝 and adopt it to guide the other to capture more representative internal information. Targeting this issue, we embed an additional classifier into our framework, as exhibited in Fig. 2. The classifier consists of five convolution layers. Each layer except the last is followed by a ReLU activation function, and the last layer is followed by a pooling operation and a Sigmoid activation function. The classifier takes {𝜙^𝑖}^𝑁_{𝑖=1} as inputs and attempts to predict their confidence scores. The higher the confidence score, the more representative the parameter is and the easier it is to classify into the corresponding domain. If the confidence score of 𝜙^𝑝 is higher than that of 𝜙^𝑖, 𝜙^𝑝 will be selected to assist 𝜙^𝑖 in learning more representative internal information, and vice versa. For the ‘‘negative’’ pairs, the unpicked parameter is grouped with each task-specific parameter generated from the other domains to enhance its discrimination. Therefore, our domain-relevant contrastive regularization 𝐿_{DCR} can be deduced as:

$L_{DCR} = \frac{L_{cx}(\phi^i, \phi^p)}{L_{cx}(\phi^i, \phi^p) + \sum_{s=1,\,s\neq i,\,s\neq p}^{N} L_{cx}(\phi^i, \phi^s) + \sigma}, \tag{6}$

where the picked parameter 𝜙^𝑝 is the one with the higher confidence score, and 𝜎 is a constant to avoid situations where the denominator becomes zero.
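Putting the pieces together, the following sketch computes 𝐿_{DCR} for one positive pair, reusing the `contextual_loss` helper from the sketch above; treating the classifier confidences as a plain tensor is an illustrative simplification of the pipeline in Fig. 2.

```python
import torch

def dcr_loss(phis, scores, i, p, sigma=1e-7):
    """Eq. (6): phis holds one task-specific parameter (U, D) per sampled
    task; scores are the classifier confidences used to pick the guide of
    the positive pair; tasks i and p come from the same domain."""
    # The more representative parameter (higher confidence) guides the other.
    if scores[p] < scores[i]:
        i, p = p, i                                # ensure scores[p] >= scores[i]
    positive = contextual_loss(phis[i], phis[p])
    negatives = sum(contextual_loss(phis[i], phis[s])
                    for s in range(len(phis)) if s not in (i, p))
    return positive / (positive + negatives + sigma)

# Toy usage: N = 4 tasks, where tasks 0 and 2 share a domain.
phis = [torch.randn(64, 32) for _ in range(4)]
scores = torch.tensor([0.7, 0.4, 0.9, 0.5])
loss = dcr_loss(phis, scores, i=0, p=2)
```

Minimizing this ratio pulls the same-domain pair together (smaller positive contextual loss) while pushing the cross-domain pairs apart, which is exactly the intra-domain homogeneity and inter-domain heterogeneity the text describes.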
3.5. Loss function

Besides the domain-relevant contrastive regularization 𝐿_{DCR}, four other loss functions are employed in our experiments: the pixel-wise loss 𝐿_{pixel} (Dong et al., 2020a; Qin et al., 2020), the structural similarity loss 𝐿_{SSIM} (Dong et al., 2020a), the contrastive regularization loss 𝐿_{CR} (Wu et al., 2021) and the cross entropy loss 𝐿_{CE}. Therefore, the overall optimization function 𝐿 of a batch of 𝑁 sampled tasks is defined as

$L = L_{pixel} + \lambda_1 L_{SSIM} + \lambda_2 L_{CR} + \lambda_3 L_{CE} + \lambda_4 L_{DCR}, \tag{7}$

where 𝜆1, 𝜆2, 𝜆3 and 𝜆4 are the trade-off weights.

3.5.1. Pixel-wise loss
The pixel-wise loss is employed to quantify the pixel-wise distance between the generated image and the ground truth:

$L_{pixel} = \frac{1}{N\times K}\sum_{i=1}^{N}\sum_{k=1}^{K}\left\|\hat{y}_k^i - y_k^i\right\|_1, \tag{8}$

where 𝑦̂^𝑖_𝑘 is the haze-free image generated from the 𝑘th sample 𝑥^𝑖_𝑘 in the 𝑖th task.

3.5.2. Structural similarity loss
The structural similarity loss 𝐿_{SSIM} quantifies the distance between the restored image and the ground truth in terms of brightness and contrast. It is defined as

$L_{SSIM} = \frac{1}{N\times K}\sum_{i=1}^{N}\sum_{k=1}^{K}\big(1 - SSIM(\hat{y}_k^i, y_k^i)\big), \tag{9}$

where SSIM(⋅, ⋅) stands for the operation that calculates the structural similarity index (SSIM) of two images and is defined as follows:

$SSIM(\hat{y}_k^i, y_k^i) = \frac{(2\mu_1\mu_2 + C_1)(2\sigma_{12} + C_2)}{(\mu_1^2 + \mu_2^2 + C_1)(\sigma_1^2 + \sigma_2^2 + C_2)}. \tag{10}$

𝜇1 and 𝜇2 are the mean values of 𝑦̂^𝑖_𝑘 and 𝑦^𝑖_𝑘, respectively, 𝜎1 and 𝜎2 are the corresponding standard deviations, and 𝜎12 is the covariance between the two images. 𝐶1 and 𝐶2 are constants to avoid situations where the denominator becomes zero.

3.5.3. Contrastive regularization loss
We also employ the contrastive regularization (Wu et al., 2021) to further improve the quality of the restored images in the representation space:

$L_{CR} = \frac{1}{N\times K}\sum_{i=1}^{N}\sum_{k=1}^{K}\sum_{s=1}^{T}\alpha_s\frac{\left\|V_s(y_k^i) - V_s(\hat{y}_k^i)\right\|_1}{\left\|V_s(x_k^i) - V_s(\hat{y}_k^i)\right\|_1}, \tag{11}$

where 𝑉(⋅) denotes the fixed pre-trained feature extractor and 𝑉_𝑠(⋅) is the 𝑠th feature map from the feature extractor. 𝑇 is the number of feature maps, and 𝛼_𝑠 is the balancing weight of the contrastive regularization loss related to the 𝑠th feature map.

3.5.4. Cross entropy loss
The cross entropy loss 𝐿_{CE} is employed to train our classifier, which is defined as:

$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N} P(\phi^i)\log_e\big(Q(\phi^i)\big), \tag{12}$

where 𝑃(𝜙^𝑖) and 𝑄(𝜙^𝑖) are the given probability and the estimated probability of 𝜙^𝑖, respectively.
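The sketch below assembles Eq. (7) from these pieces; the per-image global SSIM in place of the usual windowed SSIM is a deliberate simplification, the 𝐶1/𝐶2 values follow Section 4.1.2, and 𝐿_{CR}, 𝐿_{CE} and 𝐿_{DCR} are assumed to be precomputed scalars.

```python
import torch
import torch.nn.functional as F

def ssim_global(a, b, c1=1e-4, c2=9e-4):
    """Simplified Eq. (10): global per-image statistics instead of the
    usual sliding window; C1 and C2 follow Section 4.1.2."""
    mu1, mu2 = a.mean(dim=(1, 2, 3)), b.mean(dim=(1, 2, 3))
    var1 = a.var(dim=(1, 2, 3), unbiased=False)
    var2 = b.var(dim=(1, 2, 3), unbiased=False)
    cov = ((a - mu1.view(-1, 1, 1, 1)) *
           (b - mu2.view(-1, 1, 1, 1))).mean(dim=(1, 2, 3))
    return ((2 * mu1 * mu2 + c1) * (2 * cov + c2)) / (
        (mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2))

def total_loss(pred, gt, l_cr, l_ce, l_dcr, lam=(0.5, 0.1, 1.0, 0.5)):
    """Eq. (7) with the trade-off weights from Section 4.1.2."""
    l_pixel = F.l1_loss(pred, gt)                     # Eq. (8)
    l_ssim = (1.0 - ssim_global(pred, gt)).mean()     # Eq. (9)
    return l_pixel + lam[0] * l_ssim + lam[1] * l_cr + lam[2] * l_ce + lam[3] * l_dcr

pred, gt = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
L = total_loss(pred, gt, l_cr=torch.tensor(0.3),
               l_ce=torch.tensor(0.6), l_dcr=torch.tensor(0.2))
```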


Fig. 7. Visual comparisons with conventional learning-based methods on real-world hazy images from the RTTS dataset (Li et al., 2019b).

4. Experiments

4.1. Implementation details

4.1.1. Datasets
RESIDE (Li et al., 2019b) is a widely-used dataset for single image dehazing, which consists of five subsets: ITS, OTS, Synthetic Objective Testing Set (SOTS), RTTS, and Unannotated Real Hazy Images (URHI). Among them, ITS, OTS, and SOTS are synthesized by artificially adjusting the values of the scattering coefficient and atmospheric light, while RTTS and URHI are captured directly in real hazy scenarios. The experiments aim to assess the performance of our model, trained on synthetic datasets, on real hazy images. Thus, we select ITS and OTS to comprise our training set and regard RTTS and URHI as our test sets. In particular, we select 6000 pairs of images from ITS and OTS, respectively, and create a training set composed of 12000 pairs of synthetic samples. We define each synthetic set as a particular domain, as the average depth errors differ among the diverse datasets (Li et al., 2019b). Therefore, our training set can be divided into two domains. To evaluate the proposed method on real hazy images, RTTS and URHI are adopted in our experiments, where RTTS is composed of 4322 real hazy images and URHI consists of 4809. We also evaluate our model on other real hazy images employed by previous work (Fattal, 2014; He et al., 2010), which are denoted as ESPW in this paper.

4.1.2. Training details
In our experiments, the coefficients 𝜆1, 𝜆2, and 𝜆3 of the existing loss functions are set to 0.5, 0.1, and 1, respectively, to balance the value of each loss function (Jo & Sim, 2021; Wu et al., 2021). The coefficient 𝜆4 is set to 0.5. 𝐶1 and 𝐶2 are set to 0.0001 and 0.0009, respectively (Wang et al., 2004), and 𝜎 is set to 10⁻⁷. The feature extractor employed for the contrastive regularization loss 𝐿_{CR} is a frozen pre-trained VGG19, and the features are selected from the first, third, fifth, ninth, and thirteenth layers of the feature extractor with the coefficients 𝛼_𝑠 set to 1/32, 1/16, 1/8, 1/4, and 1, respectively (Wu et al., 2021). Our proposed model is implemented in PyTorch and trained with the Adam optimizer with 𝛽1 = 0.9 and 𝛽2 = 0.999. In each iteration, the model samples 2 tasks, and each task consists of 4 synthetic hazy images from the same domain. The initial learning rate is set to 0.0002 for the whole network. In addition, the input size is 240 × 240 × 3, and the training data is augmented by random rotation and random flips. For the network architecture, we follow the previous work MSBDN (Dong et al., 2020b) due to its compact architecture and effective performance. All experiments are conducted on an NVIDIA Tesla V100 GPU.
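The episodic sampling described above (2 tasks per iteration, 4 same-domain pairs per task) can be organized with a small helper like the one below; `EpisodicTaskSampler` and its `domains` mapping from a domain name to an indexable dataset of (hazy, clear) tensor pairs are illustrative constructs, not part of any released code.

```python
import random
import torch

class EpisodicTaskSampler:
    """Samples meta-training batches as described above: N = 2 tasks per
    iteration, each holding K = 4 hazy/clear pairs drawn from a single
    synthetic domain (ITS or OTS)."""
    def __init__(self, domains: dict, tasks_per_iter=2, samples_per_task=4):
        self.domains = domains
        self.n, self.k = tasks_per_iter, samples_per_task

    def sample(self):
        batch = []
        for _ in range(self.n):
            name = random.choice(list(self.domains))     # pick one domain
            ds = self.domains[name]
            idx = random.sample(range(len(ds)), self.k)  # K distinct pairs
            hazy = torch.stack([ds[j][0] for j in idx])
            clear = torch.stack([ds[j][1] for j in idx])
            batch.append((name, hazy, clear))            # one task = one domain
        return batch
```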
4.1.3. Competitors and evaluation metrics
Our proposed model is compared with open-source, state-of-the-art learning-based algorithms, namely AOD-Net (Li et al., 2017), GridDehazeNet (GDN) (Liu et al., 2019b), MSBDN (Dong et al., 2020b), FFA-Net (Qin et al., 2020), AECR-Net (Wu et al., 2021), D4 (Yang et al., 2022) and DeHamer (Guo et al., 2022). Some image dehazing models based on domain adaptation, which have been pre-trained or fine-tuned on a large number of real hazy images, are also involved in assessing our proposed model, namely DAD (Shao et al., 2020) and PSD (Chen et al., 2021). The results of the competitors are taken from existing papers where available; otherwise, the results are generated with the pre-trained models provided by their authors.

Due to the absence of paired samples, the commonly-used SSIM and peak signal-to-noise ratio (PSNR) cannot be applied to the assessment of real-world images. Thus, we resort to blind evaluation metrics to provide a quantitative comparison on real hazy images. In particular, a classical assessment indicator (BRISQUE; Mittal et al., 2012) and a learning-based picture quality predictor (PaQ-2-PiQ; Ying et al., 2020) are adopted in the following experiments. Lower BRISQUE values indicate higher-quality restored images, while for PaQ-2-PiQ the opposite holds.

4.2. Comparison with competitors

4.2.1. Qualitative comparison
In this section, we provide qualitative results to evaluate the performance of our proposed model. Figs. 7, 8, and 9 exhibit visual comparisons of our model and conventional learning-based algorithms on the RTTS, URHI and ESPW datasets (Li et al., 2019b), respectively. Compared with our model, the competitors face several challenges in dealing with haze. Firstly, thick haze is difficult for the competitors to address, and residual haze remains in the restored images (e.g., Fig. 7, column 7). Secondly, some objects that look like haze are sometimes mishandled (e.g., Fig. 8, column 6).


Fig. 8. Visual comparisons with conventional learning-based methods on real-world hazy images from the URHI dataset (Li et al., 2019b).

Fig. 9. Visual comparisons with conventional learning-based methods on real hazy images from the ESPW dataset (Fattal, 2014; He et al., 2010).

Fig. 10. Visual comparisons with domain adaptation-based methods on real hazy images (Li et al., 2019b).

Thirdly, the problem of color distortion (e.g., Fig. 9, column 5) is one of the obstacles hindering the acquisition of high-quality images. These failure cases show that the internal information extracted from synthetic data alone is not sufficient to maintain robustness on real samples. In contrast to the competitors, our proposed model can obtain higher-quality results from both global and local perspectives, where the restored images have less color distortion and more satisfactory visual perception. Our model builds on MSBDN (Dong et al., 2020b) but is capable of restoring clear objects with less residual haze. The results demonstrate that utilizing the internal information of real domains contributes to boosting the model performance on real hazy images. Moreover, our model is also compared with domain adaptation-based methods, as shown in Fig. 10. It can be found that although real-world hazy images are inaccessible during training, our model achieves visual effects similar to those of DAD (Shao et al., 2020). In addition, the restored images of our model have less residual haze than those of PSD (Chen et al., 2021). The results illustrate that our model obtains performance competitive with the domain adaptation-based methods (Chen et al., 2021; Shao et al., 2020).

4.2.2. Quantitative comparison
We further leverage BRISQUE and PaQ-2-PiQ to compare the performance of our proposed model with that of state-of-the-art competitors. Table 2 reveals the quantitative results of each participant on the RTTS and URHI (Li et al., 2019b) datasets.


Fig. 11. Ablation study of our proposed model with different settings on the real hazy dataset (Li et al., 2019b).

Table 2
Quantitative comparison of the state-of-the-art dehazing models on real hazy datasets (Li et al., 2019b).

Methods    Year  Real  RTTS BRISQUE↓  RTTS PaQ-2-PiQ↑  URHI BRISQUE↓  URHI PaQ-2-PiQ↑  #Param   Runtime (s)  FLOPS
Hazy       –     –     37.011*        66.054           33.531         67.254           –        –            –
DAD        2020  ✓     32.727*        67.031           –              –                54.59M   0.010        195.926G
PSD        2021  ✓     25.239*        70.430           –              –                33.11M   0.024        211.979G
AOD-Net    2017        35.466         66.435           34.077         67.273           0.002M   0.004        536.371M
GDN        2019        28.086         66.061           27.941         66.585           0.96M    0.014        100.447G
MSBDN      2020        28.743*        66.197           26.617         67.851           31.35M   0.021        194.550G
FFA-Net    2020        30.183         67.110           26.141         67.688           4.68M    0.087        1.348T
AECR-Net   2021        28.594         66.197           25.879         67.946           2.61M    0.028        201.665G
D4         2022        29.536         66.677           27.429         67.588           10.70M   0.032        10.445G
DeHamer    2022        30.986         66.573           28.202         67.739           4.63M    0.066        219.887G
Ours       2022        27.021         67.495           24.987         68.509           31.58M   0.062        198.227G

‘‘Real’’ denotes access to real hazy images during the training stage. ‘‘*’’ represents results obtained from the existing paper (Chen et al., 2021). ‘‘#Param’’ stands for the number of neural parameters. ‘‘FLOPS’’ refers to the floating-point operations per second.

The runtime and FLOPS of each method are the computational costs of processing an image of size 480 × 640. Due to their exposure to the URHI dataset during training or fine-tuning, DAD (Shao et al., 2020) and PSD (Chen et al., 2021) are only evaluated on the RTTS dataset. It can be observed that our proposed method has superior performance against the other conventional learning-based algorithms. Although the recent studies (Guo et al., 2022; Yang et al., 2022) have reached state-of-the-art results on synthetic hazy images, they suffer from severe performance drops on real hazy images (Table 2, lines 9 and 10). This phenomenon demonstrates the significance of our work in boosting the dehazing performance in practical scenarios. Compared with our backbone (Table 2, line 6), our model achieves considerable performance gains on both the RTTS and URHI datasets at a small additional computational cost. These results show that the ability of our dehazing model to distill the internal information of samples plays an important role in improving the cross-domain generalization ability. Our model surpasses DAD (Shao et al., 2020) and is comparable with PSD (Chen et al., 2021) in both assessment indicators without real-world hazy images for training (Table 2, lines 2 and 3).

Table 3
Quantitative comparison of object detection on the RTTS dataset (Li et al., 2019b).

Methods    Real  mAP (%)  Gain
Hazy       –     63.32    –
DAD        ✓     65.02    +1.70
PSD        ✓     65.84    +2.52
AOD-Net          60.45    −2.87
GDN              63.59    +0.27
MSBDN            65.16    +1.84
FFA-Net          64.44    +1.12
AECR-Net         65.39    +2.07
D4               63.44    +0.12
DeHamer          64.63    +1.31
Ours             65.64    +2.32

4.2.3. Evaluation on object detection
We evaluate the generalization ability of the dehazing algorithms according to the performance improvement of an object detection model. We adopt the RTTS dataset (Li et al., 2019b), which is composed of real hazy images with annotations of object categories as well as bounding boxes. We use the different dehazing algorithms to restore and enhance the images of RTTS (Li et al., 2019b) separately, so as to meliorate the quality of the test samples. We then leverage YOLOv3 (Redmon & Farhadi, 2018) to detect objects on the generated haze-free results and calculate the mean average precision (mAP). Table 3 shows the detection accuracy and performance gains of YOLOv3 (Redmon & Farhadi, 2018) on the diverse dehazing results. It can be noticed that the mAP declines significantly on the images restored by AOD-Net (Li et al., 2017), suggesting that AOD-Net (Li et al., 2017) tends to entail performance degradation of the object detection model. Compared with the remaining learning-based competitors, the mAP value of our method reaches the top accuracy, which implies that our restored images have higher perceptual sensitivity. Furthermore, our method is comparable to the domain adaptation-based methods (Chen et al., 2021; Shao et al., 2020).
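This restore-then-detect protocol can be expressed as a short evaluation loop; the sketch below is illustrative only, with `detector` and `map_metric` standing in as hypothetical callables for the YOLOv3 model and the mAP evaluator actually used.

```python
import torch

def evaluate_dehazer_for_detection(dehazer, detector, dataset, map_metric):
    """Illustrative pipeline for the experiment above: restore each RTTS
    image, run a detector (e.g., YOLOv3) on the result, and accumulate mAP.
    `detector` and `map_metric` are hypothetical callables standing in for
    whatever detection model and mAP evaluator are available."""
    dehazer.eval()
    with torch.no_grad():
        for hazy, targets in dataset:              # targets: boxes + labels
            restored = dehazer(hazy.unsqueeze(0)).squeeze(0).clamp(0, 1)
            predictions = detector(restored)       # boxes, labels, scores
            map_metric.update(predictions, targets)
    return map_metric.compute()                    # mean average precision
```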
4.3. Ablation study

In this section, we conduct comprehensive ablation studies to illustrate the effectiveness of the different elements of our proposed model. Fig. 11 and Table 4 exhibit the qualitative and quantitative results generated by our proposed model with diverse settings, respectively. Taking the differences of the loss functions into account, we retrain MSBDN (Dong et al., 2020b) with our loss functions (denoted as the baseline), so that the evaluation interference caused by the loss functions is eliminated. 𝐴𝑁_{CNN} and 𝐴𝑁_{CG-Conv} stand for the adaptation network based on conventional CNN layers and on CG-Conv, respectively. 𝐴𝑂_{mean} and 𝐴𝑂_{DAA} denote the average and the distance-aware aggregation, respectively. DCR is the domain-relevant contrastive regularization.

It can be observed in Fig. 11 that merely adopting the adaptation network and the average aggregator leaves residual haze in the restored image. The presented distance-aware aggregator significantly alleviates the residual-haze effect, leading to higher visual perception. The combination of the proposed adaptation network, the distance-aware aggregator, and the domain-relevant contrastive regularization produces clearer and higher-quality images. Table 4 also illustrates the effectiveness of our proposed elements. Take the effectiveness of our distance-aware aggregator for example (Table 4, lines 3 and 4): the images restored with our distance-aware aggregator have lower BRISQUE and higher PaQ-2-PiQ values compared with the average aggregator, which clarifies that alleviating the adverse effects of outliers contributes to more powerful dehazing functions on real images. The remaining ablation studies likewise demonstrate the effectiveness of the other elements of our proposed model.


Fig. 12. Visual comparisons on other similar tasks. The inputs of (a) and (d) are downloaded from the Internet, and the inputs of (b) and (c) are from the relevant existing datasets of nighttime image dehazing (Li et al., 2015; Zhang et al., 2017, 2020a) and underwater image enhancement (Islam et al., 2020).

Table 4
Ablation study of our proposed model with different settings on the URHI dataset (Li et al., 2019b).

Methods     Baseline  𝐴𝑁_CNN  𝐴𝑁_CG-Conv  𝐴𝑂_mean  𝐴𝑂_DAA  DCR  URHI BRISQUE↓  URHI PaQ-2-PiQ↑
Baseline    ✓                                                    27.043         67.277
Our model   ✓         ✓                   ✓                      26.494         67.368
Our model   ✓                 ✓           ✓                      25.760         67.645
Our model   ✓                 ✓                    ✓             25.574         68.171
Our model   ✓                 ✓                    ✓       ✓     24.987         68.509

Table 5
Quantitative results with different 𝜆4 values on the URHI dataset (Li et al., 2019b).

𝜆4    BRISQUE↓  PaQ-2-PiQ↑
0     25.574    68.171
0.5   24.987    68.509
1     25.143    67.700
2     25.342    67.612

4.4. Discussion

We further study the effect of 𝜆4 on the dehazing performance. In our experiments, we set 𝜆4 to 0, 0.5, 1, and 2, respectively, and retrain our dehazing model with each 𝜆4 value. Table 5 reports the results of each retrained model on the URHI dataset (Li et al., 2019b). It can be noticed that our model achieves the top performance when 𝜆4 is set to 0.5, while worse results are obtained when 𝜆4 is altered to 1 or 2. We conjecture this is because the balance of the loss functions is broken by improper values of 𝜆4, leading to performance degradation of our dehazing model. Therefore, we set 𝜆4 to 0.5 in our experiments to achieve the best performance.

4.5. Applications to other low-level tasks

Apart from single image dehazing, we explore some other low-level computer vision tasks, including single image deraining, nighttime image dehazing, underwater image enhancement, and single image glare removal, to evaluate the performance gains of our model compared with the backbone MSBDN (Dong et al., 2020b). Fig. 12 shows the results generated by MSBDN (Dong et al., 2020b) and our model. Take single image deraining for instance. It can be seen that MSBDN (Dong et al., 2020b) fails to recover the remote objects, and some pixels are mishandled (e.g., the road in the image generated by MSBDN, Dong et al., 2020b). In contrast, the restored image of our model has higher visual perception with less distortion. The comparisons of the other three cases also demonstrate that our contributions to MSBDN (Dong et al., 2020b) not only improve the performance on real hazy samples but also enhance the generalization capability in other similar low-level tasks.

4.6. Limitations and future work

In this section, we discuss the limitations of our model and provide a promising subject for future work. It can be seen in Table 2 that our dehazing model has a considerable number of parameters, which limits its deployment on embedded devices, such as wearable devices in human–robot interaction systems. Thus, designing a lightweight dehazing framework with powerful domain generalization abilities is a direction worth the effort in the future.

5. Conclusions

In this work, we propose a domain generalization framework via model-based meta-learning for single image dehazing. By combining both our adaptation network and distance-aware aggregator with the dehazing network, our model can dig out representative internal information from a specific real domain. In addition, we present a domain-relevant contrastive regularization to facilitate the external variables to capture more discriminative information about domains, contributing to a more powerful dehazing function for the given domain. Extensive experiments demonstrate that our proposed model outperforms the state-of-the-art competitors on real hazy domains.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by National Natural Science Foundation of China (62233005, 62293502), Program of Shanghai Academic Research Leader, China under Grant 20XD1401300, Sino-German Center for Research Promotion, China (Grant M-0066), the CNPC Innovation Fund, China under Grant 2021D002-0902, and Shanghai AI Lab, China.

References

Cai, B., Xu, X., Jia, K., Qing, C., & Tao, D. (2016). DehazeNet: An end-to-end system for single image haze removal. IEEE Transactions on Image Processing, 25(11), 5187–5198.
Chen, R., Gao, N., Vien, N., Ziesche, H., & Neumann, G. (2022). Meta-learning regrasping strategies for physical-agnostic objects. arXiv preprint arXiv:2205.11110.
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In Proceedings of the international conference on machine learning (pp. 1597–1607).
Chen, Z., Wang, Y., Yang, Y., & Liu, D. (2021). PSD: Principled synthetic-to-real dehazing guided by physical priors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7180–7189).
Chi, Z., Wang, Y., Yu, Y., & Tang, J. (2021). Test-time fast adaptation for dynamic scene deblurring via meta-auxiliary learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9137–9146).
Dong, Y., Liu, Y., Zhang, H., Chen, S., & Qiao, Y. (2020a). FD-GAN: Generative adversarial networks with fusion-discriminator for single image dehazing. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34(07) (pp. 10729–10736).
Dong, H., Pan, J., Xiang, L., Hu, Z., Zhang, X., Wang, F., & Yang, M. (2020b). Multi-scale boosted dehazing network with dense feature fusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2157–2167).
van Dooren, S., Duhr, P., Amstutz, A., & Onder, C. (2022). Optimal control of real driving emissions. Control Engineering Practice, 127, Article 105269.
Fattal, R. (2014). Dehazing using color-lines. ACM Transactions on Graphics, 34(1), 1–14.
Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the international conference on machine learning (pp. 1126–1135).
Gao, N., Ziesche, H., Vien, N., Volpp, M., & Neumann, G. (2022). What matters for meta-learning vision regression tasks? In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 14776–14786).
Garnelo, M., Rosenbaum, D., Maddison, C., Ramalho, T., Saxton, D., Shanahan, M., Teh, Y., Rezende, D., & Eslami, S. (2018a). Conditional neural processes. In Proceedings of the international conference on machine learning (pp. 1704–1713).
Garnelo, M., Schwarz, J., Rosenbaum, D., Viola, F., Rezende, D., Eslami, S., & Teh, Y. (2018b). Neural processes. arXiv:1807.01622.
Gróf, T., Bauer, P., & Watanabe, Y. (2022). Positioning of aircraft relative to unknown runway with delayed image data, airdata and inertial measurement fusion. Control Engineering Practice, 125, Article 105211.
Guo, C., Yan, Q., Anwar, S., Cong, R., Ren, W., & Li, C. (2022). Image dehazing transformer with transmission-aware 3D position embedding. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5812–5820).
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9729–9738).
He, K., Sun, J., & Tang, X. (2010). Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), 2341–2353.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
Huisman, M., Van Rijn, J., & Plaat, A. (2021). A survey of deep meta-learning. Artificial Intelligence Review, 54(6), 4483–4541.
Ifqir, S., Combastel, C., Zolghadri, A., Alcalay, G., Goupil, P., & Merlet, S. (2022). Fault tolerant multi-sensor data fusion for autonomous navigation in future civil aviation operations. Control Engineering Practice, 123, Article 105132.
Islam, M., Xia, Y., & Sattar, J. (2020). Fast underwater image enhancement for improved visual perception. IEEE Robotics and Automation Letters, 5(2), 3227–3234.
Jo, E., & Sim, J. (2021). Multi-scale selective residual learning for non-homogeneous dehazing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 507–515).
Kaleli, A. (2020). Development of the predictive based control of an autonomous engine cooling system for variable engine operating conditions in SI engines: design, modeling and real-time application. Control Engineering Practice, 100, Article 104424.
Lee, B., Lee, K., Oh, J., & Kweon, I. (2020). CNN-based simultaneous dehazing and depth estimation. In Proceedings of the IEEE international conference on robotics and automation (pp. 9722–9728).
Lee, S., Son, T., & Kwak, S. (2022). FIFO: Learning fog-invariant features for foggy scene segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 18911–18921).
Li, L., Dong, Y., Ren, W., Pan, J., Gao, C., Sang, N., & Yang, M. (2019a). Semi-supervised image dehazing. IEEE Transactions on Image Processing, 29, 2766–2779.
Li, B., Peng, X., Wang, Z., Xu, J., & Feng, D. (2017). AOD-Net: All-in-one dehazing network. In Proceedings of the IEEE international conference on computer vision (pp. 4770–4778).
Li, B., Ren, W., Fu, D., Tao, D., Feng, D., Zeng, W., & Wang, Z. (2019b). Benchmarking single-image dehazing and beyond. IEEE Transactions on Image Processing, 28(1), 492–505.
Li, Y., Tan, R., & Brown, M. (2015). Nighttime haze removal with glow and multiple light colors. In Proceedings of the IEEE international conference on computer vision (pp. 226–234).
Li, R., Zhang, X., You, S., & Li, Y. (2020). Learning to dehaze from realistic scene with a fast physics-based dehazing network. arXiv:2004.08554.
Lin, X., Ma, L., Liu, W., & Chang, S. (2020). Context-gated convolution. In Proceedings of the European conference on computer vision (pp. 701–718). Springer.
Liu, S., Davison, A., & Johns, E. (2019a). Self-supervised generalisation with meta auxiliary learning. Advances in Neural Information Processing Systems, 32.
Liu, X., Ma, Y., Shi, Z., & Chen, J. (2019b). GridDehazeNet: Attention-based multi-scale network for image dehazing. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 7314–7323).
Liu, H., Wu, Z., Li, L., Salehkalaibar, S., Chen, J., & Wang, K. (2022). Towards multi-domain single image dehazing via test-time training. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5831–5840).
McCartney, E. (1976). Optics of the atmosphere: Scattering by molecules and particles. New York: John Wiley and Sons.
Mechrez, R., Talmi, I., & Zelnik-Manor, L. (2018). The contextual loss for image transformation with non-aligned data. In Proceedings of the European conference on computer vision (pp. 768–783).
Mittal, A., Moorthy, A., & Bovik, A. (2012). No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing, 21(12), 4695–4708.
Narasimhan, S., & Nayar, S. (2000). Chromatic framework for vision in bad weather. In Proceedings of the IEEE conference on computer vision and pattern recognition, Vol. 1 (pp. 598–605).
Narasimhan, S., & Nayar, S. (2002). Vision and the atmosphere. International Journal of Computer Vision, 48(3), 233–254.
Qin, X., Wang, Z., Bai, Y., Xie, X., & Jia, H. (2020). FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34, No. 7.
Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
Ren, W., Liu, S., Zhang, H., Pan, J., Cao, X., & Yang, M. (2016). Single image dehazing via multi-scale convolutional neural networks. In Proceedings of the European conference on computer vision (pp. 154–169). Springer.
Sakaridis, C., Dai, D., & Van Gool, L. (2018). Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision, 126(9), 973–992.
Shao, Y., Li, L., Ren, W., Gao, C., & Sang, N. (2020). Domain adaptation for image dehazing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2808–2817).
Soh, J., Cho, S., & Cho, N. (2020). Meta-transfer learning for zero-shot super-resolution. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3516–3525).
Sun, Y., Wang, X., Liu, Z., Miller, J., Efros, A., & Hardt, M. (2020). Test-time training with self-supervision for generalization under distribution shifts. In Proceedings of the international conference on machine learning (pp. 9229–9248).
Sun, Q., Yen, G., Tang, Y., & Zhao, C. (2022). Learn to adapt for monocular depth estimation. arXiv preprint arXiv:2203.14005.
Tang, Z., Cunha, R., Cabecinhas, D., Hamel, T., & Silvestre, C. (2021). Quadrotor going through a window and landing: An image-based visual servo control approach. Control Engineering Practice, 112, Article 104827.
Tang, Y., Zhao, C., Wang, J., Zhang, C., Sun, Q., Zheng, W., Du, W., Qian, F., & Kurths, J. (2022). An overview of perception and decision-making in autonomous systems in the era of learning. IEEE Transactions on Neural Networks and Learning Systems.
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.
Wu, J., Jin, Z., Liu, A., Yu, L., & Yang, F. (2022). A hybrid deep-Q-network and model predictive control for point stabilization of visual servoing systems. Control Engineering Practice, 128, Article 105314.
Wu, H., Qu, Y., Lin, S., Zhou, J., Qiao, R., Zhang, Z., Xie, Y., & Ma, L. (2021). Contrastive learning for compact single image dehazing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10551–10560).
Xu, L., Zhao, D., Yan, Y., Kwong, S., Chen, J., & Duan, L.-Y. (2019). IDeRs: Iterative dehazing method for single remote sensing image. Information Sciences, 489, 50–62.
Yang, Y., Wang, C., Liu, R., Zhang, L., Guo, X., & Tao, D. (2022). Self-augmented unpaired image dehazing via density and depth decomposition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2037–2046).
Ye, Z., & Yao, L. (2022). Contrastive conditional neural processes. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9687–9696).

Ying, Z., Niu, H., Gupta, P., Mahajan, D., Ghadiyaram, D., & Bovik, A. (2020). From patches to pictures (PaQ-2-PiQ): Mapping the perceptual space of picture quality. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3575–3585).
Zhan, F., Yu, Y., Cui, K., Zhang, G., Lu, S., Pan, J., Zhang, C., Ma, F., Xie, X., & Miao, C. (2021). Unbalanced feature transport for exemplar-based image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 15028–15038).
Zhang, J., Cao, Y., Fang, S., Kang, Y., & Wen Chen, C. (2017). Fast haze removal for nighttime image using maximum reflectance prior. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7418–7426).
Zhang, J., Cao, Y., Zha, Z., & Tao, D. (2020a). Nighttime dehazing with a synthetic benchmark. In Proceedings of the 28th ACM international conference on multimedia (pp. 2355–2363).
Zhang, T., Fu, Y., Wang, L., & Huang, H. (2019). Hyperspectral image reconstruction using deep external and internal learning. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 8559–8568).
Zhang, M., Marklund, H., Dhawan, N., Gupta, A., Levine, S., & Finn, C. (2021). Adaptive risk minimization: Learning to adapt to domain shift. Advances in Neural Information Processing Systems, 34.
Zhang, H., & Patel, V. (2018). Densely connected pyramid dehazing network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3194–3203).
Zhang, C., Wang, J., Yen, G., Zhao, C., Sun, Q., Tang, Y., Qian, F., & Kurths, J. (2020b). When autonomous systems meet accuracy and transferability through AI: A survey. Patterns, 1(4), Article 100050.
Zhang, P., Zhang, B., Chen, D., Yuan, L., & Wen, F. (2020c). Cross-domain correspondence learning for exemplar-based image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5143–5153).
Zhang, H., Zhao, C., & Ding, J. (2022). Online reinforcement learning with passivity-based stabilizing term for real time overhead crane control without knowledge of the system model. Control Engineering Practice, 127, Article 105302.
Zhao, C., Tang, Y., & Sun, Q. (2022a). Unsupervised monocular depth estimation in highly complex environments. IEEE Transactions on Emerging Topics in Computational Intelligence, 1–10.
Zhao, C., Zhang, Y., Poggi, M., Tosi, F., Guo, X., Zhu, Z., Huang, G., Tang, Y., & Mattoccia, S. (2022b). MonoViT: Self-supervised monocular depth estimation with a vision transformer. In Proceedings of the international conference on 3D vision.
Zheng, Z., Ren, W., Cao, X., Hu, X., Wang, T., Song, F., & Jia, X. (2021). Ultra-high-definition image dehazing via multi-guided bilateral learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 16180–16189).

