
Unsupervised Deep Adversarial Learning with Dynamic Guidance for Single

Underwater Image Restoration


CHIA-HUNG YEH, National Taiwan Normal University, Taipei, Taiwan
CHIH-HSIANG HUANG, National Sun Yat-Sen University, Kaohsiung, Taiwan
CHU-HAN LIN, National Sun Yat-Sen University, Kaohsiung, Taiwan
MIN-HUI LIN, National Sun Yat-Sen University, Kaohsiung, Taiwan
CHUNG-PING LIU, National Sun Yat-Sen University, Kaohsiung, Taiwan
CHUA-CHIN WANG, National Sun Yat-Sen University, Kaohsiung, Taiwan
LI-WEI KANG*, National Taiwan Normal University, Taipei, Taiwan

Underwater image processing has recently become popular, with great potential for better exploring underwater environments. However,
underwater images usually suffer from attenuation, color distortion, and noise from artificial light sources. Such degradations not only
affect the quality of images but also limit the performance of related application tasks. Therefore, this paper presents a novel deep model for
single underwater image restoration. To train this model, an unsupervised adversarial learning framework with a hybrid loss function is
proposed. More specifically, without needing paired training images, our model integrates two cycle-consistent generative adversarial
network (CycleGAN) structures to form a dual-CycleGAN architecture that simultaneously translates an underwater image into its in-air
(in the atmosphere) version and learns a guidance image for guiding the input underwater image toward its in-air target. As
a result, the restored underwater image is iteratively enhanced, and the guidance is dynamically adjusted to direct the
updating of the restored image. Experimental results show that the proposed method achieves better (or comparable) restoration effectiveness,
both quantitatively and qualitatively, with lower computational cost than state-of-the-art approaches.

CCS CONCEPTS • Computing methodologies → Artificial intelligence → Computer vision → Computer vision
problems → Reconstruction

Additional Keywords and Phrases: Convolutional Neural Networks (CNNs), Deep Learning, Generative Adversarial Networks
(GANs), Underwater Image Restoration, Unsupervised Learning.

1 INTRODUCTION
With the growing scarcity of natural resources and the development of the global economy, exploration of underwater
environments has become popular in recent years. Many applications in ocean engineering and research increasingly
rely on underwater images captured by autonomous underwater vehicles (AUVs) for exploring, understanding, and
interacting with marine environments. For example, a development and control framework of an underwater vehicle was

* Corresponding author

This work was supported in part by Ministry of Science and Technology, Taiwan, under the Grants MOST 108-2221-E-003-027-MY3, MOST 108-2218-E-
003-002-, MOST 108-2218-E-110-002-, MOST 109-2218-E-110-007-, MOST 109-2224-E-110-001-, and MOST 109-2218-E-003-002-. This work was also
financially supported by the National Taiwan Normal University (NTNU) within the framework of the Higher Education Sprout Project by the Ministry of
Education (MOE) in Taiwan.
Authors’ addresses: Chia-Hung Yeh, National Taiwan Normal University, Taipei, Taiwan, chyeh@ntnu.edu.tw; Chih-Hsiang Huang, National Sun Yat-Sen
University, Kaohsiung, Taiwan, tommy830725@gmail.com; Chu-Han Lin, National Sun Yat-Sen University, Kaohsiung, Taiwan, owen850612@gmail.com;
Min-Hui Lin, National Sun Yat-Sen University, Kaohsiung, Taiwan, sylvia821120@gmail.com; Chung-Ping Liu, National Sun Yat-Sen University,
Kaohsiung, Taiwan, p1597536123@gmail.com; Chua-Chin Wang, National Sun Yat-Sen University, Kaohsiung, Taiwan, ccwang@ee.nsysu.edu.tw; Li-Wei
Kang, National Taiwan Normal University, Taipei, Taiwan, lwkang@ntnu.edu.tw (corresponding author email).
presented in [1] for grasping marine organisms. In addition, we recently presented a lightweight deep neural network model
for underwater object detection [2]. However, the acquisition of underwater images relying on optical imaging encounters
more challenges than that in the atmosphere. That is, underwater images usually suffer from degeneration due to attenuation,
color distortion, and noise from artificial lighting sources as well as the effects of possibly low-end optical imaging devices.
More specifically, the scattering and absorption, induced by particles in the water, attenuate the direct transmission and
induce surrounding scattered light. The attenuated direct transmission degrades the intensity of the scene and introduces
color distortion, while the surrounding scattered light distorts the appearance of the scene. Such degradations make
restoring the quality of underwater images difficult and seriously affect related tasks in underwater exploration, such as
object detection and recognition. On the other hand, AUVs (equipped with image
acquisition devices) used to capture underwater images are usually battery-powered with low-complexity implementations
to maximize lifetime while minimizing hardware cost. Thus, it would be beneficial to design a lightweight
architecture that can be embedded into an AUV to perform underwater exploration tasks.
Image restoration is basically an ill-posed problem. Several state-of-the-art image restoration techniques relying on
prior knowledge or assumptions, as well as on learning strategies, have been presented in the literature. Moreover, with
the rapid development of deep learning algorithms, more and more deep learning-based image restoration techniques have
been presented. For example, we recently presented three deep models, respectively, for single image haze removal [3],
rain streak removal [4], and compressed image reconstruction [5]. Moreover, a multi-task end-to-end deep learning
framework with semantic guidance was recently proposed in [6] to jointly solve the reflection removal from a single image
and semantic segmentation. However, the success of deep learning-based methods usually relies on sufficient and effective
training data. This leads to the major challenge in training a deep model for single underwater image restoration (SUIR):
paired training samples of underwater images and their corresponding ground truths are rarely available. To restore
the appearance and color information of underwater images, hardware-based solutions [7] have been shown to be effective,
but they may not be applicable to dynamic image acquisition. Most popular recent approaches belong to the category of single
image-based underwater image restoration due to their effectiveness and flexibility. They can be categorized into two kinds
of approaches, i.e., traditional and deep learning-based approaches, respectively, reviewed as follows.

1.1 Traditional Methods


Restoring single underwater images is known to be ill-posed. Traditionally, restoring underwater images relies on
compensating for either light scattering or color distortion. A popular approach is to formulate the problem as a single image
haze removal problem. A pioneering study [8] enhanced the quality of a single underwater image via image
dehazing. In addition, a red channel-based method was proposed in [9] for recovering single underwater images. Moreover,
a method based on blue-green channels dehazing and red channel correction for SUIR was presented in [10]. In addition,
an underwater image enhancement framework relying on dehazing with minimum information loss was proposed in [11].
More related studies can be found in [12] and [13].

1.2 Deep Learning-based Methods


Some deep learning-based SUIR frameworks have been recently presented. For example, some deep learning-based single
image haze removal methods (e.g., [14], [15]) using only training hazy image pairs for network learning were directly
applied to SUIR. On the other hand, some deep learning-based methods apply synthetic image pairs for deep model learning,
such as [16] and [17]. Moreover, a large-scale underwater image enhancement benchmark with reference images, as well
as an underwater image enhancement deep network was proposed in [18]. The reference images were generated by
12 selected image enhancement methods. However, since it is hard to obtain a dataset consisting of image pairs of
underwater images and the corresponding ground truth images, it is not easy to train a general convolutional neural network
(CNN) model relying on paired training data. To tackle the problem, the GAN (generative adversarial network) architecture
[19] and its extension, CycleGAN [20] have been employed in SUIR. For example, the WaterGAN framework [21]
employs a GAN for generating realistic underwater images from in-air image and depth pairings, used for color correction
of monocular underwater images. In addition, CycleGAN was used in [22] as a distortion model for generating a dataset
of paired images for training an underwater image restoration model. However, artificially generated underwater images
for deep model learning may not fit real ones well, which would result in inadequate underwater image reconstruction.
Furthermore, without synthesizing training image pairs, a weakly supervised color transfer method was presented in [23]
to achieve SUIR relying on CycleGAN. Moreover, a CNN-based framework based on synthesized underwater patches for
network training was recently presented in [24] to learn hierarchical statistical features related to color cast and contrast
degradation for underwater image enhancement.
Existing approaches for SUIR usually suffer from three weaknesses: inadequate color correction, insufficient
reconstruction of image details, and high computational complexity. To tackle these problems, this paper proposes an
end-to-end lightweight deep model for SUIR. To train our deep model without needing pairs of training images, an
unsupervised adversarial learning framework with a hybrid loss function is also presented.

1.3 Major Contributions and Novelties of this Paper


The major contributions and novelties of this paper are three-fold: (i) to the best of our knowledge, we are among the first
to propose an unsupervised adversarial deep learning framework with dynamic guidance learning for SUIR; (ii) to suit
applications on low-complexity devices, the proposed deep model is relatively lightweight; to retain its image restoration
capability, we present a dual-CycleGAN adversarial learning framework that dynamically learns a guidance image to steer
the restoration of a single underwater image. More specifically,
in our framework, one CycleGAN is used to learn a grayscale guidance image, while the other CycleGAN is used to train
a lightweight deep image generator for image restoration. The two CycleGANs are simultaneously trained, where the
dynamically learned guidance image iteratively guides the output of the lightweight deep image generator to achieve
efficient underwater image restoration; and (iii) the proposed hybrid loss function combines multiple loss types to better
preserve image content, structure, and details, enhance contrast, and provide a better visual experience.
The rest of this paper is organized as follows. Sec. 2 briefly describes some background knowledge. Sec. 3
presents the proposed deep learning network for SUIR with the problem formulation. In Sec. 4, the proposed unsupervised
adversarial learning framework for deep model learning is addressed. In Sec. 5, the experimental results are demonstrated.
Finally, Sec. 6 concludes this paper.

2 BACKGROUND KNOWLEDGE

2.1 Original CycleGAN


CycleGAN [20], extended from GAN [19], aims at learning to translate an image from a source domain X (e.g., underwater
images) to a target domain Y (e.g., in-air images) in the absence of paired training samples. Given a set of unpaired
training image samples {x_i} ∈ X and {y_j} ∈ Y, the main goal is to learn a forward generator G_F for mapping X to Y, and a
backward generator G_B for mapping Y to X. In addition, two adversarial discriminators D_X and D_Y are also
employed. The discriminator D_X aims at distinguishing the real samples {x_i} from the backward generated samples {G_B(y_j)},
while the discriminator D_Y aims at distinguishing the real samples {y_j} from the forward generated samples {G_F(x_i)}. In the original
CycleGAN [20], the forward generator mainly has two loss functions. The first is the adversarial loss, which aims at
matching the distribution of G_F(X) with the distribution of the target domain Y. The second is the cycle consistency loss,
which enforces G_B(G_F(X)) ≈ X. For the backward generator, the two loss functions are defined similarly.

Figure 1: The architecture of the proposed deep single underwater image restoration network.
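For concreteness, a minimal TensorFlow sketch of the two original CycleGAN loss terms is given below; the generator and discriminator models (G_F, G_B, D_X, D_Y) are assumed to be defined elsewhere, and the snippet illustrates the objective of [20] rather than the training code of this paper:

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=False)
mae = tf.keras.losses.MeanAbsoluteError()

def cyclegan_generator_losses(G_F, G_B, D_X, D_Y, x, y):
    # Forward and backward translations.
    fake_y = G_F(x, training=True)
    fake_x = G_B(y, training=True)
    # Adversarial losses: each generator tries to make its discriminator predict "real".
    pred_fake_y = D_Y(fake_y)
    pred_fake_x = D_X(fake_x)
    adv_forward = bce(tf.ones_like(pred_fake_y), pred_fake_y)
    adv_backward = bce(tf.ones_like(pred_fake_x), pred_fake_x)
    # Cycle consistency: G_B(G_F(x)) should reconstruct x, and G_F(G_B(y)) should reconstruct y.
    cycle = mae(x, G_B(fake_y, training=True)) + mae(y, G_F(fake_x, training=True))
    return adv_forward, adv_backward, cycle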

2.2 Perceptual Quality Assessment


Perceptual quality assessment (QA) is designed for predicting the perceived quality of media. The structural similarity
(SSIM) index [25] is a classical QA metric designed for perceptually measuring the similarity between two images,
expressed by SSIM(a, b), where a and b are two image patches. Moreover, contrast is a fundamental attribute of
images and plays an important role in the human visual perception of image quality. A local patch-based objective QA method
using an adaptive representation of local patch structure, called PCQI (patch-based contrast quality index), was presented
in [26]. Both the SSIM and PCQI metrics are involved in the loss function of our unsupervised learning model for
SUIR.

3 PROBLEM FORMULATION AND PROPOSED DEEP LEARNING NETWORK FOR SUIR


The main goal of this paper is to design a deep convolutional neural network (denoted by G_F) to transform an input single
underwater image x ∈ X into its corresponding in-air image y ∈ Y, where X and Y denote the collections of underwater images (the
source domain) and in-air images (the target domain), respectively. To design a deep model (G_F) for SUIR with good
efficacy, a simple CNN architecture inspired by the Google Inception V3 network [27] is proposed. As depicted in Figure
1, our simple deep model consists of an input layer, 10 convolutional (Conv.) layers (some of them in parallel), a
concatenation (Concat) layer, and an output layer. Each input single underwater image (x) is of size H × W × 3 in the
RGB (red, green, and blue) color space. Each convolutional layer consists of a convolution operation with a kernel size of
3 × 3 or 1 × 1, and a ReLU (rectified linear unit) [28] activation function (for some of the convolutional layers).
The concatenation layer concatenates the feature maps derived from different convolutional layers of the same
output size. The output layer produces the restored image (y) of size H × W × 3 in the RGB color space. Moreover, the
number of output channels (i.e., the channel depth) of all the convolutional layers is empirically set to 8 as a good tradeoff
between model complexity and image reconstruction performance. To train our deep model G_F for SUIR without
needing training image pairs, we present an unsupervised adversarial learning framework, as described in Sec. 4.
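For illustration, a plausible Keras sketch of such a generator is given below; only the stated ingredients (3 × 3 and 1 × 1 convolutions with 8 output channels, ReLU on some layers, a concatenation of same-size feature maps, and an H × W × 3 output) come from the text, while the exact branch arrangement is our assumption and does not reproduce Figure 1 exactly:

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_generator():
    # A plausible arrangement of the lightweight SUIR generator of Sec. 3:
    # parallel 1x1/3x3 branches whose same-size feature maps are concatenated,
    # all convolutions with 8 output channels; the exact topology is an assumption.
    conv = lambda k, act=True: layers.Conv2D(8, k, padding='same',
                                             activation='relu' if act else None)
    inp = layers.Input(shape=(None, None, 3))          # H x W x 3 RGB underwater image
    x = conv(3)(inp)                                   # stem
    a = conv(1)(x)                                     # branch 1: 1x1
    b = conv(3)(conv(1)(x))                            # branch 2: 1x1 -> 3x3
    c = conv(3)(conv(3)(conv(1)(x)))                   # branch 3: 1x1 -> 3x3 -> 3x3
    x = layers.Concatenate()([a, b, c])                # fuse same-size feature maps
    x = conv(3)(x)
    x = conv(3, act=False)(x)
    out = layers.Conv2D(3, 3, padding='same')(x)       # output layer: H x W x 3 restored image
    return Model(inp, out)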

Figure 2: Proposed unsupervised adversarial learning framework consisting of dual-CycleGAN for SUIR.

4 PROPOSED UNSUPERVISED ADVERSARIAL LEARNING FRAMEWORK FOR SUIR DEEP MODEL LEARNING
This paper presents an unsupervised learning framework, called dual-CycleGAN (shown in Figure 2), relying on
adversarial learning. As shown in Figure 2, the top half, called the GrayCycleGAN, is mainly designed for guidance learning
(to guide an input underwater image toward its in-air version) by producing an enhanced grayscale image of the input
image, while the bottom half, called the RGBCycleGAN, is the main CycleGAN designed for producing the final restored
result. The two CycleGAN models are jointly trained until convergence to obtain the final generator G_F (Figure 1) for SUIR.

4.1 Proposed RGBCycleGAN

4.1.1 Proposed Adversarial Loss in RGBCycleGAN


Different from the adversarial loss used in the original CycleGAN [20] (adopted by [23] for SUIR), in our RGBCycleGAN
training model (circled by the blue dotted line region in Figure 2), we propose to optimize the adversarial loss with respect
to image color information and image texture information separately. To achieve this, we introduce two discriminators,
D_Y^C and D_Y^T, to distinguish, respectively, the color information and the texture information of the translated samples
G_F(x) from those of the real samples y. Therefore, the proposed adversarial loss function ℒ_GAN is expressed as:

$$\begin{aligned} \mathcal{L}_{GAN}(G_F, D_Y^C, D_Y^T, X, Y) = {} & \mathbb{E}_{y \sim p_{data}(y)}\Big[ \lambda_C \log D_Y^C\big(C(y)\big) + \lambda_T \log D_Y^T\big(T(y)\big) \Big] \\ & + \mathbb{E}_{x \sim p_{data}(x)}\Big[ \lambda_C \log\big(1 - D_Y^C(C(G_F(x)))\big) + \lambda_T \log\big(1 - D_Y^T(T(G_F(x)))\big) \Big], \end{aligned} \quad (1)$$

where x ~ p_data(x) and y ~ p_data(y) denote the data distributions of the samples {x_i} and {y_j}, respectively. In this
expression, G_F intends to generate images G_F(x) that look like images in the domain Y. The two discriminators D_Y^C and
D_Y^T are defined for, respectively, distinguishing the color information and the texture information of the translated
samples G_F(x) from those of the real samples y. C(I) and T(I) denote the two operations for extracting the color and the
texture components from the image I, respectively, where T(I) = I − C(I). Here, a simple average pooling operation
is used to remove details from the image I to keep the color component as C(I). The weighting coefficients, λ_C and λ_T,
for the color and the texture components, respectively, are empirically set to 0.75 and 0.25 in our experiments. The
proposed backward adversarial loss function (for our RGBCycleGAN) can be similarly defined as
ℒ_GAN(G_B, D_X^C, D_X^T, Y, X), where the two discriminators D_X^C and D_X^T are defined for, respectively, distinguishing the color
information and the texture information of the translated samples G_B(y) from those of the real samples x.
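To make the decomposition concrete, a minimal TensorFlow sketch of C(I) and T(I) = I − C(I) follows; the paper only states that a simple average pooling is used, so the kernel size below is an assumption:

import tensorflow as tf

def color_component(img, k=8):
    # C(I): stride-1 average pooling acts as a box blur that removes fine details and keeps
    # the smooth color layer; img is a 4-D tensor (batch, H, W, channels), k is an assumed kernel size.
    return tf.nn.avg_pool2d(img, ksize=k, strides=1, padding='SAME')

def texture_component(img, k=8):
    # T(I) = I - C(I): the residual detail/texture layer.
    return img - color_component(img, k)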

4.1.2 Proposed Cycle Consistency Loss in RGBCycleGAN


Again, different from the cycle consistency loss used in the original CycleGAN [20] (and adopted in [23]), we introduce
separate optimizations for the color and texture components, as well as the SSIM loss (ℒ_SSIM), in the proposed cycle
consistency loss (of our RGBCycleGAN) ℒ_cyc, defined as:

$$\begin{aligned} \mathcal{L}_{cyc}(G_F, G_B) = {} & \mathbb{E}_{x \sim p_{data}(x)}\Big[ \lambda'_C \big\| C(G_B(G_F(x))) - C(x) \big\|_1 + \lambda'_T \, \mathcal{L}_{SSIM}\big( T(G_B(G_F(x))), T(x) \big) \Big] \\ & + \mathbb{E}_{y \sim p_{data}(y)}\Big[ \lambda'_C \big\| C(G_F(G_B(y))) - C(y) \big\|_1 + \lambda'_T \, \mathcal{L}_{SSIM}\big( T(G_F(G_B(y))), T(y) \big) \Big], \end{aligned} \quad (2)$$

where the weights λ′_C and λ′_T are empirically set to 0.25 and 0.75, respectively. Based on the property that the larger the value of
SSIM(a, b) defined in Sec. 2.2 is, the more similar a and b are, the SSIM loss (for the target of loss minimization) can be
defined as:

$$ \mathcal{L}_{SSIM}(x, y) = 1 - \frac{1}{N} \sum_{a \in x, \, b \in y} SSIM(a, b), \quad (3) $$

where N is the number of corresponding patch pairs of the images x and y. The main reasons for using the SSIM loss
only for the texture component in our cycle consistency loss are: (i) as reported in [29], using only the SSIM loss tends
to alter the color information, and therefore the SSIM loss is applied only to texture component reconstruction here; and
(ii) also based on [29], using only the SSIM loss usually yields better visual reconstruction of texture
information than using only ℓ1/ℓ2 losses.
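A minimal TensorFlow sketch of the SSIM loss in Eq. (3) could look as follows; tf.image.ssim computes SSIM over local windows and averages them, so the 1/N patch average is handled internally (images are assumed to be scaled to [0, 1]):

import tensorflow as tf

def ssim_loss(x, y, max_val=1.0):
    # Eq. (3): 1 minus the mean SSIM over corresponding local windows/patches of x and y.
    return 1.0 - tf.reduce_mean(tf.image.ssim(x, y, max_val=max_val))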

4.1.3 Proposed RGB SSIM Loss for Self-Reconstruction in RGBCycleGAN


Inspired by [23], directly calculating the SSIM loss between the input image x and the translated image y = G_F(x) is also
beneficial for preserving the content and structure of the input image. Therefore, we also include this SSIM loss
for image self-reconstruction, calculated as ℒ_SSIM(G_F(x), x) based on Eq. (3) with respect to each color channel, in our
RGBCycleGAN.

4.1.4 Proposed Saturation and Color Balance Loss in RGBCycleGAN


The saturation loss function ℒ_sat for an image I of size H × W used in our model is defined as:

$$ \mathcal{L}_{sat}(I) = 1 - \frac{1}{H \times W} \sum_{i,j} I_S(i, j), \quad (4) $$

where I_S is the saturation map of I, and (i, j) denotes the pixel coordinate. In addition, the color balance (CB) loss
function for an image I of size H × W used in our model is defined as:

$$ \mathcal{L}_{CB}(I) = \frac{1}{H \times W} \sum_{C \in \{R, G, B\}} \sum_{i,j} \big| I_C(i, j) - M_I \big|, \quad (5) $$

where I_C denotes the color component for C ∈ {R, G, B}, and M_I is the pixel mean across the three color channels of I.
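The two losses can be sketched in TensorFlow as follows, under two assumptions of ours: the saturation map I_S is taken as the HSV saturation channel, and M_I is read as the mean pixel value of I over all positions and channels:

import tensorflow as tf

def saturation_loss(img):
    # Eq. (4): 1 minus the mean of the saturation map (HSV saturation of an RGB image in [0, 1]).
    s = tf.image.rgb_to_hsv(img)[..., 1]
    return 1.0 - tf.reduce_mean(s)

def color_balance_loss(img):
    # Eq. (5): average absolute deviation of every color sample from M_I, taken here as the
    # mean pixel value over all positions and channels.
    m = tf.reduce_mean(img)
    return 3.0 * tf.reduce_mean(tf.abs(img - m))   # x3: Eq. (5) sums over R, G, B but divides by H*W only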

4.1.5 Proposed PCQI Loss in RGBCycleGAN


To enhance the local contrast of the restored image G_F(x) for the input x, the PCQI loss function is defined as:

$$ \mathcal{L}_{PCQI}(G_F(x), x) = e^{-PCQI(G_F(x), \, x)}, \quad (6) $$

where the PCQI function is defined in [26].
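Since PCQI has no standard TensorFlow operator, Eq. (6) can only be sketched around a placeholder implementation of [26]; pcqi_fn below is hypothetical:

import tensorflow as tf

def pcqi_loss(restored, reference, pcqi_fn):
    # Eq. (6): an exponentially decaying function of the PCQI score, so that minimizing the
    # loss pushes the PCQI of the restored image upward. pcqi_fn is a hypothetical placeholder
    # for a PCQI implementation following [26].
    return tf.exp(-pcqi_fn(restored, reference))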

4.1.6 Proposed Grayscale SSIM Loss in RGBCycleGAN


As shown in Figure 2, the SSIM loss calculated on grayscale images is defined as ℒ_SSIM(y′_gray, x_gray_enhanced). The
image y′_gray (obtained with ITU-R Recommendation BT.601, a standard RGB-to-grayscale converter [30]) is the
grayscale version of y = G_F(x). Moreover, the image x_gray_enhanced is the guidance image generated by the forward
grayscale conversion CNN F_F (the main module to be trained in the proposed GrayCycleGAN, addressed in Sec. 4.2). That
is, the learned guidance image (x_gray_enhanced) generated by the forward CNN F_F of our GrayCycleGAN is used to guide
the grayscale version (y′_gray) of the restored underwater image generated by the forward generator G_F of our
RGBCycleGAN, by minimizing the SSIM loss between them. The calculation of this grayscale SSIM loss forms the linkage
between the RGBCycleGAN and the GrayCycleGAN, which are jointly learned in our training process.
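For reference, the BT.601 conversion [30] used to obtain y′_gray is a fixed weighted sum of the R, G, and B channels; the luma coefficients below are the standard BT.601 values:

import tensorflow as tf

BT601_WEIGHTS = tf.constant([0.299, 0.587, 0.114])

def rgb_to_gray_bt601(img):
    # y'_gray: weighted sum of the R, G, and B channels following ITU-R BT.601.
    return tf.reduce_sum(img * BT601_WEIGHTS, axis=-1, keepdims=True)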

4.1.7 Proposed Total Loss for RGBCycleGAN


Finally, the total loss function ℒ_RGBCycleGAN used to train our RGBCycleGAN model for transforming X (the underwater
image domain) to Y (the in-air image domain) is a linear combination of the above loss terms with
weights, as follows:

$$\begin{aligned} \mathcal{L}_{RGBCycleGAN}(G_F, G_B, D_Y^C, D_Y^T, D_X^C, D_X^T, X, Y) = {} & \lambda_1 \times \mathcal{L}_{GAN}(G_F, D_Y^C, D_Y^T, X, Y) \\ & + \lambda_2 \times \mathcal{L}_{GAN}(G_B, D_X^C, D_X^T, Y, X) \\ & + \lambda_3 \times \mathcal{L}_{cyc}(G_F, G_B) \\ & + \lambda_4 \times \mathcal{L}_{SSIM}(Y'_{gray}, X_{gray\_enhanced}) \\ & + \lambda_5 \times \mathcal{L}_{SSIM}(G_F(X), X) + \lambda_6 \times \mathcal{L}_{sat}(G_F(X)) \\ & + \lambda_7 \times \mathcal{L}_{CB}(G_F(X)) + \lambda_8 \times \mathcal{L}_{PCQI}(G_F(X), X), \end{aligned} \quad (7)$$

where the weighting coefficients λ_1, λ_2, λ_3, λ_4, λ_5, λ_6, λ_7, and λ_8 are empirically set to 1, 1, 1, 4, 1, 3, 1, and 2,
respectively. G_F and G_B are the forward and backward generators for mapping X to Y, and Y to X, respectively. D_Y^C and
D_Y^T are defined in Eq. (1). ℒ_GAN(G_B, D_X^C, D_X^T, Y, X) is the backward adversarial loss function, which can be similarly
defined based on Eq. (1). In addition, Y′_gray and X_gray_enhanced, respectively, denote the collection of the grayscale
versions (via the standard conversion [30]) of each G_F(x), where x is an input underwater image, x ∈ X and G_F(x) ∈ Y,
and the collection of the grayscale versions (via our grayscale conversion CNN F_F, described in Sec. 4.2) of each x. The
loss functions ℒ_GAN, ℒ_cyc, ℒ_SSIM, ℒ_sat, ℒ_CB, and ℒ_PCQI are defined by Eqs. (1), (2), (3), (4), (5), and (6),
respectively.
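Assuming the individual loss terms have already been evaluated (e.g., with the sketches above), assembling Eq. (7) reduces to a weighted sum with the stated coefficients; the helper below is a trivial illustration, not the authors' training code:

# Weights lambda_1 ... lambda_8 from Eq. (7).
RGB_LOSS_WEIGHTS = (1.0, 1.0, 1.0, 4.0, 1.0, 3.0, 1.0, 2.0)

def rgb_cyclegan_total_loss(adv_fwd, adv_bwd, cyc, ssim_guidance, ssim_self, sat, cb, pcqi):
    # Eq. (7): linear combination of the eight RGBCycleGAN loss terms (already-computed scalar tensors).
    terms = (adv_fwd, adv_bwd, cyc, ssim_guidance, ssim_self, sat, cb, pcqi)
    return sum(w * t for w, t in zip(RGB_LOSS_WEIGHTS, terms))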

4.2 Proposed GrayCycleGAN for Guidance Learning


The main goal of the proposed GrayCycleGAN is to learn a forward grayscale conversion CNN to generate the guidance
grayscale image used to guide an input underwater image to its in-air version. Since underwater images usually exhibit
color distortion compared with the corresponding in-air images, we argue that the inherent image structure and details of
the respective grayscale versions of an input underwater image and its restored image should be consistent. Our main idea
is to use the grayscale version (obtained by the proposed grayscale conversion CNN in our GrayCycleGAN, described
below) of the input RGB underwater image to guide the restoration process for producing its in-air version.
As shown in Figure 2, for a forward generated in-air image y = G_F(x) of the input underwater image x in the proposed
RGBCycleGAN, we directly apply the standard grayscale converter to obtain its “baseline” grayscale version y′_gray. On the
other hand, to obtain an “enhanced” grayscale version of the input RGB underwater image, it is necessary to exclude possible
noise, blurring effects, or color distortions within the image while preserving its inherent image structure and details. To
achieve this, we propose a grayscale conversion CNN (denoted by F_F) in our GrayCycleGAN (circled by the red dotted
line region in Figure 2) for transforming an RGB underwater image into the corresponding enhanced grayscale image. To
train F_F, the GrayCycleGAN mainly consists of a coefficient prediction CNN for predicting the grayscale transformation
coefficients (for initial grayscale conversion), the forward grayscale conversion CNN F_F, and the backward grayscale
conversion CNN F_B.
More specifically, the GrayCycleGAN used to learn the grayscale-based guidance (the enhanced grayscale image
x_gray_enhanced) is trained simultaneously with the RGBCycleGAN, where the grayscale conversion CNN acts as the
generator (F_F) of the GrayCycleGAN. To train F_F, besides the common forward/backward adversarial losses and cycle
consistency loss, we also involve the hybrid loss function. Therefore, the total loss function ℒ_GrayCycleGAN used to train
the GrayCycleGAN is defined as:

$$\begin{aligned} \mathcal{L}_{GrayCycleGAN}(F_F, F_B, D_{Y_{gray}}, D_{X_{gray}}, X_{gray}, Y_{gray}) = {} & \lambda^{gray}_1 \times \mathcal{L}^{gray}_{GAN}(F_F, D_{Y_{gray}}, X_{gray}, Y_{gray}) \\ & + \lambda^{gray}_2 \times \mathcal{L}^{gray}_{GAN}(F_B, D_{X_{gray}}, Y_{gray}, X_{gray}) \\ & + \lambda^{gray}_3 \times \mathcal{L}^{gray}_{cyc}(F_F, F_B) \\ & + \lambda^{gray}_4 \times \mathcal{L}^{gray}_{SSIM}\big(F_F(X_{gray}), X_{gray}\big) \\ & + \lambda^{gray}_5 \times \mathcal{L}^{gray}_{PCQI}\big(F_F(X_{gray}), X_{gray}\big) \\ & + \lambda^{gray}_6 \times \mathcal{L}^{gray}_{contrast}\big(F_F(X_{gray})\big), \end{aligned} \quad (8)$$

where the weighting coefficients λ^gray_1, λ^gray_2, λ^gray_3, λ^gray_4, λ^gray_5, and λ^gray_6 are empirically set to 1, 1, 1, 5, 3, and 1,
respectively. F_F and F_B are the forward and backward generators for mapping X_gray to Y_gray, and Y_gray to X_gray,
respectively. X_gray and Y_gray denote the source domain (a collection of initially converted grayscale images x_gray) and
the target domain (a collection of enhanced grayscale images x_gray_enhanced), respectively. To collect the training samples
in the domain Y_gray, we converted the in-air images from the domain Y (the collection of in-air images) to the
corresponding grayscale images via the standard grayscale converter. D_Y_gray is the discriminator used to distinguish the
grayscale information of the samples translated by the forward generator, F_F(X_gray), from that of the real samples in
Y_gray. In addition, D_X_gray is the discriminator used to distinguish the grayscale information of the samples translated by the
backward generator, F_B(Y_gray), from that of the real samples in X_gray. ℒ^gray_GAN(F_F, D_Y_gray, X_gray, Y_gray) and
ℒ^gray_GAN(F_B, D_X_gray, Y_gray, X_gray) are the forward and backward adversarial loss functions, respectively, similarly
defined as in Eq. (1). ℒ^gray_cyc, ℒ^gray_SSIM, and ℒ^gray_PCQI are the cycle consistency loss, SSIM loss, and PCQI loss,
respectively, similarly defined as in Eqs. (2), (3), and (6). Moreover, ℒ^gray_contrast is the contrast loss, defined as:

$$ \mathcal{L}^{gray}_{contrast}(I) = \frac{1}{H \times W} \sum_{i,j} \big| I(i, j) - M(I) \big|, \quad (9) $$

where I(i, j) is the pixel value of the image I of size H × W at position (i, j), and M(I) is the pixel mean of I.
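A minimal TensorFlow sketch of Eq. (9) for a single grayscale image follows:

import tensorflow as tf

def contrast_loss(img):
    # Eq. (9): mean absolute deviation of the image from its global mean M(I).
    return tf.reduce_mean(tf.abs(img - tf.reduce_mean(img)))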
In the overall training process for jointly training our RGBCycleGAN and GrayCycleGAN, the total loss function used
to train the proposed dual-CycleGAN model is defined as:

$$ \mathcal{L}_{total} = \mathcal{L}_{RGBCycleGAN} + \mathcal{L}_{GrayCycleGAN}, \quad (10) $$

where ℒ_RGBCycleGAN and ℒ_GrayCycleGAN are defined in Eqs. (7) and (8), respectively. In the training process, the SSIM loss
is iteratively calculated as ℒ_SSIM(y′_gray, x_gray_enhanced), as depicted in Figure 2. That is, minimizing this SSIM loss can
be viewed as the linkage between the two CycleGANs. During the overall network training process, the iteratively updated
enhanced grayscale image x_gray_enhanced is used as the guidance to direct the restoration of the input underwater image.

Figure 3: Qualitative performance comparisons for single underwater image restoration (panels from left to right: Input, BDRC, MIL, DIH, DHWD, WGAN, EW, Proposed).

5 EXPERIMENTAL RESULTS
To train the proposed deep model for SUIR with unpaired training images, two datasets of different domains are used. One
is the collection of underwater images extracted from the underwater videos presented in [31]. The other is the Cityscapes
dataset [32], used as the in-air domain dataset. In our experiments, we selected 25,000 images from each domain as the
training data and 1,000 images as the testing data. The learning rate is adjusted by a learning rate decay schedule. Our
dual-CycleGAN is trained for 10 steps, which takes about 10 hours to reach the optimal performance.

5.1 Qualitative Results
To qualitatively evaluate the proposed method, six state-of-the-art methods were used for comparison. They are
denoted by BDRC (Blue-green channels Dehazing and Red channel Correction) [10], MIL (underwater image
enhancement by dehazing with Minimum Information Loss) [11], DIH (Diving Into Hazelines) [12], DHWD (underwater
image De-scattering and enhancing based on deep learning-based dehazing and Hybrid Wavelets and Directional filter
banks) [14], WGAN (WaterGAN) [21], and EW (Emerging from Water via color correction) [23]. The latter three methods
[14], [21], [23] were also deep learning-based. Some restored underwater images are shown in Figure 3, where we found
the BDRC [10], MIL [11], DIH [12], DHWD [14], and WGAN [21] methods induce more significant color distortion or
over-enhancement. Compared with the EW method [23], the color information produced by our method is closer to that of
in-air images. Moreover, benefiting from the proposed guidance learning and hybrid loss function, our method recovers
richer image details than those recovered by the EW method.

5.2 Quantitative Results


To quantitatively evaluate the performance of our method, two metrics, PCQI [26] and NIQE (Natural Image
Quality Evaluator) [33], were used. The NIQE metric is a no-reference image quality score, where a smaller NIQE
value usually indicates higher image quality. As revealed by Table 1, our method exhibits better or comparable
quantitative performance. Moreover, it can be observed from Table 1 that the dehazing-based methods [11], [14] usually
exhibit relatively better quantitative results. The main reason is that dehazing-based methods tend to enhance
image contrast, and the calculations of PCQI and NIQE are both contrast-based. However, over-enhancement of contrast
usually results in visually unsatisfactory results.

Table 1: Average quantitative results on restoration of 30 test underwater images in terms of PCQI and NIQE.

Methods BDRC MIL DIH DHWD WGAN EW Proposed


PCQI 0.8261 1.1301 0.9076 1.0834 0.8345 0.8563 0.9859
NIQE 5.8382 5.5129 5.4993 5.5778 5.8001 6.3977 5.5101

Table 2: Average quantitative results of ablation studies on restoration of 30 test underwater images in terms of PCQI.

Method                                      PCQI
Proposed removing guidance learning         0.7358
Proposed removing SSIM loss                 0.8837
Proposed removing saturation loss           0.884
Proposed removing color balance loss        0.9245
Proposed removing PCQI loss                 0.9691
Proposed with original CycleGAN losses      0.8656
Complete proposed                           0.9859

5.3 Ablation Studies


To evaluate the effectiveness of each component in the proposed model, we performed ablation studies as
follows. First, to evaluate the effectiveness of the learned grayscale guidance, we remove the guidance learning component
from the proposed framework and show the restored image in Figure 4(b). Second, to evaluate the performance of the
proposed hybrid loss function, we took turns removing the proposed SSIM loss, saturation loss, color balance loss, and
PCQI loss, and show the corresponding results in Figure 4(c), 4(d), 4(e), and 4(f), respectively. In addition, to evaluate the
effectiveness of the proposed adversarial loss and the cycle consistency loss, we replaced them by the original loss functions
used in the original CycleGAN [20] and show the result in Figure 4(g). As revealed by Figure 4, removing any component
of the proposed training process lowers the quality of the restored images. Moreover, Table 2 shows the quantitative
results of the ablation studies corresponding to Figure 4, which also indicate that all the loss terms used in our method
contribute significantly to the overall restoration performance.

Figure 4: Ablation studies: (a) the original underwater image; the restored results obtained by the proposed method with the (b) guidance learning process, (c) SSIM loss, (d) saturation loss, (e) color balance loss, or (f) PCQI loss removed; (g) the result obtained with the original adversarial loss and cycle consistency loss of the original CycleGAN; and (h) the result of the complete proposed method.

5.4 Complexity Analysis


The proposed method was implemented in Python with TensorFlow on a PC equipped with an Intel® Core™ i7-8700K
processor, 16 GB of memory, and an NVIDIA GeForce GTX 1080 GPU. To analyze the computational complexity of our
method, we measured the CPU-only run-time for restoring 30 underwater images of size 512 × 512. Table 3 shows
the average processing time per image for the evaluated methods. With GPU acceleration, the proposed method
takes only 0.22 seconds per image on average. As a result, our method achieves the highest efficiency for SUIR among the evaluated
methods. Moreover, Table 4 shows the number of network parameters and the required GFLOPs (giga floating-point
operations) for the evaluated methods. Based on Table 4, both the number of network parameters and the GFLOPs of our
model are significantly lower than those of the compared methods [21], [23]. As a result, the proposed deep network for
underwater image restoration (shown in Figure 1) has significantly lighter model complexity with better or comparable
underwater image restoration performance compared with the state-of-the-art methods.

Table 3: Average running time (in seconds, without GPU acceleration) per image in size of 512×512.

Methods MIL DIH DHWD WGAN EW Proposed


Time 1.351 18.051 0.753 4.036 3.302 0.536

Table 4: Deep model complexity in terms of number of network parameters and GFLOPs on images in size of 512×512.

Methods WGAN EW Proposed


Parameters 57230 K 1676 K 2.6 K
GFLOPs 126.8 33.1 0.67

6 CONCLUSION
In this paper, we have proposed a deep single underwater image restoration network, trained by the proposed unsupervised
adversarial dual-CycleGAN learning framework based on unpaired training images and dynamic guidance learning.
As a result, the presented deep model restores a single underwater image well while preserving most image details and
achieving a better color visual experience at low complexity. Based on the experimental results, our deep model (with
significantly lower complexity) outperforms (or is comparable with) the state-of-the-art methods used for comparison,
both quantitatively and qualitatively. For future work, more advanced network pruning techniques as well as deep learning
hardware acceleration architectures may be integrated to further reduce the complexity.

REFERENCES
[1] Yu Wang, Mingxue Cai, Shuo Wang, Xuejian Bai, Rui Wang, and Min Tan. "Development and Control of an Underwater Vehicle–Manipulator System
Propelled by Flexible Flippers for Grasping Marine Organisms." IEEE Transactions on Industrial Electronics 69, no. 4 (2021): 3898-3908.
[2] Chia-Hung Yeh, Chu-Han Lin, Li-Wei Kang, Chih-Hsiang Huang, Min-Hui Lin, Chuan-Yu Chang, and Chua-Chin Wang. "Lightweight deep neural
network for joint learning of underwater object detection and color conversion." IEEE Transactions on Neural Networks and Learning Systems (in
press).
[3] Chia-Hung Yeh, Chih-Hsiang Huang, and Li-Wei Kang. "Multi-scale deep residual learning-based single image haze removal via image
decomposition." IEEE Transactions on Image Processing 29 (2019): 3153-3167.
[4] Chih-Yang Lin, Zhuang Tao, Ai-Sheng Xu, Li-Wei Kang, and Fityanul Akhyar. "Sequential dual attention network for rain streak removal in a single
image." IEEE Transactions on Image Processing 29 (2020): 9250-9265.
[5] Chia-Hung Yeh, Chu-Han Lin, Min-Hui Lin, Li-Wei Kang, Chih-Hsiang Huang, and Mei-Juan Chen. "Deep learning-based compressed image artifacts
reduction based on multi-scale image fusion." Information Fusion 67 (2021): 195-207.
[6] Yunfei Liu, Yu Li, Shaodi You, and Feng Lu. "Semantic guided single image reflection removal." ACM Transactions on Multimedia Computing,
Communications, and Applications (in press).
[7] Tali Treibitz and Yoav Y. Schechner. "Active polarization descattering." IEEE transactions on pattern analysis and machine intelligence 31, no. 3
(2008): 385-399.
[8] John Y. Chiang and Ying-Ching Chen. "Underwater image enhancement by wavelength compensation and dehazing." IEEE transactions on image
processing 21, no. 4 (2011): 1756-1769.
[9] Adrian Galdran, David Pardo, Artzai Picón, and Aitor Alvarez-Gila. "Automatic red-channel underwater image restoration." Journal of Visual
Communication and Image Representation 26 (2015): 132-145.
[10] Chongyi Li, Jichang Quo, Yanwei Pang, Shanji Chen, and Jian Wang. "Single underwater image restoration by blue-green channels dehazing and red
channel correction." In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1731-1735. IEEE, 2016.
[11] Chong-Yi Li, Ji-Chang Guo, Run-Min Cong, Yan-Wei Pang, and Bo Wang. "Underwater image enhancement by dehazing with minimum information
loss and histogram distribution prior." IEEE Transactions on Image Processing 25, no. 12 (2016): 5664-5677.
[12] Dana Berman, Tali Treibitz, and Shai Avidan. "Diving into haze-lines: Color restoration of underwater images." In Proc. British Machine Vision
Conference (BMVC), vol. 1, no. 2. 2017.
[13] Codruta O. Ancuti, Cosmin Ancuti, Christophe De Vleeschouwer, and Philippe Bekaert. "Color balance and fusion for underwater image
enhancement." IEEE Transactions on image processing 27, no. 1 (2017): 379-393.
[14] Pan-wang Pan, Fei Yuan, and En Cheng. "Underwater image de-scattering and enhancing using dehazenet and HWD." Journal of Marine Science and
Technology 26, no. 4 (2018): 6.
[15] Xi Yang, Hui Li, Yu-Long Fan, and Rong Chen. "Single image haze removal via region detection network." IEEE Transactions on Multimedia 21, no.
10 (2019): 2545-2560.
[16] Chongyi Li, Saeed Anwar, and Fatih Porikli. "Underwater scene prior inspired deep underwater image and video enhancement." Pattern Recognition
98 (2020): 107038.
[17] Akshay Dudhane, Praful Hambarde, Prashant Patil, and Subrahmanyam Murala. "Deep underwater image restoration and beyond." IEEE Signal
Processing Letters 27 (2020): 675-679.
[18] Chongyi Li, Chunle Guo, Wenqi Ren, Runmin Cong, Junhui Hou, Sam Kwong, and Dacheng Tao. "An underwater image enhancement benchmark
dataset and beyond." IEEE Transactions on Image Processing 29 (2019): 4376-4389.
[19] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative
adversarial nets." Advances in neural information processing systems 27 (2014).
[20] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired image-to-image translation using cycle-consistent adversarial networks." In
Proceedings of the IEEE international conference on computer vision, pp. 2223-2232. 2017.
[21] Jie Li, Katherine A. Skinner, Ryan M. Eustice, and Matthew Johnson-Roberson. "WaterGAN: Unsupervised generative network to enable real-time
color correction of monocular underwater images." IEEE Robotics and Automation letters 3, no. 1 (2017): 387-394.

[22] Cameron Fabbri, Md Jahidul Islam, and Junaed Sattar. "Enhancing underwater imagery using generative adversarial networks." In 2018 IEEE
International Conference on Robotics and Automation (ICRA), pp. 7159-7165. IEEE, 2018.
[23] Chongyi Li, Jichang Guo, and Chunle Guo. "Emerging from water: Underwater image color correction based on weakly supervised color transfer."
IEEE Signal processing letters 25, no. 3 (2018): 323-327.
[24] Yang Wang, Yang Cao, Jing Zhang, Feng Wu, and Zheng-Jun Zha. "Leveraging deep statistics for underwater image enhancement." ACM Transactions
on Multimedia Computing, Communications, and Applications 17, no. 35 (2021): 1–20.
[25] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. "Image quality assessment: from error visibility to structural similarity." IEEE
transactions on image processing 13, no. 4 (2004): 600-612.
[26] Shiqi Wang, Kede Ma, Hojatollah Yeganeh, Zhou Wang, and Weisi Lin. "A patch-structure representation method for quality assessment of contrast
changed images." IEEE Signal Processing Letters 22, no. 12 (2015): 2387-2390.
[27] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. "Rethinking the inception architecture for computer vision."
In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818-2826. 2016.
[28] Vinod Nair, and Geoffrey E. Hinton. "Rectified linear units improve restricted boltzmann machines." In Proc. Int. Conf. Machine Learning. 2010.
[29] Hang Zhao, Orazio Gallo, Iuri Frosio, and Jan Kautz. "Loss functions for image restoration with neural networks." IEEE Transactions on computational
imaging 3, no. 1 (2016): 47-57.
[30] Recommendation ITU-R BT.601-7, March 2011.
[31] https://www.youtube.com/user/DALLMYD
[32] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele.
"The cityscapes dataset for semantic urban scene understanding." In Proceedings of the IEEE conference on computer vision and pattern recognition,
pp. 3213-3223. 2016.
[33] Anish Mittal, Rajiv Soundararajan, and Alan C. Bovik. "Making a “completely blind” image quality analyzer." IEEE Signal processing letters 20, no.
3 (2012): 209-212.

