Underwater image processing has recently attracted considerable attention for its great potential in exploring underwater environments. However,
underwater images usually suffer from attenuation, color distortion, and noise introduced by artificial light sources. Such degradations not only
reduce the quality of images, but also limit the performance of related application tasks. Therefore, this paper presents a novel deep model for
single underwater image restoration. To train this model, an unsupervised adversarial learning framework with a hybrid loss function is
proposed. More specifically, without requiring paired training images, our model integrates two cycle-consistent generative adversarial
network (CycleGAN) structures into a dual-CycleGAN architecture that simultaneously translates an underwater image into its in-air
(in the atmosphere) version and learns a guidance image that directs the input underwater image toward its in-air target. As
a result, the restored underwater image is enhanced iteratively, and the guidance is dynamically adjusted to direct the
updating of the restored image. Experimental results show that, both quantitatively and qualitatively, the proposed method achieves
better (or comparable) restoration effectiveness at lower computing cost than state-of-the-art approaches.
CCS CONCEPTS • Computing methodologies → Artificial intelligence → Computer vision → Computer vision
problems → Reconstruction
Additional Keywords and Phrases: Convolutional Neural Networks (CNNs), Deep Learning, Generative Adversarial Networks
(GANs), Underwater Image Restoration, Unsupervised Learning.
1 INTRODUCTION
With the growing scarcity of natural resources and the development of the global economy, the exploration of underwater
environments has become popular in recent years. Several applications in ocean engineering and research increasingly
rely on underwater images captured by autonomous underwater vehicles (AUVs) for exploring, understanding, and
interacting with marine environments. For example, a development and control framework of an underwater vehicle was
* Corresponding author
This work was supported in part by Ministry of Science and Technology, Taiwan, under the Grants MOST 108-2221-E-003-027-MY3, MOST 108-2218-E-
003-002-, MOST 108-2218-E-110-002-, MOST 109-2218-E-110-007-, MOST 109-2224-E-110-001-, and MOST 109-2218-E-003-002-. This work was also
financially supported by the National Taiwan Normal University (NTNU) within the framework of the Higher Education Sprout Project by the Ministry of
Education (MOE) in Taiwan.
Authors’ addresses: Chia-Hung Yeh, National Taiwan Normal University, Taipei, Taiwan, chyeh@ntnu.edu.tw; Chih-Hsiang Huang, National Sun Yat-Sen
University, Kaohsiung, Taiwan, tommy830725@gmail.com; Chu-Han Lin, National Sun Yat-Sen University, Kaohsiung, Taiwan, owen850612@gmail.com;
Min-Hui Lin, National Sun Yat-Sen University, Kaohsiung, Taiwan, sylvia821120@gmail.com; Chung-Ping Liu, National Sun Yat-Sen University,
Kaohsiung, Taiwan, p1597536123@gmail.com; Chua-Chin Wang, National Sun Yat-Sen University, Kaohsiung, Taiwan, ccwang@ee.nsysu.edu.tw; Li-Wei
Kang, National Taiwan Normal University, Taipei, Taiwan, lwkang@ntnu.edu.tw (corresponding author email).
presented in [1] for grasping marine organisms. In addition, we recently presented a lightweight deep neural network model
for underwater object detection [2]. However, acquiring underwater images through optical imaging poses more challenges
than imaging in the atmosphere. That is, underwater images usually suffer from degradation due to attenuation,
color distortion, and noise from artificial lighting sources, as well as the effects of possibly low-end optical imaging devices.
More specifically, scattering and absorption, induced by particles in the water, attenuate the direct transmission and
introduce surrounding scattered light. The attenuated direct transmission degrades the intensity of the scene and introduces
color distortion, while the surrounding scattered light distorts the appearance of the scene. Such degradations
make the quality restoration of underwater images difficult, which seriously affects related tasks in the exploration of
underwater environments, such as object detection and recognition. On the other hand, AUVs (equipped with image
acquisition devices) used to capture underwater images are usually battery-powered, with low-complexity implementations
that maximize lifetime while minimizing hardware cost. Thus, it would be beneficial to design a lightweight
architecture that can be embedded into an AUV for performing underwater exploration tasks.
Image restoration is basically an ill-posed problem. Several state-of-the-art image restoration techniques relying on
prior knowledge or assumptions, as well as learning strategies, have been presented in the literature. Moreover, with
the rapid development of deep learning algorithms, more and more deep learning-based image restoration techniques have
been presented. For example, we recently presented three deep models, respectively, for single image haze removal [3],
rain streak removal [4], and compressed image reconstruction [5]. Moreover, a multi-task end-to-end deep learning
framework with semantic guidance was recently proposed in [6] to jointly solve reflection removal from a single image
and semantic segmentation. However, the success of deep learning-based methods usually relies on sufficient and effective
training data. This poses the major challenge in training a deep model for single underwater image restoration (SUIR):
paired training samples of underwater images and their corresponding ground truths are rarely available. To restore
the appearance and color information of underwater images, hardware-based solutions [7] have been shown to be effective,
but they may not be applicable to dynamic image acquisition. Most popular recent approaches perform single
image-based underwater image restoration because of its effectiveness and flexibility. They can be categorized into two kinds
of approaches, i.e., traditional and deep learning-based approaches, reviewed as follows.
selected 12 image enhancement methods. However, since it is hard to obtain a dataset consisting of pairs of
underwater images and their corresponding ground-truth images, it is not easy to train a general convolutional neural network
(CNN) model that relies on paired training data. To tackle this problem, the generative adversarial network (GAN) architecture
[19] and its extension, CycleGAN [20], have been employed in SUIR. For example, the WaterGAN framework [21]
employs a GAN to generate realistic underwater images from in-air image and depth pairings, which are used for color correction
of monocular underwater images. In addition, CycleGAN was used in [22] as a distortion model for generating a dataset
of paired images for training an underwater image restoration model. However, artificially generated underwater images
used for deep model learning may not match real ones well, which can result in inadequate underwater image reconstruction.
Furthermore, without synthesizing training image pairs, a weakly supervised color transfer method was presented in [23]
to achieve SUIR based on CycleGAN. Moreover, a CNN-based framework trained on synthesized underwater patches
was recently presented in [24] to learn hierarchical statistical features related to color cast and contrast
degradation for underwater image enhancement.
Existing approaches for SUIR typically suffer from three weaknesses: inadequate color correction, insufficient
reconstruction of image details, and high computational complexity. To tackle these problems, this paper proposes an
end-to-end lightweight deep model for SUIR. To train our deep model without pairs of training images, an
unsupervised adversarial learning framework with a hybrid loss function is also presented.
2 BACKGROUND KNOWLEDGE
Figure 1: The architecture of the proposed deep single underwater image restoration network.
while the discriminator $D_Y$ aims at distinguishing the samples $\{y_i\}$ from the forward-generated samples $\{G_F(x_i)\}$. In the original
CycleGAN [20], the forward generator mainly has two loss functions. The first is the adversarial loss, which aims at
matching the distribution of $G_F(X)$ with the distribution of the target domain $Y$. The second is the cycle consistency loss,
which aims at enforcing $G_B(G_F(X)) \approx X$, where $G_B$ denotes the backward generator. For the backward generator, the two loss functions can be defined similarly.
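To make the two objectives concrete, the following sketch shows how the generator-side adversarial term and the cycle consistency term could be computed for the forward direction. PyTorch is our assumption (the paper does not state its framework), and all module names are illustrative:

```python
import torch
import torch.nn.functional as F

def generator_adversarial_loss(D_Y, G_F, x):
    # G_F tries to make the discriminator D_Y classify G_F(x) as real (label 1).
    logits = D_Y(G_F(x))
    return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

def cycle_consistency_loss(G_F, G_B, x):
    # Enforce G_B(G_F(x)) ~ x with an L1 penalty, as in CycleGAN [20].
    return F.l1_loss(G_B(G_F(x)), x)
```

The backward direction is symmetric: it swaps the roles of the two generators and uses a discriminator on domain $X$.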
Figure 2: Proposed unsupervised adversarial learning framework consisting of dual-CycleGAN for SUIR.
The proposed forward adversarial loss function (for our RGBCycleGAN) is defined as:

$$\mathcal{L}_{adv}(G_F, D_C, D_T, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\big[\lambda_C \log D_C(C(y)) + \lambda_T \log D_T(T(y))\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[\lambda_C \log\big(1 - D_C(C(G_F(x)))\big) + \lambda_T \log\big(1 - D_T(T(G_F(x)))\big)\big], \tag{1}$$

where $x \sim p_{data}(x)$ and $y \sim p_{data}(y)$ denote the data distributions of the samples $\{x_i\}$ and $\{y_i\}$, respectively. In this
expression, $G_F$ intends to generate images $G_F(x)$ that look like images in the domain $Y$. The two discriminators, $D_C$ and
$D_T$, are defined for distinguishing the color information and the texture information, respectively, of the translated
samples $G_F(x)$ from those of the real samples $y$. $C(I)$ and $T(I)$ denote the two operations for extracting the color and the
texture components from the image $I$, respectively, where $T(I) = I - C(I)$. Here, just a simple average pooling operation
is used to remove details from the image $I$, keeping the color component as $C(I)$. The weighting coefficients, $\lambda_C$ and $\lambda_T$,
for the color and the texture components, respectively, are empirically set to 0.75 and 0.25 in our experiments. The
proposed backward adversarial loss function (for our RGBCycleGAN) can be similarly defined as
$\mathcal{L}_{adv}(G_B, D_C', D_T', Y, X)$. The two discriminators $D_C'$ and $D_T'$ are defined for distinguishing the color
information and the texture information, respectively, of the translated samples $G_B(y)$ from those of the real samples $x$.
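As a hedged illustration of Eq. (1), the sketch below implements the color/texture decomposition and the weighted adversarial objective. The pooling window size is our assumption (the paper only specifies "a simple average pooling operation"), and the discriminators are assumed to output probabilities in (0, 1):

```python
import torch
import torch.nn.functional as F

LAMBDA_C, LAMBDA_T = 0.75, 0.25  # color/texture weights reported in the paper

def color_component(img, kernel_size=15):
    # C(I): average pooling smooths away details, keeping the color layout.
    # The window size 15 is an illustrative choice, not taken from the paper.
    return F.avg_pool2d(img, kernel_size, stride=1, padding=kernel_size // 2)

def texture_component(img):
    # T(I) = I - C(I): the residual detail layer.
    return img - color_component(img)

def adversarial_objective(D_C, D_T, G_F, x, y, eps=1e-8):
    # The value of Eq. (1): the discriminators maximize it, G_F minimizes it.
    real = (LAMBDA_C * torch.log(D_C(color_component(y)) + eps).mean()
            + LAMBDA_T * torch.log(D_T(texture_component(y)) + eps).mean())
    fake = (LAMBDA_C * torch.log(1 - D_C(color_component(G_F(x))) + eps).mean()
            + LAMBDA_T * torch.log(1 - D_T(texture_component(G_F(x))) + eps).mean())
    return real + fake
```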
where $N$ is the number of corresponding patch pairs of the images $x$ and $y$. The reasons for using the SSIM loss
only for the texture component in our consistency loss are: (i) as reported in [29], using the SSIM loss alone tends
to distort color information, so the SSIM loss is applied only to texture reconstruction here; and
(ii) also based on [29], using the SSIM loss alone usually yields better visual reconstruction of texture
than using only $\ell_1/\ell_2$ losses.
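Since the SSIM computation itself is standard [25], a compact sketch of the resulting loss is given below. It approximates the usual Gaussian window with average pooling, and the window size and stability constants are common defaults rather than values taken from the paper:

```python
import torch.nn.functional as F

def ssim_loss(x, y, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    # Local means, variances, and covariance via average pooling.
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, 1, pad)
    mu_y = F.avg_pool2d(y, window, 1, pad)
    var_x = F.avg_pool2d(x * x, window, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, 1, pad) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1 - ssim.mean()  # minimizing pushes structural similarity toward 1
```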
$$\mathcal{L}_{sat}(I) = 1 - \frac{1}{H \times W} \sum_{i,j} I_{sat}(i,j), \tag{4}$$

where $I_{sat}$ is the saturation map of $I$, and $(i,j)$ denotes the pixel coordinate. In addition, the color balance (CB) loss
function for an image $I$ of size $H \times W$ used in our model is defined as:

$$\mathcal{L}_{CB}(I) = \frac{1}{H \times W} \sum_{C \in \{R,G,B\}} \sum_{i,j} \left| I_C(i,j) - M_I \right|, \tag{5}$$

where $I_C$ denotes the color component for $C \in \{R, G, B\}$, and $M_I$ is the pixel mean across the three color channels of $I$.
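A possible reading of Eqs. (4) and (5) is sketched below. In particular, the HSV-style definition of the saturation map $I_{sat}$ is our assumption, since the excerpt does not spell out how saturation is computed:

```python
import torch

def saturation_loss(img, eps=1e-8):
    # img: (B, 3, H, W), RGB in [0, 1]. HSV saturation: 1 - min/max per pixel.
    mx = img.max(dim=1).values
    mn = img.min(dim=1).values
    sat = 1 - mn / (mx + eps)
    return 1 - sat.mean()  # Eq. (4), averaged over the H x W pixels

def color_balance_loss(img):
    # Eq. (5): mean absolute deviation of each channel from the pixel mean
    # M_I taken across all three color channels.
    m_i = img.mean(dim=(1, 2, 3), keepdim=True)
    return (img - m_i).abs().sum(dim=1).mean()
```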
the corresponding loss functions (the adversarial, cycle consistency, SSIM, saturation, color balance, and PCQI losses) are similarly defined by Eqs. (1), (2), (3), (4), (5), and (6), respectively.
Figure 3: Qualitative performance comparisons for single underwater image restoration (columns, left to right: Input, BDRC, MIL, DIH, DHWD, WGAN, EW, Proposed).
$\mathcal{L}_{adv}^{gray}(G_F^{gray}, D^{gray}, X^{gray}, Y^{gray})$ and $\mathcal{L}_{adv}^{gray}(G_B^{gray}, D'^{gray}, Y^{gray}, X^{gray})$ are the forward and backward adversarial loss functions, respectively, similarly
defined in Eq. (1). $\mathcal{L}_{cyc}^{gray}$, $\mathcal{L}_{SSIM}^{gray}$, and $\mathcal{L}_{PCQI}^{gray}$ are the cycle consistency loss, SSIM loss, and PCQI loss,
respectively, similarly defined in Eqs. (2), (3), and (6). Moreover, $\mathcal{L}_{contrast}^{gray}$ is the contrast loss, defined by:

$$\mathcal{L}_{contrast}^{gray}(I) = \frac{1}{H \times W} \sum_{i,j} \left| I(i,j) - M(I) \right|, \tag{9}$$

where $I(i,j)$ is the pixel value of the image $I$ of size $H \times W$ at position $(i,j)$, and $M(I)$ is the pixel mean of $I$.
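For completeness, Eq. (9) reduces to a mean-absolute-deviation computation; a minimal sketch follows (how this term is signed or weighted inside Eq. (8) is not recoverable from the excerpt):

```python
import torch

def contrast_loss(gray):
    # gray: (B, 1, H, W). Mean absolute deviation from the per-image mean M(I);
    # larger values indicate higher contrast.
    m = gray.mean(dim=(2, 3), keepdim=True)
    return (gray - m).abs().mean()
```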
In the overall training process for jointly training our RGBCycleGAN and GrayCycleGAN, the total loss function used
to train the proposed dual-CycleGAN model is defined as:

$$\mathcal{L}_{total} = \mathcal{L}_{RGB} + \mathcal{L}_{gray}, \tag{10}$$

where $\mathcal{L}_{RGB}$ and $\mathcal{L}_{gray}$ are defined in Eqs. (7) and (8), respectively. In the training process, the SSIM loss
is iteratively calculated as $\mathcal{L}_{SSIM}(y'_{gray}, x_{en\_gray})$, as depicted in Figure 2. That is, minimizing this SSIM loss can
be viewed as the linkage between the two CycleGANs. During the overall network training process, the iteratively updated
enhanced grayscale image $x_{en\_gray}$ is used as the guidance to direct the restoration of the input underwater image.
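The overall objective can then be summarized as follows. This is a high-level sketch with illustrative names, and whether the linkage SSIM term is counted inside Eq. (7), Eq. (8), or added on top is our reading of Figure 2 rather than a statement from the paper:

```python
def total_loss(rgb_losses, gray_losses, y_prime_gray, x_en_gray):
    # Eq. (10): L_total = L_RGB + L_gray, each a sum of its branch's terms.
    l_rgb = sum(rgb_losses)    # Eq. (7): RGBCycleGAN terms
    l_gray = sum(gray_losses)  # Eq. (8): GrayCycleGAN terms
    # Linkage between the two CycleGANs: the grayscale version of the restored
    # image is pulled toward the iteratively updated guidance x_en_gray.
    link = ssim_loss(y_prime_gray, x_en_gray)  # ssim_loss from the sketch above
    return l_rgb + l_gray + link
```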
5 EXPERIMENTAL RESULTS
To train the proposed deep model for SUIR with unpaired training images, two datasets from different domains are used. One
is a collection of underwater images extracted from the underwater videos presented in [31]. The other is the Cityscapes
dataset [32], used as the in-air domain dataset. In our experiments, we selected 25,000 images from each domain as the
training data and 1,000 images as the testing data. The learning rate is adjusted by a learning rate decay schedule. Our dual-
CycleGAN is trained with 10 steps, which takes about 10 hours to achieve the optimal performance.
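As the paper does not detail the schedule, the following sketch shows one common way such a learning rate decay could be configured; the optimizer settings, decay factor, and stand-in module are all assumptions:

```python
import torch
import torch.nn as nn

generator = nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the actual generator
optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
# Multiply the learning rate by 0.5 every two steps; values are illustrative.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

for step in range(10):  # the paper reports training with 10 steps
    # ... one training pass over unpaired underwater / in-air batches ...
    scheduler.step()
```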
5.1 Qualitative Results
To qualitatively evaluate the proposed method, six state-of-the-art methods were used for comparison. They are
denoted by BDRC (Blue-green channels Dehazing and Red channel Correction) [10], MIL (underwater image
enhancement by dehazing with Minimum Information Loss) [11], DIH (Diving Into Haze-lines) [12], DHWD (underwater
image De-scattering and enhancement based on deep learning-based dehazing and Hybrid Wavelets and Directional filter
banks) [14], WGAN (WaterGAN) [21], and EW (Emerging from Water via color correction) [23]. The latter three methods
[14], [21], [23] are also deep learning-based. Some restored underwater images are shown in Figure 3, where we found
that the BDRC [10], MIL [11], DIH [12], DHWD [14], and WGAN [21] methods induce more significant color distortion or
over-enhancement. Compared with the EW method [23], the color information produced by our method is closer to that of
in-air images. Moreover, benefiting from the proposed guidance learning and hybrid loss function, our method recovers
richer image details than the EW method.
Table 1: Average quantitative results on restoration of 30 test underwater images in terms of PCQI and NIQE.
Table 2: Average quantitative results of ablation studies on restoration of 30 test underwater images in terms of PCQI.
Methods                                    PCQI
Proposed Removing Guidance Learning        0.7358
Proposed Removing SSIM Loss                0.8837
Proposed Removing Saturation Loss          0.884
Proposed Removing PCQI Loss                0.9245
Proposed Removing Color Balance Loss       0.9691
Proposed with Original CycleGAN Losses     0.8656
Complete Proposed                          0.9859
Figure 4: Qualitative results of the ablation studies.
To verify the effectiveness of the proposed adversarial loss and the cycle consistency loss, we replaced them with the original loss functions
used in CycleGAN [20]; the result is shown in Figure 4(g). As revealed by Figure 4, removing any component of
the proposed training process lowers the quality of the restored images. Moreover, Table 2 shows the quantitative
results of the ablation studies corresponding to Figure 4, which also confirms that all the loss terms used in our method
contribute significantly to the overall restoration performance.
Table 3: Average running time (in seconds, without GPU acceleration) per image in size of 512×512.
Table 4: Deep model complexity in terms of number of network parameters and GFLOPs on images in size of 512×512.
6 CONCLUSION
In this paper, we have proposed a deep single underwater image restoration network, trained by the proposed unsupervised
adversarial dual-CycleGAN learning framework based on unpaired training images and dynamic guidance learning.
As a result, the presented deep model restores a single underwater image well, preserving most image details and
achieving better color fidelity at low complexity. Based on the experimental results, our deep model (with
significantly lower complexity) outperforms (or is comparable with) the state-of-the-art methods used for comparison,
both quantitatively and qualitatively. For future work, more advanced network pruning techniques as well as deep learning
hardware acceleration architectures may be integrated to further reduce the complexity.
REFERENCES
[1] Yu Wang, Mingxue Cai, Shuo Wang, Xuejian Bai, Rui Wang, and Min Tan. "Development and Control of an Underwater Vehicle–Manipulator System
Propelled by Flexible Flippers for Grasping Marine Organisms." IEEE Transactions on Industrial Electronics 69, no. 4 (2021): 3898-3908.
[2] Chia-Hung Yeh, Chu-Han Lin, Li-Wei Kang, Chih-Hsiang Huang, Min-Hui Lin, Chuan-Yu Chang, and Chua-Chin Wang. "Lightweight deep neural
network for joint learning of underwater object detection and color conversion." IEEE Transactions on Neural Networks and Learning Systems (in
press).
[3] Chia-Hung Yeh, Chih-Hsiang Huang, and Li-Wei Kang. "Multi-scale deep residual learning-based single image haze removal via image
decomposition." IEEE Transactions on Image Processing 29 (2019): 3153-3167.
[4] Chih-Yang Lin, Zhuang Tao, Ai-Sheng Xu, Li-Wei Kang, and Fityanul Akhyar. "Sequential dual attention network for rain streak removal in a single
image." IEEE Transactions on Image Processing 29 (2020): 9250-9265.
[5] Chia-Hung Yeh, Chu-Han Lin, Min-Hui Lin, Li-Wei Kang, Chih-Hsiang Huang, and Mei-Juan Chen. "Deep learning-based compressed image artifacts
reduction based on multi-scale image fusion." Information Fusion 67 (2021): 195-207.
[6] Yunfei Liu, Yu Li, Shaodi You, and Feng Lu. "Semantic guided single image reflection removal." ACM Transactions on Multimedia Computing,
Communications, and Applications (in press).
[7] Tali Treibitz and Yoav Y. Schechner. "Active polarization descattering." IEEE Transactions on Pattern Analysis and Machine Intelligence 31, no. 3
(2008): 385-399.
[8] John Y. Chiang and Ying-Ching Chen. "Underwater image enhancement by wavelength compensation and dehazing." IEEE Transactions on Image
Processing 21, no. 4 (2011): 1756-1769.
[9] Adrian Galdran, David Pardo, Artzai Picón, and Aitor Alvarez-Gila. "Automatic red-channel underwater image restoration." Journal of Visual
Communication and Image Representation 26 (2015): 132-145.
[10] Chongyi Li, Jichang Guo, Yanwei Pang, Shanji Chen, and Jian Wang. "Single underwater image restoration by blue-green channels dehazing and red
channel correction." In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1731-1735. IEEE, 2016.
[11] Chong-Yi Li, Ji-Chang Guo, Run-Min Cong, Yan-Wei Pang, and Bo Wang. "Underwater image enhancement by dehazing with minimum information
loss and histogram distribution prior." IEEE Transactions on Image Processing 25, no. 12 (2016): 5664-5677.
[12] Dana Berman, Tali Treibitz, and Shai Avidan. "Diving into haze-lines: Color restoration of underwater images." In Proc. British Machine Vision
Conference (BMVC), vol. 1, no. 2. 2017.
[13] Codruta O. Ancuti, Cosmin Ancuti, Christophe De Vleeschouwer, and Philippe Bekaert. "Color balance and fusion for underwater image
enhancement." IEEE Transactions on image processing 27, no. 1 (2017): 379-393.
[14] Pan-wang Pan, Fei Yuan, and En Cheng. "Underwater image de-scattering and enhancing using dehazenet and HWD." Journal of Marine Science and
Technology 26, no. 4 (2018): 6.
[15] Xi Yang, Hui Li, Yu-Long Fan, and Rong Chen. "Single image haze removal via region detection network." IEEE Transactions on Multimedia 21, no.
10 (2019): 2545-2560.
[16] Chongyi Li, Saeed Anwar, and Fatih Porikli. "Underwater scene prior inspired deep underwater image and video enhancement." Pattern Recognition
98 (2020): 107038.
[17] Akshay Dudhane, Praful Hambarde, Prashant Patil, and Subrahmanyam Murala. "Deep underwater image restoration and beyond." IEEE Signal
Processing Letters 27 (2020): 675-679.
[18] Chongyi Li, Chunle Guo, Wenqi Ren, Runmin Cong, Junhui Hou, Sam Kwong, and Dacheng Tao. "An underwater image enhancement benchmark
dataset and beyond." IEEE Transactions on Image Processing 29 (2019): 4376-4389.
[19] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. "Generative
adversarial nets." Advances in neural information processing systems 27 (2014).
[20] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. "Unpaired image-to-image translation using cycle-consistent adversarial networks." In
Proceedings of the IEEE international conference on computer vision, pp. 2223-2232. 2017.
[21] Jie Li, Katherine A. Skinner, Ryan M. Eustice, and Matthew Johnson-Roberson. "WaterGAN: Unsupervised generative network to enable real-time
color correction of monocular underwater images." IEEE Robotics and Automation Letters 3, no. 1 (2017): 387-394.
[22] Cameron Fabbri, Md Jahidul Islam, and Junaed Sattar. "Enhancing underwater imagery using generative adversarial networks." In 2018 IEEE
International Conference on Robotics and Automation (ICRA), pp. 7159-7165. IEEE, 2018.
[23] Chongyi Li, Jichang Guo, and Chunle Guo. "Emerging from water: Underwater image color correction based on weakly supervised color transfer."
IEEE Signal Processing Letters 25, no. 3 (2018): 323-327.
[24] Yang Wang, Yang Cao, Jing Zhang, Feng Wu, and Zheng-Jun Zha. "Leveraging deep statistics for underwater image enhancement." ACM Transactions
on Multimedia Computing, Communications, and Applications 17, no. 35 (2021): 1–20.
[25] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. "Image quality assessment: from error visibility to structural similarity." IEEE
Transactions on Image Processing 13, no. 4 (2004): 600-612.
[26] Shiqi Wang, Kede Ma, Hojatollah Yeganeh, Zhou Wang, and Weisi Lin. "A patch-structure representation method for quality assessment of contrast
changed images." IEEE Signal Processing Letters 22, no. 12 (2015): 2387-2390.
[27] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. "Rethinking the inception architecture for computer vision."
In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818-2826. 2016.
[28] Vinod Nair and Geoffrey E. Hinton. "Rectified linear units improve restricted Boltzmann machines." In Proc. Int. Conf. Machine Learning. 2010.
[29] Hang Zhao, Orazio Gallo, Iuri Frosio, and Jan Kautz. "Loss functions for image restoration with neural networks." IEEE Transactions on Computational
Imaging 3, no. 1 (2016): 47-57.
[30] Recommendation ITU-R BT.601-7. "Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios." March 2011.
[31] https://www.youtube.com/user/DALLMYD
[32] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele.
"The cityscapes dataset for semantic urban scene understanding." In Proceedings of the IEEE conference on computer vision and pattern recognition,
pp. 3213-3223. 2016.
[33] Anish Mittal, Rajiv Soundararajan, and Alan C. Bovik. "Making a “completely blind” image quality analyzer." IEEE Signal Processing Letters 20, no.
3 (2012): 209-212.