Neurocomputing 505 (2022) 178–187
Adherent mist and raindrop removal from a single image using attentive
convolutional network
Da He a, Xiaoyu Shang a, Jiajia Luo b,c,*
a University of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China
b Institute of Medical Technology, Peking University Health Science Center, Beijing 100191, China
c Biomedical Engineering Department, Peking University, Beijing 100191, China
Article history: Received 25 November 2021; Revised 27 June 2022; Accepted 12 July 2022; Available online 16 July 2022

Keywords: Adherent mist removal; Raindrop removal; Image restoration; Attention mechanism; Convolutional neural network

Abstract

Temperature-difference-induced mist and environmentally-induced raindrops adhering to glass products, such as windshields and camera lenses, can often block vision and severely degrade the image seen through the lens. Despite posing a challenge to various vision systems, including autonomous driving and security surveillance, this problem has not received sufficient attention from researchers. In this work, we discuss the image degradation caused by adherent mist and raindrops. An attentive convolutional network is designed to visually remove the adherent mist and raindrops from a single image. Considering the variations in coexistence and regional characteristics of adherent mist and raindrops, we propose interpolation-based pyramid attention blocks to perceive spatial information at different scales without rigid priors. Experiments show that the proposed method can improve the visibility of severely degraded images in real-world scenarios, both qualitatively and quantitatively. Further application experiments demonstrate that this practical problem is critical to high-level vision situations.

© 2022 Elsevier B.V. All rights reserved.
Fig. 1. Examples of adherent raindrops and/or mist. (a) Raindrops adhering to a glass plate in front of the camera [1]; (b) Inhomogeneous mist adhering to a windshield while people are sitting in the car; (c) Mist and raindrops simultaneously adhering to a camera lens.
Although disturbances from adherent mist and raindrops are widely observed in many scenarios, and visibility can be significantly deteriorated for algorithms as well as individuals, limited research on these natural phenomena has been conducted in the field of computer vision. In particular, there seem to be no studies on adherent mist.

Admittedly, both adherent mist and raindrops on windows can be cleaned off by defrosters and windshield wipers. However, algorithm-based solutions and attention to the proposed degradation are still in demand in many real-world scenarios. For example, a self-driving system should be able to deal with the degradation in visibility whether or not it can switch on the defrosters and wipers in a timely fashion. The car's drive recorder app is expected to work when other hardware stops. In addition, low-cost vision devices like monitoring cameras do not have defrosters or wipers.

In this study, we aim to remove the adherent mist and raindrops from a single impaired image to obtain a clean image. Based on convolutional neural networks (CNNs), the proposed approach can automatically and visually remove raindrops and mist to varying degrees. It can also improve the effectiveness of high-level vision algorithms in severe weather.

Because raindrops and mist can partially or even completely block some objects, it is challenging to reproduce a purely clean image. To solve the problem, we apply the basic blocks in [5] and advanced attention mechanisms to construct our network. In addition to the typical convolutional layers, local residual learning, channel-wise attention, and spatial attention are applied to extract features. Long-distance shortcuts are also used, forming different feature stages from the corresponding layer groups before feature fusion. To strengthen the spatial attention facing the variations of adherent mist and raindrops, an interpolation-based pyramid attention (IPA) block is applied to every group. The IPA block first zooms the features to various sizes and then perceives spatial attention from these scaled spaces.

For our experiment, we used a dataset containing degraded and clean image pairs. Unlike in purely synthetic datasets, the adherent mist and raindrops in our dataset physically exist and could therefore model real-world scenarios. The proposed method can improve the visibility of degraded images and outperforms other advanced networks. Moreover, it still exhibits state-of-the-art performance when applied to the tasks of removing traditional atmospheric haze or purely raindrops.

In summary, our main contributions include the following:

1) We address the rarely studied visual interference problem caused by adherent mist and raindrops. Without appropriate solutions, this problem limits the extensive use of systems based on outdoor cameras.

2) We use an experiment-based dataset that contains 1560 image pairs, focusing on the coexistence of adherent mist and raindrops.

3) An end-to-end restoration network is applied to remove adherent mist and raindrops without hand-crafted priors. The proposed IPA contributes to handling images degraded by regional visibility interferences such as haze, adherent mist, and raindrops. As a result, our network shows high restoration performance for the proposed task, as well as the tasks of removing conventional haze and only raindrops.

2. Related work

2.1. Attention mechanism

Instead of treating all features equally, which is inefficient or even harmful to the convergence of deep neural networks, a visual attention mechanism usually assigns different attention weights to different features. Important features for specific tasks deserve strong attention. Common attention mechanisms for CNNs can be divided into channel-wise attention and spatial attention. The former has achieved great success in image classification [6] and was then introduced to image restoration tasks [7]. Based on experiments on de-noising, de-mosaicing, artifact removal, and image super-resolution, Zhang et al. demonstrated the importance of both attention mechanisms for a wide range of low-level vision tasks [8].

2.2. Rain streak removal

Most visibility enhancement algorithms related to rainy weather have focused on rain streaks [9]. However, the shapes and physical effects of raindrops differ greatly from those of rain streaks. Therefore, these streak removal articles may not be directly comparable to problems involving the removal of raindrops [10].

2.3. Raindrop removal

Several studies on raindrop detection have been undertaken. Kurihata et al. [11] utilize principal component analysis to learn the characteristics of raindrops. Ito et al. [12] use maximally stable extremal regions to detect raindrop candidates. You et al. [13] pay attention to both raindrop detection and removal in videos. They notice that the temporal change of intensity of raindrop pixels is different from that of non-raindrop pixels, which does not apply to the single-image case. Eigen et al. [14] may have been the first to address the problem of single image raindrop removal by training a shallow CNN with paired data. Unfortunately, the limited capacity of the CNN model restricts its performance, especially for large and dense raindrops.

Qian et al.'s work (DeRaindrop) [1] might have been the first practical solution for the problem of raindrop removal. They build a generative adversarial network (GAN) [15] to generate raindrop-free images. Besides the GAN, DeRaindrop includes a recurrent network to gradually locate raindrops in the input image as a spatial attention. The performance is related to this raindrop mask, and its ground truth is calculated by the difference between the degraded image and the clean image with a hand-crafted threshold of 30.
However, it is difficult to apply this manual threshold to images with adherent mist, which contain complex pixel-value variations. Quan et al. [16] propose shape-driven attention to the raindrop region, which avoids the rigid threshold but introduces another strong prior about raindrops' shapes. This prior is also inapplicable to raindrops with adherent mist, as the edges in the image may be more severely damaged. By contrast, methods without hard priors might be more useful for comprehensive real-world situations [17].

2.4. De-hazing

Few studies have primarily focused on adherent mist. Some similar studies mainly concentrate on atmospheric haze (or fog) [2,3,18–20]. Both share the scattering effect, which results in a "white" color and degrades the original information. However, compared to atmospheric haze, adherent mist is more likely to be inhomogeneous, as it is related to the specific design of the equipment, such as which direction the warm steam comes from and how the window looks (e.g., mist adhering to the windshield near the driver's seat in Fig. 1(b)).

Single image de-hazing can be explored using some prior-based methods. He et al. [20] propose a dark channel prior, exploiting how the local minimum of the dark channel differs between hazy and haze-free images. Yuan et al. [21] unify several existing priors and design a confidence prior to better handle outliers and noise. Moreover, some learning-based methods have been proposed. Cai et al. [18] introduce an end-to-end CNN to estimate the transmissions from images with haze. Ren et al. [3] propose a multi-scale gated fusion network to improve performance. The smoothed dilation technique is adopted by Chen et al. [2] to remove gridding artifacts. Qin et al. [5] focus on the utilization of feature fusion and achieve better de-hazing performance than previous and even some newer methods by a notable margin.

However, in our work, the superposition of adherent mist and raindrops results in complex local features, limiting general de-hazing algorithms based on transmission estimation. It is also difficult to manually input an attention map similar to those in [1,16]. Therefore, the proposed network without explicit priors is designed to effectively utilize features and restore images for real-world scenarios.

3. Method

3.1. Network architecture

The pipeline of our method is shown in Fig. 2. We organize the basic blocks and feature attention blocks from [5] to build our network backbone, which is denoted as the baseline. The baseline contains 114 basic blocks, uniformly distributed in six groups. There is a long-distance shortcut in each group, which forms the residual learning mechanism for every 19 basic blocks. The output feature maps of each group are then fused. A global residual shortcut is applied so that the entire network learns the transformation residuals.

Due to the characteristics of adherent raindrops, Qian et al. [1] and Quan et al. [16] design prior-based attention mechanisms to enhance the attention performance for raindrop regions. Combined with adherent mist, the additional attention becomes more important, while those priors hardly hold. Therefore, to further improve the perception of adherent mist and raindrops, we propose and insert an IPA block into each group in the baseline model to implement the final network (denoted as baseline + IPA), as shown in Fig. 2(b).

Fig. 2. Pipeline of the proposed method. (a) Diagram of the dataset acquisition; (b) Overview of the applied convolutional network; (c) Some legends; (d) The transformation of feature maps in the IPA block.

3.2. Interpolation-based pyramid attention

Adherent mist and raindrops display patterns that are very different from common human-made objects (such as buildings and chairs) and widely studied noise (such as Gaussian noise and rain streaks). Artificial objects usually have rigid shapes, and convolutional kernels are sensitive to their low-level edges and high-level shapes. Conventional noise is typically discretely distributed in the image, and most background information remains. However, adherent mist has non-rigid edges and attenuates the sharpness of adherent raindrops in images. The adherent raindrops, in turn, aggravate the visual inhomogeneity. Complex refraction and scattering occur when adherent mist and raindrops coexist, causing much more severe visual degradation than general atmospheric haze and purely raindrops.

Therefore, additional techniques to help the CNN extract features from the non-rigid and severe degradation would be useful. Unlike other pyramid attention mechanisms that fuse different layers [22] or adopt different convolutional kernels for the same layer [23–25], our IPA generates features at different scales by bilinear interpolation, as shown in Fig. 2(d). Specifically, for feature maps from the same layer, IPA first amplifies or shrinks them by interpolation. Two convolutional layers are successively applied at each scale to obtain three attention maps with different scales. Finally, the attention maps are interpolated back to the original lateral size and added together. The final spatial attention map is then adopted to weight the original feature map by pixel-wise multiplication.

Like other pyramid attention techniques, IPA learns spatial attention based on features with different scales, enabling large receptive fields and flexible perception for patterns of various sizes. Moreover, the interpolation operation outperforms
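As a concrete illustration of the block described in Section 3.2, the following PyTorch-style sketch implements only the operations named in the text: bilinear rescaling, two convolutions per scale, interpolation of the per-scale attention maps back to the original size, summation, and pixel-wise weighting. The channel width, the three scale factors, the per-scale branch weights, and the sigmoid normalization are our assumptions for the sketch, not details taken from the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class IPABlock(nn.Module):
    """Interpolation-based pyramid attention (illustrative sketch).

    Each branch bilinearly rescales the input features, predicts a
    single-channel attention map with two convolutions, and the maps are
    interpolated back to the input size and summed. The summed map then
    weights the input features pixel-wise.
    """

    def __init__(self, channels, scales=(0.5, 1.0, 2.0)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels // 4, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, 1, 3, padding=1),
            )
            for _ in scales
        ])

    def forward(self, x):
        h, w = x.shape[-2:]
        attn = torch.zeros_like(x[:, :1])           # accumulated spatial map
        for scale, branch in zip(self.scales, self.branches):
            # Rescale the features, predict an attention map at that scale,
            # then interpolate the map back to the original lateral size.
            feat = x if scale == 1.0 else F.interpolate(
                x, scale_factor=scale, mode="bilinear", align_corners=False)
            a = branch(feat)
            attn = attn + F.interpolate(
                a, size=(h, w), mode="bilinear", align_corners=False)
        return x * torch.sigmoid(attn)              # pixel-wise weighting

In the final network, one such block would be appended to each of the six groups of the baseline, e.g., out = IPABlock(64)(group_features); the value 64 is only a placeholder channel width.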
drops and mist are random. Moreover, the distance between the camera and the glass panel is also a random value from 0.5 cm to 4.0 cm. Thus, the shape of the waterdrops could appear differently in the image due to the camera focal length. A Canon EOS 800D camera and the integrated camera of the Redmi K20 Pro smartphone are used for taking pictures.

We alleviate two dataset issues in [1] during our preparation: the observation of unexpected reflection images on the glass plates, and the fact that many samples in [1] were acquired on sunny days.

Finally, all the collected images are resized with bicubic interpolation to the shape of 640 × 480 to form the dataset. There are 1248 image pairs for training, 156 pairs for validation, and 156 pairs for testing. All the following evaluation results are based on images in the testing set.
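The resizing and splitting step described above can be written as a short script. A minimal sketch is given below; the directory layout and file naming are hypothetical, and only the bicubic interpolation, the 640 × 480 target size, and the 1248/156/156 split come from the text.

import random
from pathlib import Path
from PIL import Image

random.seed(0)
# Hypothetical layout: each degraded photo sits next to its clean counterpart.
pairs = sorted(Path("raw_pairs").glob("*_degraded.png"))
random.shuffle(pairs)

splits = {"train": pairs[:1248], "val": pairs[1248:1404], "test": pairs[1404:1560]}

for split, files in splits.items():
    out_dir = Path("dataset") / split
    out_dir.mkdir(parents=True, exist_ok=True)
    for degraded_path in files:
        clean_path = degraded_path.with_name(
            degraded_path.name.replace("_degraded", "_clean"))
        for src in (degraded_path, clean_path):
            # Resize every image to 640 x 480 with bicubic interpolation.
            Image.open(src).resize((640, 480), Image.BICUBIC).save(out_dir / src.name)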
Table 1. Quantitative comparison of methods on the testing set of the proposed dataset.

Method PSNR SSIM
Degraded Image 17.84 0.6048
Chen et al. [2] + Qian et al. [1] 15.12 0.5919
Qin et al. [5] + Qian et al. [1] 17.57 0.6302
Chen et al. [2] + Quan et al. [16] 14.75 0.5670
Qin et al. [5] + Quan et al. [16] 17.30 0.6100
Qian et al. [1] + Chen et al. [2] 16.30 0.6167
Qian et al. [1] + Qin et al. [5] 18.83 0.6383
Quan et al. [16] + Chen et al. [2] 15.96 0.5817
Quan et al. [16] + Qin et al. [5] 18.48 0.6114
Re-trained Qian et al. [1] 18.89 0.7012
Re-trained Chen et al. [2] 22.88 0.7968
Re-trained Qin et al. [5] 23.45 0.8056
Re-trained Quan et al. [16] 24.12 0.8089
Ours 24.66 0.8293
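The PSNR and SSIM values reported in Table 1 (and in the later tables) are standard full-reference metrics [29]. A minimal evaluation sketch using scikit-image is shown below; the file names are placeholders, and the exact color-space and cropping conventions used for the reported numbers are not specified in the text.

from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

restored = io.imread("restored.png")        # hypothetical restored output
reference = io.imread("ground_truth.png")   # corresponding clean image

psnr = peak_signal_noise_ratio(reference, restored, data_range=255)
ssim = structural_similarity(reference, restored, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}")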
worse than the degraded images (e.g., the Chen et al. + Qian et al. results in Table 1). The four re-trained models generally show effectiveness, especially re-trained Chen et al., re-trained Qin et al., and re-trained Quan et al., but our method shows the best performance.

In realistic scenes, the combination of adherent mist and raindrops might severely impair the image, blocking many objects from the subjective view. The examples in Fig. 3 exhibit the poor shooting environments and the restoration results from several of the above methods. Due to space limitations, we present only five relatively high-performance methods in Fig. 3 as the compared methods.

Fig. 3. Intuitive comparisons among a few restoration methods. From left to right: ground truth, degraded image, Qian et al. + Qin et al. [1,5], re-trained Chen et al. [2], re-trained Qin et al. [5], re-trained Quan et al. [16] and our method. Each row corresponds to the same sample. The values below the images (except the ground truth images) indicate the corresponding PSNR / SSIM values. The details of the images are best viewed by zooming in.

As shown in Fig. 3, some objects are not distinguishable in the degraded image. For example, in the first row of Fig. 3, the decorative text on the pink wall is invisible in the degraded image. By contrast, most words can be distinguished in the restoration result from our method. A similar phenomenon is shown in the second row. The key piece of information in the captured image, the house number, is recovered by our method. For the third sample, the general color of our result is intuitively closest to that of the ground truth. The example in the fourth row can be regarded as a traffic control application. The camera in the parking lot is severely influenced by the poor weather conditions and takes degraded photos as shown in the picture. In this case, high-level algorithms may not recognize the plate number and guards cannot easily confirm the bus position. However, our method can recover the plate numbers on the buses and determine the condition of the parking lot at a glance.

The comparison reveals that the proposed method can obtain relatively cleaner images and fewer artifacts than the other methods; this is also proved by the PSNR/SSIM values. In addition, the combination method (i.e., Qian et al. [1] + Qin et al. [5]) in Fig. 3 shows only a tiny improvement. Our method also displays robust performance for various conditions, and is able to handle both relatively weak interference with visibility (e.g., the third sample in Fig. 3) and relatively severe degradation (e.g., the other three samples in Fig. 3). The fact that the cement floor in the fourth sample in Fig. 3 has a similar color to the mist does not affect our method's performance.

Therefore, the severe degradation of visibility from adherent mist and raindrops is observed. Unlike the well-studied restoration problems of de-noising or motion blur removal, the new task above is much more complicated, and it is hard to estimate the degradation kernel. Simply re-training restoration networks leaves many artifacts. Our method outperforms the compared advanced networks, quantitatively as well as intuitively.

4.3. Extended experiments

Besides the newly proposed vision task described above, we also extend our network to two other conventional problems to show its broad applicability. Specifically, the proposed network is re-trained for atmospheric de-hazing and purely adherent raindrop removal.

4.3.1. De-hazing

Without any modification to the architecture, we also train our network on the RESIDE dataset, a widely used de-hazing benchmark. Following the same operations as previous studies, we use the indoor set containing over 13,000 image pairs for training and the indoor synthetic objective testing set (SOTS), including 500 pairs, for testing. The testing results are listed in Table 2. For fairness and convenience, the performance values of other methods are cited from [5]. In addition, three samples from the testing set are shown in Fig. 4 for qualitative comparison. Only our method and the two relatively new advanced approaches, Chen et al. [2] and Qin et al. [5], are presented.

As shown in Fig. 4, both the previous works and our method achieve nearly perfect performance. However, as shown in Table 2, the proposed method outperforms the other methods in the PSNR and SSIM metrics.
Therefore, the proposed method could be applied to handle the conventional de-hazing problem with appropriate training.

4.3.2. Sole raindrop removal

DeRaindrop (Qian et al. [1]) is a state-of-the-art approach for handling adherent raindrops. A dataset with purely raindrop-degraded image and clean image pairs was also published by Qian et al. [1]. Therefore, we also try our method on that dataset to verify our approach's feasibility when handling only adherent raindrops. The dataset contains 861 image pairs for training and 58 image pairs for quantitative evaluation. We still do not modify the architecture of our method. The quantitative performance is shown in Table 3. In addition to [1], a state-of-the-art method for raindrop removal, Quan et al. [16], is also included in the comparison.

Table 3. Quantitative comparisons of methods on the test set in [1] for solely adherent raindrop removal.

Method PSNR SSIM
Eigen et al. [14] 28.59 0.6726
Isola et al. [32] 30.14 0.8299
Qian et al. [1] 31.57 0.9023
Quan et al. [16] 31.44 0.9263
Ours 31.33 0.9297

4.4. Ablation study

Our network's architecture can be divided into two parts: the baseline and the IPA blocks. Therefore, we investigate the influence of the existence of IPA blocks on the above problems. In addition, we compare our IPA with a convolution-based pyramid attention (CPA) block and a pyramid pooling (PP) feature fusion block [33]. CPA is implemented by simply removing the interpolations from IPA and modifying the convolution parameters for rescaling. Moreover, baseline + PP is achieved by replacing the entire IPA block with the pyramid pooling in Zhang et al. [33] and slightly modifying the number of channels. The results of the ablation study are shown in Table 4.

As shown in Table 4, the proposed IPA block outperforms the two existing pyramid architectures in the three restoration tasks, which is reasonable because the IPA block can combine multi-scale learning and the low-frequency attention mechanism. Table 4 also provides quantitative evidence of the contribution of the IPA block to the performance improvement over the baseline model in the three experiments. Moreover, although higher performance can be obtained, IPA does not add much computational complexity, as six IPA blocks only add 0.01 M parameters (0.11%) and 0.97 GFLOPs (0.19%) to the model. By contrast, PP adds 0.24 M parameters (2.71%) and 13.56 GFLOPs (2.69%) to the baseline but with less improvement.
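The parameter overheads quoted above can be reproduced for any PyTorch model with a simple sum over its parameters; FLOPs are usually obtained with a separate profiling tool. The sketch below illustrates the bookkeeping only; the model builders named in the comments are hypothetical.

import torch

def count_parameters(model: torch.nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example of reporting the relative overhead of the attention blocks:
#   baseline = build_baseline()               # hypothetical builder
#   with_ipa = build_baseline_with_ipa()      # hypothetical builder
#   extra = count_parameters(with_ipa) - count_parameters(baseline)
#   print(f"IPA adds {extra / 1e6:.2f} M parameters "
#         f"({100 * extra / count_parameters(baseline):.2f}%)")
# FLOPs can be measured with a profiler such as fvcore's FlopCountAnalysis
# or thop.profile, evaluated on a dummy 3 x 480 x 640 input.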
Fig. 5. Qualitative comparisons for three raindrop removal methods. From top to bottom: ground truth, image with raindrops, Qian et al. [1], Quan et al. [16], and our method.
Each column corresponds to the same sample. Artifacts are best compared by zooming in.
Table 4. Quantitative results and complexity analysis of the ablation study for three tasks.
4.5. Visualization of the attention map

To better demonstrate the difference between our IPA and other attention mechanisms, we visualize the attention maps of three attention mechanisms as follows. Specifically, in addition to our IPA, we obtained the ideal attention map (the ground truth of the predicted attention map) in [1] with a threshold of 30, which is specially designed for raindrop removal tasks. We also analyze the convolution-based pyramid attention, denoted as CPA above, which may be of general interest for all restoration tasks. Two visualized examples can be found in Fig. 6.

As shown in Fig. 6, the ideal attention maps of Qian et al. [1] are clean and can capture some large raindrop regions. However, such a hard hand-crafted prior attention shows low performance for regions with monochromatic backgrounds. In the first example, we can easily find that raindrops in the "road" and "lawn" regions are not distinguished by the attention map of [1]. In contrast, the other two adaptive attention maps do not show this deficiency. This phenomenon validates our discussion in the extended experiments.

Although the attention maps from CPA and our IPA both focus on severely degraded regions with various backgrounds, our method shows better and finer activation. In the second example, both CPA and our IPA indicate that the left half of the image is more severely degraded than the right half. However, CPA cannot distinguish between degraded regions with and without raindrops. In other words, the small region with mist only and the small region with overlapping mist and raindrops obtain similar attention intensity from CPA. In contrast, the attention map from IPA can well distinguish three different kinds of regions: low intensity for clean regions, medium intensity for regions degraded by mist only, and high intensity for regions degraded more severely. Thus, the above observations demonstrate the superiority of the proposed IPA for the proposed task.
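For reference, the "ideal" attention map of [1] used in this comparison is a mask obtained directly from a training pair: pixels where the degraded image differs from the clean image by more than the hand-crafted threshold of 30 (on 8-bit intensities) are marked as raindrop regions. A minimal NumPy sketch is given below; averaging the difference over the color channels is our assumption.

import numpy as np

def ideal_attention_map(degraded: np.ndarray, clean: np.ndarray,
                        threshold: float = 30.0) -> np.ndarray:
    """Binary mask from a degraded/clean pair of uint8 RGB images."""
    diff = np.abs(degraded.astype(np.float32) - clean.astype(np.float32))
    # Pixels whose mean channel difference exceeds the threshold are
    # treated as degraded (raindrop) regions.
    return (diff.mean(axis=-1) > threshold).astype(np.float32)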
4.6. Application experiment

In addition to the above quantitative and qualitative evaluation results, the effectiveness of our method is also validated in practical applications. Panoptic segmentation is a newly proposed segmentation task [34] that tries to label all the image pixels. It is an essential high-level task in many areas, including autonomous cars. The segmentation results based on ground truth images, degraded images, and restored images are shown in Fig. 7. Specifically, we applied the panoptic segmentation algorithm R101-FPN [35] from the Detectron2 Model Zoo (https://github.com/facebookresearch/detectron2) to segment these images.
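Running such an off-the-shelf panoptic segmentation model on a restored image takes only a few lines with Detectron2. The sketch below assumes the panoptic_fpn_R_101_3x configuration from the Model Zoo (the R101-FPN panoptic model) and a placeholder image path.

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

config_name = "COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml"
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(config_name))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_name)  # pretrained weights

predictor = DefaultPredictor(cfg)
image = cv2.imread("restored_result.png")             # hypothetical restored image
panoptic_seg, segments_info = predictor(image)["panoptic_seg"]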
Fig. 6. Visualization of three attention mechanisms. From left to right: clean image (ground truth), degraded image, ideal attention map of Qian et al. [1], adaptively predicted
attention map of baseline + CPA model, and adaptively predicted attention map of our model. The first row corresponds to an example with little adherent mist but obvious
raindrops, while the second row shows another example with obvious mist and raindrops. All attention maps share the same colorbar.
Fig. 7. Restoration results from different methods and their panoptic segmentation results. From left to right: ground truth, degraded images, Qian et al. + Qin et al. [1,5], re-
trained Qian et al. [1], re-trained Chen et al. [2], re-trained Qin et al. [5], and our method. Each two rows correspond to the same sample. The panoptic segmentation result is
attached below the corresponding image. The details of the images are best viewed by zooming in.
As shown in Fig. 7, panoptic segmentation makes it easy to show the influence of adherent mist and raindrops from the perspective of vision algorithms. When observing the degraded image in the first sample, the segmentation algorithm cannot find all the bicycles and is not sure what is behind the bicycles. Undoubtedly, this uncertainty may influence the decisions involved in self-driving. Fortunately, the bicycles could be counted correctly from the images obtained by the last four methods, and the fence could be roughly located in our restoration result. The second sample shows a semi-indoor environment. The segmentation borders are not clear in the degraded image. Meanwhile, the last four methods could restore the image to the extent that the high-level algorithm can approximately locate the edges of different surfaces.

Therefore, the results of the application experiment indicate that mist and raindrops adhering to camera lenses or windshields can severely hamper high-level computer vision applications. Moreover, solving this problem by creating new segmentation datasets is difficult because humans are also unable to accurately annotate objects in these degraded images. The proposed algorithm-based solution might help ease the problem and make visual devices more robust.

4.7. Discussion

The above experimental results indicate both the effectiveness of our method and the importance of the proposed problem. Unlike previous image restoration problems, the challenge of the new task is magnified because it must jointly handle adherent mist and raindrops, but our method handles it well. Specifically, the challenges are as follows.

1) The image degradation is severe. By intuitively comparing the degraded examples in Fig. 3 with those in Figs. 4 and 5, we can easily conclude that the coexistence of adherent mist and raindrops tends to impair more information than purely atmospheric haze or only raindrops. The different quantitative results of the different tasks also validate this observation. As a result, a deep network capable of high performance is used in this work. Admittedly, although effective, the large amount of computation required may limit real-time applications. A lightweight architecture with comparable performance is expected in the future.

2) Prior-based methods (e.g., [1,16]) do not hold up well. Due to the interaction between adherent raindrops and adherent mist, the threshold-based and shape-based prior attention in [1,16] do not work well in practical scenarios with both degradation sources. Our method specifically targets this challenge and enhances the self-adaptive attention mechanisms by introducing IPA blocks in addition to common attention layers.
5. Conclusion

In this work, we proposed an image degradation problem related to the coexistence of adherent mist and raindrops. Both sources of interference with visibility are adherent to a camera lens or car windshield, showing different image characteristics compared with atmospheric haze or rain streaks. Although the problem is widely observed in real life, little work has been conducted on it. To solve this problem, we adopted an attentive convolutional network containing interpolation-based pyramid attention blocks to strengthen the spatial perception of regional obstacles. Both the quantitative evaluation and intuitive examples showed quality improvement with the proposed method without explicit priors. The superiority of the proposed architecture was also validated even for the tasks of conventional de-hazing and the removal of purely raindrops. Restoration from the adherent mist and raindrop degradation can benefit high-level tasks, such as panoptic segmentation, by producing reliable results, which demonstrates the significance of the proposed architecture in vision applications. In the future, we will further simplify the architecture to reduce the computational complexity while keeping similar performance, so that real-time restoration can be achieved in more embedded scenes to handle the proposed adherent mist and raindrop degradations. In addition, our experimentally acquired dataset and our code will be released online (link) to facilitate further studies on related tasks.

CRediT authorship contribution statement

Da He: Conceptualization, Methodology, Software, Validation, Writing - original draft. Xiaoyu Shang: Investigation, Data curation. Jiajia Luo: Conceptualization, Methodology, Writing - review & editing, Supervision.

Data availability

We have shared the link to the data/code in the Conclusion section.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2021YFF0502900); the National Natural Science Foundation of China (Grant No. 31870942); the Peking University Clinical Medicine Plus X – Young Scholars Project (Grant Nos. PKU2020LCXQ017, PKU2021LCXQ028); the Fundamental Research Funds for the Central Universities (Grant No. BMU2021YJ009); and the PKU-Baidu Fund (Grant No. 2020BD039).

References

[1] R. Qian, R.T. Tan, W. Yang, J. Su, J. Liu, Attentive generative adversarial network for raindrop removal from a single image, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2482–2491.
[2] D. Chen, M. He, Q. Fan, J. Liao, L. Zhang, D. Hou, L. Yuan, G. Hua, Gated context aggregation network for image dehazing and deraining, in: IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 1375–1383.
[3] W. Ren, L. Ma, J. Zhang, J. Pan, X. Cao, W. Liu, M.-H. Yang, Gated fusion network for single image dehazing, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3253–3261.
[4] S. Li, I.B. Araujo, W. Ren, Z. Wang, E.K. Tokuda, R.H. Junior, R. Cesar-Junior, J. Zhang, X. Guo, X. Cao, Single image deraining: A comprehensive benchmark analysis, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3838–3847.
[5] X. Qin, Z. Wang, Y. Bai, X. Xie, H. Jia, FFA-Net: Feature fusion attention network for single image dehazing, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 11908–11915.
[6] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[7] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, Y. Fu, Image super-resolution using very deep residual channel attention networks, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 286–301.
[8] Y. Zhang, K. Li, B. Zhong, Y. Fu, Residual non-local attention networks for image restoration, in: International Conference on Learning Representations, 2019.
[9] F. Yang, J. Ren, Z. Lu, J. Zhang, Q. Zhang, Rain-component-aware capsule-GAN for single image de-raining, Pattern Recogn. 108377 (2021).
[10] X. Jin, Z. Chen, W. Li, AI-GAN: Asynchronous interactive generative adversarial network for single image rain removal, Pattern Recogn. 100 (2020) 107143.
[11] H. Kurihata, T. Takahashi, I. Ide, Y. Mekada, H. Murase, Y. Tamatsu, T. Miyahara, Rainy weather recognition from in-vehicle camera images for driver assistance, in: IEEE Intelligent Vehicles Symposium, 2005, pp. 205–210.
[12] K. Ito, K. Noro, T. Aoki, An adherent raindrop detection method using MSER, in: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015, pp. 105–109.
[13] S. You, R.T. Tan, R. Kawakami, Y. Mukaigawa, K. Ikeuchi, Adherent raindrop modeling, detection and removal in video, IEEE Trans. Pattern Anal. Mach. Intell. 38 (9) (2015) 1721–1733.
[14] D. Eigen, D. Krishnan, R. Fergus, Restoring an image taken through a window covered with dirt or rain, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 633–640.
[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[16] Y. Quan, S. Deng, Y. Chen, H. Ji, Deep learning for seeing through window with raindrops, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 2463–2471.
[17] M. Shao, L. Li, H. Wang, D. Meng, Selective generative adversarial network for raindrop removal from a single image, Neurocomputing 426 (2021) 265–273.
[18] B. Cai, X. Xu, K. Jia, C. Qing, D. Tao, DehazeNet: An end-to-end system for single image haze removal, IEEE Trans. Image Process. 25 (11) (2016) 5187–5198.
[19] R. Fattal, Single image dehazing, ACM Trans. Graph. (TOG) 27 (3) (2008) 1–9.
[20] K. He, J. Sun, X. Tang, Single image haze removal using dark channel prior, IEEE Trans. Pattern Anal. Mach. Intell. 33 (12) (2010) 2341–2353.
[21] F. Yuan, Y. Zhou, X. Xia, X. Qian, J. Huang, A confidence prior for image dehazing, Pattern Recogn. 119 (2021) 108076.
[22] Z.-L. Ni, G.-B. Bian, G.-A. Wang, X.-H. Zhou, Z.-G. Hou, H.-B. Chen, X.-L. Xie, Pyramid attention aggregation network for semantic segmentation of surgical instruments, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 11782–11790.
[23] H. Li, P. Xiong, J. An, L. Wang, Pyramid attention network for semantic segmentation, in: Proceedings of the British Machine Vision Conference, 2018.
[24] Z. Huang, Z. Zhong, L. Sun, Q. Huo, Mask R-CNN with pyramid attention network for scene text detection, in: IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 764–772.
[25] H. Wang, G. Wang, Z. Sheng, S. Zhang, Automated segmentation of skin lesion based on pyramid attention network, in: International Workshop on Machine Learning in Medical Imaging, Springer, 2019, pp. 435–443.
[26] Y. Mei, Y. Fan, Y. Zhang, J. Yu, Y. Zhou, D. Liu, Y. Fu, T.S. Huang, H. Shi, Pyramid attention networks for image restoration, arXiv preprint arXiv:2004.13824 (2020).
[27] J. Johnson, A. Alahi, L. Fei-Fei, Perceptual losses for real-time style transfer and super-resolution, in: European Conference on Computer Vision, 2016, pp. 694–711.
[28] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., Photo-realistic single image super-resolution using a generative adversarial network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4681–4690.
[29] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.
[30] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: A large-scale hierarchical image database, in: IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[31] B. Li, X. Peng, Z. Wang, J. Xu, D. Feng, AOD-Net: All-in-one dehazing network, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4770–4778.
[32] P. Isola, J.-Y. Zhu, T. Zhou, A.A. Efros, Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
[33] H. Zhang, V.M. Patel, Densely connected pyramid dehazing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3194–3203.
[34] A. Kirillov, K. He, R. Girshick, C. Rother, P. Dollár, Panoptic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 9404–9413.
[35] Z. Cai, N. Vasconcelos, Cascade R-CNN: High quality object detection and instance segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2019).

Da He received his B.S. degree in Optoelectronic Information Science and Engineering from Nankai University. In 2018, he joined the University of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai, China, as a graduate student. He is interested in image processing and deep learning.

Jiajia Luo received his Ph.D. degree in mechanical engineering from the University of Michigan, Ann Arbor, USA. He is currently an Assistant Professor at the Biomedical Engineering Department, Peking University, Beijing, China. His research interests include biomedical imaging, biomechanics, and machine learning. He was a recipient of a career development award from the Michigan Institute for Clinical & Health Research, and the Peking University Clinical Medicine + X Young Investigator Grant Award. His research is supported by the National Key R&D Program of China, the National Natural Science Foundation of China, and the PKU-Baidu Fund.