
Neurocomputing 505 (2022) 178–187


Adherent mist and raindrop removal from a single image using attentive
convolutional network
Da He a, Xiaoyu Shang a, Jiajia Luo b,c,*
a University of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China
b Institute of Medical Technology, Peking University Health Science Center, Beijing 100191, China
c Biomedical Engineering Department, Peking University, Beijing 100191, China
* Corresponding author at: Biomedical Engineering Department, Peking University, Beijing 100191, China. E-mail addresses: da.he@sjtu.edu.cn (D. He), shangxiaoyu@sjtu.edu.cn (X. Shang), jiajia.luo@pku.edu.cn (J. Luo).

Article history: Received 25 November 2021; Revised 27 June 2022; Accepted 12 July 2022; Available online 16 July 2022

Keywords: Adherent mist removal; Raindrop removal; Image restoration; Attention mechanism; Convolutional neural network

Abstract

Temperature-difference-induced mist and environmentally-induced raindrops adhering to glass products such as windshields and camera lenses can often block vision and severely degrade the image seen through the lens. Despite posing a challenge to various vision systems, including autonomous driving and security surveillance, this problem has not received sufficient attention from researchers. In this work, we discuss the image degradation caused by adherent mist and raindrops. An attentive convolutional network is designed to visually remove the adherent mist and raindrops from a single image. Considering the variations in coexistence and regional characteristics of adherent mist and raindrops, we propose interpolation-based pyramid attention blocks to perceive spatial information at different scales without rigid priors. Experiments show that the proposed method can improve the visibility of severely degraded images in real-world scenarios, both qualitatively and quantitatively. Further application experiments demonstrate that this practical problem is critical to high-level vision situations.

© 2022 Elsevier B.V. All rights reserved.

1. Introduction

High-level computer vision tasks, such as object detection and segmentation, are highly dependent on the quality of captured images. However, outdoor vision systems such as security monitoring and vision-based self-driving cars or driving assistance systems are easily influenced by severe weather or other conditions. As shown in Fig. 1, raindrops adhering to a camera lens can significantly degrade the image, and mist in the car adhering to a windshield can reduce visibility.

Adherent raindrops (or waterdrops) exist in a wide range of scenarios. Outdoors, rain streaks will result in raindrops. On a fishing boat, splashing water can form waterdrops. Because the visual effects from raindrops vary due to factors such as the different sizes of the drops, their distance to the camera, and complex refraction, it is extremely difficult to model the raindrops manually.

Another severe and often-observed source of interference with visibility, adherent mist, is often overlooked in computer vision research. As shown in Fig. 1(b), mist adhering to a windshield can severely obstruct a driver's vision. The scene in Fig. 1(b) is often encountered when driving a car in cold weather.

Temperature difference is one of the main causes of adherent mist. Adherent mist often forms in conjunction with raindrops, especially when there is a marked temperature difference between the interior of the car and the windshield or windows cooled by the raindrops outside. Moreover, dust on the glass surface and impurities in the water may contribute to the formation of mist-like stains after the waterdrops evaporate. In addition to the fact that the interference with visibility caused by the coexistence of adherent mist and raindrops will often be experienced by cars or buses on rainy days or in cold areas, the degradation is also obvious in monitoring datasets (such as those from The Nature Conservancy). Therefore, it is necessary to handle both adherent mist and raindrops simultaneously. Fig. 1(c) shows the overlap of mist and raindrops adhering to a camera lens.

By improving the formulas in [1-4], the degraded image I_cap captured by the camera can be mathematically described as a combination of raindrops R, the scattering of adherent mist A, and the clean background B as follows:

$$I_{cap} = ((1 - M) \odot B + R) \odot t + A \odot (1 - t), \qquad (1)$$

where M denotes a binary mask showing the existence of raindrops, ⊙ indicates element-wise multiplication, and t is the transmission map indicating the information passing ratio through the adherent mist. I_cap, M, B, R, t, and A are location-related maps (i.e., matrices) instead of constants.
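For clarity, Eq. (1) can be read as a per-pixel composition of the degradation maps. The following NumPy sketch synthesizes a degraded image from the clean background; the array names simply mirror the symbols defined above, and the function name is ours.

```python
import numpy as np

def compose_degraded(B, R, M, t, A):
    """Synthesize a degraded image following Eq. (1).

    All arguments are H x W x 3 float arrays in [0, 1]:
    B: clean background, R: raindrop layer, M: binary raindrop mask,
    t: mist transmission map, A: mist scattering term.
    """
    # Raindrops replace the background where M = 1; the mist then
    # attenuates the scene by t and adds its own scattering by (1 - t).
    return ((1.0 - M) * B + R) * t + A * (1.0 - t)
```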

https://doi.org/10.1016/j.neucom.2022.07.032

Fig. 1. Examples of adherent raindrops and/or mist. (a) Raindrops adhering to a glass in front of the camera [1]; (b) Inhomogeneous mist adhering to a windshield, as seen from inside the car; (c) Mist and raindrops simultaneously adhering to a camera lens.

Although disturbances from adherent mist and raindrops are widely observed in many scenarios, and visibility can be significantly deteriorated for algorithms as well as individuals, limited research on these natural phenomena has been conducted in the field of computer vision. In particular, there appear to be no studies on adherent mist.

Admittedly, both adherent mist and raindrops on windows can be cleaned off by defrosters and windshield wipers. However, algorithm-based solutions and attention to the proposed degradation are still in demand in many real-world scenarios. For example, a self-driving system should be able to deal with the degradation in visibility whether or not it can switch on the defrosters and wipers in a timely fashion. A car's drive recorder is expected to work when other hardware stops. In addition, low-cost vision devices like monitoring cameras do not have defrosters or wipers.

In this study, we aim to remove the adherent mist and raindrops from a single impaired image to obtain a clean image. Based on convolutional neural networks (CNNs), the proposed approach can automatically visually remove raindrops and mist to varying degrees. It can also improve the effectiveness of high-level vision algorithms in severe weather.

Because raindrops and mist can partially or even completely block some objects, it is challenging to reproduce a purely clean image. To solve the problem, we apply the basic blocks in [5] and advanced attention mechanisms to construct our network. In addition to the typical convolutional layers, local residual learning, channel-wise attention, and spatial attention are applied to extract features. Long-distance shortcuts are also used, forming different feature stages from corresponding layer groups before feature fusion. To strengthen the spatial attention against the variations of adherent mist and raindrops, an interpolation-based pyramid attention (IPA) block is applied to every group. The IPA block first zooms the features to various sizes and then perceives spatial attention from these scaled spaces.

For our experiment, we used a dataset containing degraded and clean image pairs. Unlike in purely synthetic datasets, the adherent mist and raindrops in our dataset physically exist and can therefore model real-world scenarios. The proposed method can improve the visibility of degraded images and outperforms other advanced networks. Moreover, it still exhibits state-of-the-art performance when applied to the tasks of removing traditional atmospheric haze or purely raindrops.

In summary, our main contributions include the following:

1) We address the rarely studied visual interference problem caused by adherent mist and raindrops. Without appropriate solutions, this problem limits the extensive use of systems based on outdoor cameras.

2) We use an experiment-based dataset that contains 1560 image pairs, focusing on the coexistence of adherent mist and raindrops.

3) An end-to-end restoration network is applied to remove adherent mist and raindrops without hand-crafted priors. The proposed IPA contributes to handling images degraded by regional visibility interferences such as haze, adherent mist, and raindrops. As a result, our network shows high restoration performance for the proposed task, as well as for the tasks of removing conventional haze and only raindrops.

2. Related work

2.1. Attention mechanism

Instead of treating all features equally, which is inefficient or even harmful to the convergence of deep neural networks, the visual attention mechanism usually assigns different attention weights to different features. Important features for specific tasks deserve strong attention. Common attention mechanisms for CNNs can be divided into channel-wise attention and spatial attention. The former achieved great success in image classification [6] and was then introduced to image restoration tasks [7]. Based on experiments on de-noising, de-mosaicing, artifact removal and image super-resolution, Zhang et al. demonstrated the importance of both attention mechanisms for a wide range of low-level vision tasks [8].

2.2. Rain streak removal

Most visibility enhancement algorithms related to rainy weather have focused on rain streaks [9]. However, the shapes and physical effects of raindrops differ greatly from those of rain streaks. Therefore, these streak removal articles may not be directly applicable to problems involving the removal of raindrops [10].

2.3. Raindrop removal

Several studies on raindrop detection have been undertaken. Kurihata et al. [11] utilize principal component analysis to learn the characteristics of raindrops. Ito et al. [12] use maximally stable extremal regions to detect raindrop candidates. You et al. [13] pay attention to both raindrop detection and removal in videos. They notice that the temporal change in intensity of raindrop pixels is different from that of non-raindrop pixels, which does not apply to the single-image case. Eigen et al. [14] may have been the first to address the problem of single image raindrop removal by training a shallow CNN with paired data. Unfortunately, the limited capacity of the CNN model restricts its performance, especially for large and dense raindrops.

Qian et al.'s work (DeRaindrop) [1] might have been the first practical solution for the problem of raindrop removal. They build a generative adversarial network (GAN) [15] to generate raindrop-free images. Besides the GAN, DeRaindrop includes a recurrent network to gradually locate raindrops in the input image as a spatial attention. The performance is related to this raindrop mask, and its ground truth is calculated by the difference between the degraded image and the clean image with a hand-crafted threshold

of 30. However, it is difficult to apply this manual threshold to images with adherent mist, which contain complex pixel-value variations. Quan et al. [16] propose shape-driven attention to the raindrop region, which avoids the rigid threshold but introduces another strong prior about raindrops' shapes. This prior is also inapplicable to raindrops with adherent mist, as the edges in the image may be more severely damaged. By contrast, methods without hard priors might be more useful for comprehensive real-world situations [17].

2.4. De-hazing

Few studies have primarily focused on adherent mist. Some similar studies mainly concentrate on atmospheric haze (or fog) [2,3,18-20]. Both share a scattering effect, which results in a "white" color and degrades the original information. However, compared to atmospheric haze, adherent mist is more likely to be inhomogeneous, as it is related to the specific design of the equipment, such as which direction the warm steam comes from and how the window looks (e.g., mist adhering to the windshield near the driver's seat in Fig. 1(b)).

Single image de-hazing can be explored using some prior-based methods. He et al. [20] propose a dark channel prior to exploit the local minimum of the dark channel that varies between hazy and haze-free images. Yuan et al. [21] unify several existing priors and design a confidence prior to better handle outliers and noise.

Moreover, some learning-based methods have been proposed. Cai et al. [18] introduce an end-to-end CNN to estimate the transmissions from images with haze. Ren et al. [3] propose a multi-scale gated fusion network to improve performance. The smoothed dilation technique is adopted by Chen et al. [2] to remove gridding artifacts. Qin et al. [5] focus on the utilization of feature fusion and achieve better de-hazing performance than previous and even some newer methods by a notable margin.

However, in our work, the superposition of adherent mist and raindrops results in complex local features, limiting general de-hazing algorithms that use transmission estimation. It is also difficult to manually input an attention map similar to those in [1,16]. Therefore, the proposed network without explicit priors is designed to effectively utilize features and restore images for real-world scenarios.

3. Method

3.1. Network architecture

The pipeline of our method is shown in Fig. 2. We organize the basic blocks and feature attention blocks from [5] to build our network backbone, which is denoted as the baseline. The baseline contains 114 basic blocks, uniformly distributed in six groups. There is a long-distance shortcut in each group, which forms the residual learning mechanism for every 19 basic blocks. The output feature maps of each group are then fused. A global residual shortcut is applied so that the entire network learns the transformation residuals.

Due to the characteristics of adherent raindrops, Qian et al. [1] and Quan et al. [16] design prior-based attention mechanisms to enhance the attention performance for raindrop regions. Combined with adherent mist, the additional attention becomes more important, while those priors hardly hold. Therefore, to further improve the perception of adherent mist and raindrops, we propose and insert an IPA block into each group in the baseline model to implement the final network (denoted as baseline + IPA), as shown in Fig. 2(b).
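For concreteness, the following is a minimal PyTorch sketch of this backbone layout under our reading of Section 3.1. The internals of the basic block (convolutions with channel-wise and spatial attention, following [5]) are abstracted behind a `block` constructor, and the channel width of 64 is an assumption rather than a reported setting; the IPA blocks introduced next would be inserted into each group.

```python
import torch
import torch.nn as nn

class Group(nn.Module):
    """19 basic blocks with a long-distance shortcut (local residual learning)."""
    def __init__(self, block, channels, n_blocks=19):
        super().__init__()
        self.body = nn.Sequential(*[block(channels) for _ in range(n_blocks)])

    def forward(self, x):
        return x + self.body(x)  # long shortcut over the whole group

class Baseline(nn.Module):
    """Six groups (114 basic blocks in total) whose outputs are fused,
    plus a global residual shortcut."""
    def __init__(self, block, channels=64, n_groups=6):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.groups = nn.ModuleList([Group(block, channels) for _ in range(n_groups)])
        self.fuse = nn.Conv2d(channels * n_groups, channels, 1)  # feature fusion
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, img):
        x = self.head(img)
        feats = []
        for g in self.groups:
            x = g(x)
            feats.append(x)  # each group forms a feature stage before fusion
        out = self.tail(self.fuse(torch.cat(feats, dim=1)))
        return img + out  # global residual: the network predicts a correction

# Example with a stand-in basic block:
# net = Baseline(lambda c: nn.Conv2d(c, c, 3, padding=1))
```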
3.2. Interpolation-based pyramid attention

Adherent mist and raindrops display patterns that are very different from common human-made objects (such as buildings and chairs) and widely studied noises (such as Gaussian noise and rain streaks). Artificial objects usually have rigid shapes, and convolutional kernels are sensitive to their low-level edges and high-level shapes. Conventional noises are usually discretely distributed in the image, and most background information remains. However, adherent mist has non-rigid edges and attenuates the sharpness of adherent raindrops in images. The adherent raindrops, in turn, aggravate the visual inhomogeneity. Complex refraction and scattering occur when adherent mist and raindrops coexist, causing much more severe visual degradation than general atmospheric haze or pure raindrops.

Therefore, additional techniques to help the CNN extract features from the non-rigid and severe degradation would be useful. Unlike other pyramid attention mechanisms that fuse different layers [22] or adopt different convolutional kernels for the same layer [23-25], our IPA generates features at different scales by bilinear interpolation, as shown in Fig. 2(d). Specifically, for feature maps from the same layer, IPA first amplifies or shrinks them by interpolation. Two convolutional layers are successively applied at each scale to obtain three attention maps with different scales. Finally, the attention maps are interpolated back to the original lateral size and added together. The final spatial attention map is then adopted to weight the original feature map by pixel-wise multiplication.

Like other pyramid attention techniques, IPA learns spatial attention based on features at different scales, enabling large receptive fields and flexible perception of patterns of various sizes. Moreover, the interpolation operation outperforms

Fig. 2. Pipeline of the proposed method. (a) Diagram of the dataset acquisition; (b) Overview of the applied convolutional network; (c) Some legends; (d) The transformation
of feature maps in the IPA block.


convolution-based upsampling/downsampling (e.g., [26,22]), offering another advantage in this task. Because bilinear interpolation can be regarded as a low-pass filter when processing images, high-frequency information in the feature map is suppressed after interpolation. In the original feature map, the background textures in clean regions usually correspond to high-frequency information. In contrast, textures obscured by adherent mist and raindrops mainly contain relatively low-frequency signals. Therefore, after interpolation (i.e., the low-pass filter), the attention mechanism is less likely to pay meaningless attention to clean regions, and areas covered by adherent mist and raindrops can possibly obtain more attention than before. The conclusion is that the IPA block may help to strengthen the attention mechanism for regional visibility interference problems.
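A minimal PyTorch sketch of the IPA block as described above is given below. The scale factors (0.5, 1.0, 2.0), the reduced channel width in the branches, and the sigmoid used to normalize the summed attention map are our assumptions, not reported settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IPABlock(nn.Module):
    """Interpolation-based pyramid attention (sketch; scale factors assumed)."""
    def __init__(self, channels, scales=(0.5, 1.0, 2.0)):
        super().__init__()
        self.scales = scales
        # Two successive convolutions per scale, ending in a one-channel
        # spatial attention map for that scale.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels // 4, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, 1, 3, padding=1),
            ) for _ in scales
        ])

    def forward(self, x):
        h, w = x.shape[-2:]
        att = 0
        for s, branch in zip(self.scales, self.branches):
            # Zoom the features (bilinear interpolation acts as a low-pass
            # filter), perceive attention at that scale ...
            y = x if s == 1.0 else F.interpolate(
                x, scale_factor=s, mode='bilinear', align_corners=False)
            a = branch(y)
            # ... then interpolate back to the original lateral size and sum.
            att = att + F.interpolate(a, size=(h, w), mode='bilinear',
                                      align_corners=False)
        # Pixel-wise re-weighting of the original feature map.
        return x * torch.sigmoid(att)
```

Because every branch outputs only a one-channel map, the block stays lightweight, which is consistent with the complexity figures reported later in Section 4.4.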
3.3. Loss function

As indicated in [27,28], although pixel-wise mean squared error (MSE) or mean absolute error (MAE) may bring significant improvements in metrics such as peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [29], the generated image is likely to be over-smoothed. Instead, a perceptual loss term can be applied to calculate the error from a high-level view, which might help generate subjectively realistic images.

We calculate the perceptual loss based on the MSE of the output features from the third block of VGG16 [28]. The VGG16 model is pretrained on the ImageNet dataset [30], which may provide a relatively high-level description of an image. Finally, the perceptual loss is added to the general MAE loss:

$$L_{total} = \lambda_1 L_{mae} + \lambda_2 L_{per}, \qquad (2)$$

where λ1, λ2 are the weight coefficients of the terms, L_total is the total loss, L_per denotes the perceptual loss, and L_mae is calculated by directly comparing the prediction and the ground truth using MAE.
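A sketch of Eq. (2) in PyTorch follows. The exact cut point for the "third block" of VGG16 (here the first 16 layers of torchvision's vgg16().features, i.e., up to relu3_3) and the omission of ImageNet input normalization are simplifying assumptions on our part.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class RestorationLoss(nn.Module):
    """Eq. (2): L_total = lambda1 * L_mae + lambda2 * L_per."""
    def __init__(self, lambda1=1.00, lambda2=0.04):
        super().__init__()
        self.lambda1, self.lambda2 = lambda1, lambda2
        # Frozen ImageNet-pretrained VGG16 features up to relu3_3 (assumed
        # cut point); newer torchvision may require weights=... instead.
        self.vgg = vgg16(pretrained=True).features[:16].eval()
        for p in self.vgg.parameters():
            p.requires_grad = False

    def forward(self, pred, gt):
        l_mae = torch.mean(torch.abs(pred - gt))          # pixel-wise MAE
        l_per = torch.mean((self.vgg(pred) - self.vgg(gt)) ** 2)  # feature MSE
        return self.lambda1 * l_mae + self.lambda2 * l_per
```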
3.4. Data collection

To collect the image pairs consisting of clean and degraded images, we take photographs following strategies inspired by [1]. The schematic is simplified in Fig. 2(a). Specifically, we utilize two identical glass panels to simulate different cases. First, we use a tripod to secure the camera. Second, one of the panels is randomly sprinkled with waterdrops on one side and randomly sprayed with mist from a cosmetic sprayer on the other side. Then, we place the blurred panel in front of the camera and take a photo of the degraded image. Finally, the corresponding ground truth (i.e., clean) image is taken with the other, clean glass panel.

To maintain good generalization, the distributions of waterdrops and mist are random. Moreover, the distance between the camera and the glass panel is also a random value from 0.5 cm to 4.0 cm. Thus, the shape of the waterdrops can appear differently in the image due to the camera focal length. A Canon EOS 800D camera and the integrated camera of the Redmi K20 Pro smartphone are used for taking pictures.

We alleviate two dataset issues in [1] during our preparation: the observation of unexpected reflection images on the glass plates, and the fact that many samples in [1] were acquired on sunny days.

Finally, all the collected images are resized with bicubic interpolation to the shape of 640 × 480 to form the dataset. There are 1248 image pairs for training, 156 pairs for validation, and 156 pairs for testing. All the following evaluation results are based on images in the testing set.

4. Experiment

4.1. Experiment setup

The proposed method is implemented using the PyTorch framework and trained with an Nvidia Titan RTX GPU. We use the Adam optimizer with a learning rate of 1 × 10⁻⁴. The patch-based training strategy is applied with a cropped patch size of 240 × 240 and a batch size of 2. Other data augmentation operations, including flipping and rotation, are also used. The coefficients λ1, λ2 in the loss function are 1.00 and 0.04, respectively.

Patch-based training is often used to improve training performance, while too small a patch size might impair the extraction of degradation features. The loss coefficients are used to weight and balance the different loss terms. Too large a perceptual loss might impair convergence in the early stages of training. As a result, the selection of the above hyper-parameters is generally experiment-based. We conducted a parameter search to obtain the listed learning rate, patch size and loss coefficients.
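The setup above can be summarized in a few lines of PyTorch. The helper names below are ours; the one detail that matters is applying identical crop/flip/rotation transforms to both images of a pair so that the pair stays aligned.

```python
import random
import torch
import torchvision.transforms.functional as TF

def make_optimizer(model):
    # Adam with the learning rate reported in Section 4.1.
    return torch.optim.Adam(model.parameters(), lr=1e-4)

def augment_pair(degraded, clean, patch=240):
    """Identical random 240x240 crop, flip, and 90-degree rotation
    for a (degraded, clean) pair of C x H x W tensors (H, W >= 240)."""
    _, h, w = degraded.shape
    i, j = random.randint(0, h - patch), random.randint(0, w - patch)
    degraded = degraded[:, i:i + patch, j:j + patch]
    clean = clean[:, i:i + patch, j:j + patch]
    if random.random() < 0.5:
        degraded, clean = TF.hflip(degraded), TF.hflip(clean)
    k = random.randint(0, 3)  # rotation by a multiple of 90 degrees
    return (torch.rot90(degraded, k, dims=(1, 2)),
            torch.rot90(clean, k, dims=(1, 2)))
```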

4.2. Restoration results

To our knowledge, there have been no other comparable studies on the problem of jointly handling adherent mist and raindrops. Therefore, we compare the proposed method against the following three compromise settings.

1) Our method is compared with two-step combinations. A two-step combination indicates that we successively adopted a pre-trained de-hazing approach (originally focused on atmospheric haze) and a pre-trained raindrop removal model to restore the image. In other words, the degraded image is first processed by a de-hazing model (i.e., GCANet by Chen et al. [2], or FFANet by Qin et al. [5]) and then by a raindrop removal model (i.e., Qian et al. [1] or Quan et al. [16]).

2) We also tried applying the raindrop removal models before GCANet (or FFANet) for comparison.

3) Although originally meant for other tasks, the methods from DeRaindrop [1], GCANet [2], FFANet [5] and Quan et al. [16] are advanced restoration networks that can technically be re-trained on our dataset, achieving the joint removal of adherent mist and raindrops to some extent.

Table 1
Quantitative comparisons of different methods on our dataset for the adherent mist and raindrop removal task.

Method                               PSNR    SSIM
Degraded Image                       17.84   0.6048
Chen et al. [2] + Qian et al. [1]    15.12   0.5919
Qin et al. [5] + Qian et al. [1]     17.57   0.6302
Chen et al. [2] + Quan et al. [16]   14.75   0.5670
Qin et al. [5] + Quan et al. [16]    17.30   0.6100
Qian et al. [1] + Chen et al. [2]    16.30   0.6167
Qian et al. [1] + Qin et al. [5]     18.83   0.6383
Quan et al. [16] + Chen et al. [2]   15.96   0.5817
Quan et al. [16] + Qin et al. [5]    18.48   0.6114
Re-trained Qian et al. [1]           18.89   0.7012
Re-trained Chen et al. [2]           22.88   0.7968
Re-trained Qin et al. [5]            23.45   0.8056
Re-trained Quan et al. [16]          24.12   0.8089
Ours                                 24.66   0.8293

The results shown in Table 1 indicate, first of all, that it is important to handle adherent mist and raindrops together. The coexistence of mist and raindrops may lead to results that are even

worse than the degraded images (e.g., the Chen et al. + Qian et al. results in Table 1). The four re-trained models generally show effectiveness, especially re-trained Chen et al., re-trained Qin et al. and re-trained Quan et al., but our method shows the best performance.

In realistic scenes, the combination of adherent mist and raindrops might severely impair the image, blocking many objects from the subjective view. The examples in Fig. 3 exhibit the poor shooting environments and the restoration results from several of the above methods. Due to space limitations, we present only five relatively high-performance methods in Fig. 3 as the compared methods.

As shown in Fig. 3, some objects are not distinguishable in the degraded image. For example, in the first row of Fig. 3, the decorative text on the pink wall is invisible in the degraded image. By contrast, most words can be distinguished in the restoration result from our method. A similar phenomenon is shown in the second row. The key piece of information in the captured image, the house number, is recovered by our method. For the third sample, our result's general color is intuitively closest to that of the ground truth. The example in the fourth row can be regarded as a traffic control application. The camera in the parking lot is severely influenced by the poor weather conditions and takes degraded photos as shown in the picture. In this case, high-level algorithms may not recognize the plate numbers and guards cannot easily confirm the bus positions. However, our method can recover the plate numbers on the buses and reveal the condition of the parking lot at a glance.

The comparison reveals that the proposed method can obtain relatively cleaner images with fewer artifacts than the other methods; this is also proved by the PSNR/SSIM values. In addition, the combination method (i.e., Qian et al. [1] + Qin et al. [5]) in Fig. 3 shows only a tiny improvement. Our method also displays robust performance under various conditions, and is able to handle both relatively weak interference with visibility (e.g., the third sample in Fig. 3) and relatively severe degradation (e.g., the other three samples in Fig. 3). The fact that the cement floor in the fourth sample in Fig. 3 is a similar color to the mist does not affect our method's performance.

Therefore, severe degradation of visibility from adherent mist and raindrops is observed. Unlike the well-studied restoration problems of de-noising or motion blur removal, the new task above is much more complicated, and it is hard to estimate the degradation kernel. Simply re-training restoration networks leaves many artifacts. Our method outperforms the compared advanced networks, quantitatively as well as intuitively.
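For reference, the PSNR/SSIM values used throughout this section can be reproduced with scikit-image; the paper does not publish its exact evaluation code, so this is only a plausible sketch under common conventions (8-bit color images, multichannel SSIM).

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(restored, gt):
    """PSNR / SSIM for one restored image against its ground truth.

    Both inputs are H x W x 3 uint8 arrays. On older scikit-image
    versions, replace channel_axis=-1 with multichannel=True.
    """
    psnr = peak_signal_noise_ratio(gt, restored, data_range=255)
    ssim = structural_similarity(gt, restored, channel_axis=-1, data_range=255)
    return psnr, ssim
```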
4.3. Extended experiments

Besides the newly proposed vision task described above, we also extend our network to two other conventional problems to show its broad applicability. Specifically, the proposed network is re-trained for atmospheric de-hazing and for purely adherent raindrop removal.

4.3.1. De-hazing

Without any modification to the architecture, we also train our network using the RESIDE dataset, a widely used de-hazing benchmark. Following the same operations as previous studies, we use the indoor set containing over 13 K image pairs for training and the indoor synthetic objective testing set (SOTS), including 500 pairs, for testing. The testing results are listed in Table 2. For fairness and convenience, the performance values of other methods are cited from [5]. In addition, three samples from the testing set are shown in Fig. 4 for qualitative comparison. Only our method and the two relatively new advanced approaches, Chen et al. [2] and Qin et al. [5], are presented.

As shown in Fig. 4, both the previous works and our method achieve nearly perfect performance.

Fig. 3. Intuitive comparisons among a few restoration methods. From left to right: ground truth, degraded image, Qian et al. + Qin et al. [1,5], re-trained Chen et al. [2], re-
trained Qin et al. [5], re-trained Quan et al. [16] and our method. Each row corresponds to the same sample. The values below images (except ground truth images) indicate
the corresponding values of PSNR / SSIM. The details of the images are best viewed by zooming in.


Table 2
Quantitative comparisons of methods for de-hazing.

Method            PSNR    SSIM
He et al. [20]    16.62   0.8179
Li et al. [31]    19.06   0.8504
Cai et al. [18]   21.14   0.8472
Ren et al. [3]    22.30   0.8800
Chen et al. [2]   30.23   0.9800
Qin et al. [5]    36.39   0.9886
Ours              40.02   0.9931

Fig. 4. Qualitative comparisons for three de-hazing methods. From top to bottom: ground truth, hazy image, Chen et al. [2], Qin et al. [5], and our method. Each column corresponds to the same sample.

However, as shown in Table 2, the proposed method outperforms the other methods in the PSNR and SSIM metrics. Therefore, the proposed method could be applied to handle the conventional de-hazing problem with appropriate training.

4.3.2. Sole raindrop removal

DeRaindrop (Qian et al. [1]) is a state-of-the-art approach for handling adherent raindrops. A dataset with purely raindrop-degraded image and clean image pairs was also published by Qian et al. [1]. Therefore, we also tried our method on that dataset to verify our approach's feasibility when handling only adherent raindrops. The dataset contains 861 image pairs for training and 58 image pairs for quantitative evaluation. We still do not modify the architecture of our method. The quantitative performance is shown in Table 3. In addition to [1], a state-of-the-art method generating attention masks using the shape-driven method [16] is also compared. Similarly, the metric values of the compared methods are directly cited from [1,16]. Three examples from the testing set are used for qualitative comparison in Fig. 5.

Table 3
Quantitative comparisons of methods on the test set in [1] for solely adherent raindrop removal.

Method              PSNR    SSIM
Eigen et al. [14]   28.59   0.6726
Isola et al. [32]   30.14   0.8299
Qian et al. [1]     31.57   0.9023
Quan et al. [16]    31.44   0.9263
Ours                31.33   0.9297

As shown in Table 3, our method achieves results comparable to Qian et al. [1] and Quan et al. [16] on the same test set. Although our PSNR value is slightly lower than that of [1] or [16], our method has the highest SSIM value.

The better qualitative performance of our method can be seen in Fig. 5. By comparing the walls of the first sample in Fig. 5, it is easy to see that some light "shades" remain in Qian et al.'s result, which correspond to the original raindrops' locations. In contrast, our method does a better job in terms of color consistency. However, based on the remaining two samples in Fig. 5, artifacts are still evident in the zoomed subplots when using the methods from [1,16], while our method also achieves reasonable results in these regions.

The two above issues demonstrate two advantages of our method, which avoids the explicit priors in [1,16]. The work of Qian et al. [1] sets a threshold of 30 to annotate the raindrop mask for training, which is a hand-crafted prior. This prior is not suitable for scenes with monochromatic backgrounds on cloudy or rainy days. In these scenes, the pixel difference between raindrop regions and clean regions may become so small that the prior does not work, and the performance of [1] decreases (see the three samples in Fig. 5). [16] applies a complex shape-driven prior to generate the raindrop mask. However, even without adherent mist, raindrops are still very difficult to model. For image regions with dramatic gradient changes (e.g., the zoom-in areas of the second and third samples in Fig. 5), [16] may produce severe artifacts. Therefore, with similar quantitative performance, our method without hand-crafted priors is hopefully more effective for practical scenes.

4.3.3. Summary of the extended experiments

Although the primary purpose of this study is to call attention to the newly proposed problem of adherent mist and raindrop removal, the proposed method works well not only on our own dataset containing adherent mist, but also on the conventional de-hazing and purely raindrop removal problems. In other words, these tasks can be handled in applications using only one method.

4.4. Ablation study

Our network's architecture can be divided into two parts: the baseline and the IPA blocks. Therefore, we investigate the influence of the IPA blocks on the above problems. In addition, we compare our IPA with a convolution-based pyramid attention (CPA) block and a pyramid pooling (PP) feature fusion block [33]. CPA is implemented by simply removing the interpolations from IPA and modifying the convolution parameters for rescaling. Moreover, baseline + PP is achieved by replacing the entire IPA block with the pyramid pooling in Zhang et al. [33] and slightly modifying the number of channels. The results of the ablation study are shown in Table 4.

As shown in Table 4, the proposed IPA block outperforms the two existing pyramid architectures in the three restoration tasks, which is reasonable because the IPA block can combine multi-scale learning and the low-frequency attention mechanism. Table 4 also provides quantitative evidence of the contribution of the IPA block to the performance improvement over the baseline model in the three experiments. Moreover, although higher performance is obtained, IPA does not add much computational complexity, as six IPA blocks only add 0.01 M parameters (0.11%) and 0.97G FLOPs (0.19%) to the model. By contrast, PP adds 0.24 M parameters (2.71%) and 13.56G FLOPs (2.69%) relative to the baseline but with less improvement.

Fig. 5. Qualitative comparisons for three raindrop removal methods. From top to bottom: ground truth, image with raindrops, Qian et al. [1], Quan et al. [16], and our method.
Each column corresponds to the same sample. Artifacts are best compared by zooming in.

Table 4
Quantitative results and complexity analysis of the ablation study for three tasks.

Methods          Proposed task     De-hazing         Raindrop removal   # params   FLOPs
                 PSNR    SSIM      PSNR    SSIM      PSNR    SSIM
Baseline         24.13   0.8142    36.54   0.9895    30.82   0.9247     8.87 M     504.04G
Baseline + CPA   24.23   0.8226    34.62   0.9849    31.04   0.9238     8.93 M     504.66G
Baseline + PP    24.28   0.8250    35.58   0.9866    31.07   0.9216     9.11 M     517.60G
Baseline + IPA   24.66   0.8293    40.02   0.9931    31.33   0.9297     8.88 M     505.01G
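The parameter counts in Table 4 correspond to a straightforward tally over trainable tensors; FLOPs require a profiler. A sketch follows, with thop assumed as the profiling library and the 640 × 480 resolution of our dataset assumed as the measurement input size.

```python
import torch

def count_params_m(model):
    """Number of trainable parameters, in millions (as reported in Table 4)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# FLOPs can be measured with a profiler such as thop (assumed available):
#   from thop import profile
#   flops, _ = profile(model, inputs=(torch.randn(1, 3, 480, 640),))
```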

4.5. Visualization of the attention map

To better demonstrate the difference between our IPA and other attention mechanisms, we visualize the attention maps of three attention mechanisms as follows. Specifically, in addition to our IPA, we obtained the ideal attention map (the ground truth of the predicted attention map) in [1] with a threshold of 30, which is specially designed for raindrop removal tasks. We also analyze the convolution-based pyramid attention, denoted as CPA above, which may be of general interest for all restoration tasks. Two visualized examples can be found in Fig. 6.

As shown in Fig. 6, the ideal attention maps of Qian et al. [1] are clean and can capture some large raindrop regions. However, such a hard hand-crafted prior attention shows low performance for regions with monochromatic backgrounds. In the first example, we can easily find that raindrops in the "road" and "lawn" regions are not distinguished by the attention map of [1]. In contrast, the other two adaptive attention maps do not show this deficiency. This phenomenon validates our discussion in the extended experiments.

Although the attention maps from CPA and our IPA both focus on severely degraded regions with various backgrounds, our method shows better and finer activation. In the second example, both CPA and our IPA indicate that the left half of the image is more severely degraded than the right half. However, CPA cannot distinguish between degraded regions with and without raindrops. In other words, the small region with mist only and the small region with overlapping mist and raindrops obtain similar attention intensity from CPA. In contrast, the attention map from IPA can well distinguish three different kinds of regions: low intensity for clean regions, medium intensity for regions degraded by mist only, and high intensity for regions degraded more severely. Thus, the above observations demonstrate the superiority of the proposed IPA for the proposed task.
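Attention maps like those in Fig. 6 can be rendered by capturing a block's spatial attention output (e.g., with a forward hook) and overlaying it on the input image. The following matplotlib sketch assumes the map has already been captured into a 1 × 1 × h × w tensor; the overlay style is our choice, not the paper's exact rendering.

```python
import matplotlib.pyplot as plt
import torch.nn.functional as F

def show_attention(image, att, title):
    """Overlay a spatial attention map on an H x W x 3 image array.

    `att` is a 1 x 1 x h x w tensor, e.g. captured from an IPA (or CPA)
    block with a forward hook, upsampled here to the image resolution.
    """
    att = F.interpolate(att, size=image.shape[:2], mode='bilinear',
                        align_corners=False)
    plt.imshow(image)
    plt.imshow(att[0, 0].detach().cpu().numpy(), cmap='jet', alpha=0.5)
    plt.title(title)
    plt.axis('off')
    plt.show()
```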
4.6. Application experiment

In addition to the above quantitative and qualitative evaluation results, the effectiveness of our method is also validated in practical applications. Panoptic segmentation is a newly proposed segmentation task [34] that tries to label all the image pixels. It is an essential high-level task in many areas, including autonomous cars. The segmentation results based on ground truth images, degraded images, and restored images are shown in Fig. 7. Specifically, we applied the panoptic segmentation algorithm R101-FPN [35] from the Detectron2 Model Zoo

Fig. 6. Visualization of three attention mechanisms. From left to right: clean image (ground truth), degraded image, ideal attention map of Qian et al. [1], adaptively predicted
attention map of baseline + CPA model, and adaptively predicted attention map of our model. The first row corresponds to an example with little adherent mist but obvious
raindrops, while the second row shows another example with obvious mist and raindrops. All attention maps share the same colorbar.

Fig. 7. Restoration results from different methods and their panoptic segmentation results. From left to right: ground truth, degraded images, Qian et al. + Qin et al. [1,5], re-
trained Qian et al. [1], re-trained Chen et al. [2], re-trained Qin et al. [5], and our method. Each two rows correspond to the same sample. The panoptic segmentation result is
attached below the corresponding image. The details of the images are best viewed by zooming in.

(https://github.com/facebookresearch/detectron2) to segment these images.
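A sketch of this segmentation step with the Detectron2 API is shown below; we assume the COCO panoptic FPN ResNet-101 configuration is the model zoo entry behind the R101-FPN model named above.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Panoptic FPN with a ResNet-101 backbone (assumed config name).
cfg = get_cfg()
cfg_name = "COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml"
cfg.merge_from_file(model_zoo.get_config_file(cfg_name))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(cfg_name)
predictor = DefaultPredictor(cfg)

# Segment a restored image (BGR, as DefaultPredictor expects).
panoptic_seg, segments_info = predictor(cv2.imread("restored.png"))["panoptic_seg"]
```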
As shown in Fig. 7, panoptic segmentation makes it easy to show the influence of adherent mist and raindrops from the perspective of vision algorithms. When observing the degraded image in the first sample, the segmentation algorithm cannot find all the bicycles and is not sure what is behind the bicycles. Undoubtedly, this uncertainty may influence the decisions involved in self-driving. Fortunately, the bicycles can be counted correctly from the images obtained by the last four methods, and the fence can be roughly located in our restoration result. The second sample shows a semi-indoor environment. The segmentation borders are not clear in the degraded image. Meanwhile, the last four methods can restore the image to the extent that the high-level algorithm can approximately locate the edges of the different surfaces.

Therefore, the results of the application experiment indicate that mist and raindrops adhering to camera lenses or windshields can severely hamper high-level computer vision applications. Moreover, solving this problem by creating new segmentation datasets is difficult, because humans are also unable to accurately annotate objects in these degraded images. The proposed algorithm-based solution might help ease the problem and make visual devices more robust.
4.7. Discussion

The above experimental results indicate both the effectiveness of our method and the importance of the proposed problem. Unlike previous image restoration problems, the challenge of the new task is magnified because it must jointly handle adherent mist and raindrops, but our method handles it well. Specifically, the challenges are as follows.

1) The image degradation is severe. By intuitively comparing the degraded examples in Fig. 3 with those in Figs. 4 and 5, we can easily conclude that the coexistence of adherent mist and raindrops tends to impair more information than purely atmospheric haze or only raindrops. The different quantitative results of the different tasks also validate this impression. As a result, a deep network capable of high performance is used in this work. Admittedly, although effective, the large amount of computation required may limit real-time applications. A lightweight architecture with comparable performance is expected in the future.

2) Prior-based methods (e.g., [1,16]) do not hold up well. Due to the interaction between adherent raindrops and adherent mist, the threshold-based and shape-based prior attention in [1,16] do not work well in practical scenarios with both degradation sources. Our method specifically targets this challenge and enhances the self-adaptive attention mechanisms by introducing IPA blocks in addition to common attention layers.

5. Conclusion

In this work, we proposed an image degradation problem related to the coexistence of adherent mist and raindrops. Both sources of interference with visibility are adherent to a camera lens or car windshield, showing different image characteristics

compared with atmospheric haze or rain streaks. Although the problem is widely observed in real life, little work has been conducted on it. To solve this problem, we adopted an attentive convolutional network containing interpolation-based pyramid attention blocks to strengthen the spatial perception of regional obstacles. Both quantitative evaluation and intuitive examples showed quality improvement with the proposed method without explicit priors. The superiority of the proposed architecture was also validated even for the tasks of conventional de-hazing and the removal of purely raindrops. Restoration from the adherent mist and raindrop degradation can benefit high-level tasks, such as panoptic segmentation, by producing reliable results, which demonstrates the significance of the proposed architecture in vision applications. In the future, we will further simplify the architecture to reduce the computational complexity while keeping similar performance, so that real-time restoration can be achieved in more embedded scenarios to handle the proposed adherent mist and raindrop degradations. In addition, our experimentally acquired dataset and our code will be released online (link) to facilitate further studies on related tasks.

CRediT authorship contribution statement

Da He: Conceptualization, Methodology, Software, Validation, Writing - original draft. Xiaoyu Shang: Investigation, Data curation. Jiajia Luo: Conceptualization, Methodology, Writing - review & editing, Supervision.

Data availability

We have shared the link to the data/code in the Conclusion section.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2021YFF0502900); the National Natural Science Foundation of China (Grant No. 31870942); the Peking University Clinical Medicine Plus X - Young Scholars Project (Grant Nos. PKU2020LCXQ017, PKU2021LCXQ028); the Fundamental Research Funds for the Central Universities (Grant No. BMU2021YJ009); and the PKU-Baidu Fund (Grant No. 2020BD039).

References

[1] R. Qian, R.T. Tan, W. Yang, J. Su, J. Liu, Attentive generative adversarial network for raindrop removal from a single image, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2482-2491.
[2] D. Chen, M. He, Q. Fan, J. Liao, L. Zhang, D. Hou, L. Yuan, G. Hua, Gated context aggregation network for image dehazing and deraining, in: IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 1375-1383.
[3] W. Ren, L. Ma, J. Zhang, J. Pan, X. Cao, W. Liu, M.-H. Yang, Gated fusion network for single image dehazing, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3253-3261.
[4] S. Li, I.B. Araujo, W. Ren, Z. Wang, E.K. Tokuda, R.H. Junior, R. Cesar-Junior, J. Zhang, X. Guo, X. Cao, Single image deraining: A comprehensive benchmark analysis, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3838-3847.
[5] X. Qin, Z. Wang, Y. Bai, X. Xie, H. Jia, FFA-Net: Feature fusion attention network for single image dehazing, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 11908-11915.
[6] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132-7141.
[7] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, Y. Fu, Image super-resolution using very deep residual channel attention networks, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 286-301.
[8] Y. Zhang, K. Li, B. Zhong, Y. Fu, Residual non-local attention networks for image restoration, in: International Conference on Learning Representations, 2019.
[9] F. Yang, J. Ren, Z. Lu, J. Zhang, Q. Zhang, Rain-component-aware capsule-GAN for single image de-raining, Pattern Recogn. 108377 (2021).
[10] X. Jin, Z. Chen, W. Li, AI-GAN: Asynchronous interactive generative adversarial network for single image rain removal, Pattern Recogn. 100 (2020) 107143.
[11] H. Kurihata, T. Takahashi, I. Ide, Y. Mekada, H. Murase, Y. Tamatsu, T. Miyahara, Rainy weather recognition from in-vehicle camera images for driver assistance, in: IEEE Intelligent Vehicles Symposium, 2005, pp. 205-210.
[12] K. Ito, K. Noro, T. Aoki, An adherent raindrop detection method using MSER, in: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015, pp. 105-109.
[13] S. You, R.T. Tan, R. Kawakami, Y. Mukaigawa, K. Ikeuchi, Adherent raindrop modeling, detection and removal in video, IEEE Trans. Pattern Anal. Mach. Intell. 38 (9) (2015) 1721-1733.
[14] D. Eigen, D. Krishnan, R. Fergus, Restoring an image taken through a window covered with dirt or rain, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 633-640.
[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in Neural Information Processing Systems, 2014, pp. 2672-2680.
[16] Y. Quan, S. Deng, Y. Chen, H. Ji, Deep learning for seeing through window with raindrops, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 2463-2471.
[17] M. Shao, L. Li, H. Wang, D. Meng, Selective generative adversarial network for raindrop removal from a single image, Neurocomputing 426 (2021) 265-273.
[18] B. Cai, X. Xu, K. Jia, C. Qing, D. Tao, DehazeNet: An end-to-end system for single image haze removal, IEEE Trans. Image Process. 25 (11) (2016) 5187-5198.
[19] R. Fattal, Single image dehazing, ACM Trans. Graph. (TOG) 27 (3) (2008) 1-9.
[20] K. He, J. Sun, X. Tang, Single image haze removal using dark channel prior, IEEE Trans. Pattern Anal. Mach. Intell. 33 (12) (2010) 2341-2353.
[21] F. Yuan, Y. Zhou, X. Xia, X. Qian, J. Huang, A confidence prior for image dehazing, Pattern Recogn. 119 (2021) 108076.
[22] Z.-L. Ni, G.-B. Bian, G.-A. Wang, X.-H. Zhou, Z.-G. Hou, H.-B. Chen, X.-L. Xie, Pyramid attention aggregation network for semantic segmentation of surgical instruments, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 11782-11790.
[23] H. Li, P. Xiong, J. An, L. Wang, Pyramid attention network for semantic segmentation, in: Proceedings of the British Machine Vision Conference, 2018.
[24] Z. Huang, Z. Zhong, L. Sun, Q. Huo, Mask R-CNN with pyramid attention network for scene text detection, in: IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 764-772.
[25] H. Wang, G. Wang, Z. Sheng, S. Zhang, Automated segmentation of skin lesion based on pyramid attention network, in: International Workshop on Machine Learning in Medical Imaging, Springer, 2019, pp. 435-443.
[26] Y. Mei, Y. Fan, Y. Zhang, J. Yu, Y. Zhou, D. Liu, Y. Fu, T.S. Huang, H. Shi, Pyramid attention networks for image restoration, arXiv preprint arXiv:2004.13824 (2020).
[27] J. Johnson, A. Alahi, L. Fei-Fei, Perceptual losses for real-time style transfer and super-resolution, in: European Conference on Computer Vision, 2016, pp. 694-711.
[28] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., Photo-realistic single image super-resolution using a generative adversarial network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4681-4690.
[29] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600-612.
[30] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: A large-scale hierarchical image database, in: IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248-255.
[31] B. Li, X. Peng, Z. Wang, J. Xu, D. Feng, AOD-Net: All-in-one dehazing network, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4770-4778.
[32] P. Isola, J.-Y. Zhu, T. Zhou, A.A. Efros, Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125-1134.
[33] H. Zhang, V.M. Patel, Densely connected pyramid dehazing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3194-3203.


[34] A. Kirillov, K. He, R. Girshick, C. Rother, P. Dollár, Panoptic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 9404-9413.
[35] Z. Cai, N. Vasconcelos, Cascade R-CNN: High quality object detection and instance segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2019).

Da He received his B.S. degree in Optoelectronic Information Science and Engineering from Nankai University. In 2018, he joined the University of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai, China, as a graduate student. He is interested in image processing and deep learning.

Xiaoyu Shang received the B.S. degree in Optoelectronic Information Science and Engineering from Sun Yat-sen University. In 2018, she joined the University of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai, China, as a graduate student. Her research interest is image processing, especially optical imaging with deep learning.

Jiajia Luo received his Ph.D. degree in mechanical engineering from the University of Michigan, Ann Arbor, USA. He is currently an Assistant Professor at the Biomedical Engineering Department, Peking University, Beijing, China. His research interests include biomedical imaging, biomechanics, and machine learning. He was a recipient of a career development award from the Michigan Institute for Clinical & Health Research and of the Peking University Clinical Medicine + X Young Investigator Grant Award. His research is supported by the National Key R&D Program of China, the National Natural Science Foundation of China, and the PKU-Baidu Fund.
