
Neural Processing Letters

https://doi.org/10.1007/s11063-023-11359-1

Image Denoising Network Based on Subband Information Sharing Using Dual-Tree Complex Wavelet

Kui Liu 1,2 · Yiping Guo 1,2 · Benyue Su 2,3

Accepted: 10 July 2023


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023

Abstract
The difficulty in image denoising arises from the need to recover complex texture regions.
Previous methods have struggled with this task, producing images that are too smooth and are
prone to aliasing and checkerboard patterns in complex regions. To address these issues, we
propose an image denoising network based on subband information sharing using dual-tree
complex wavelet. This network combines the spatial and transform domains through dual-tree
complex wavelet transform (DTCWT) to capture both spatially structured features and time-
frequency localized features for enriching the feature space. To strengthen the recovery in
hard scenes, such as weak textures and high-frequency details, a Subband Information Sharing
Unit (SISU) is designed for the interplay of information in the transform domain, establishing
the complementarity and correlation among the subbands obtained by DTCWT. Moreover,
rectified linear units and exponential linear units are used in the spatial and transform domains,
respectively, to match the properties of elements in different domains. Comprehensive exper-
iments demonstrate the powerful recovery capability of the network for both textured and
smooth regions, as well as the competitive results of the network in non-blind/blind image
denoising.

Keywords Image denoising · Convolutional neural network · Dual-tree complex wavelet transform · Attention mechanism

✉ Kui Liu
liukui@aqnu.edu.cn
Yiping Guo
guoyiping67@163.com
Benyue Su
subenyue@sohu.com
1 School of Computer and Information, Anqing Normal University, Anqing 246133, Anhui, China
2 Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing Normal
University, Anqing 246133, Anhui, China
3 School of Mathematics and Computer, Tongling University, Tongling 244061, Anhui, China


1 Introduction

In recent years, deep neural networks have achieved significant success in several image
processing fields, such as image recognition [1, 2], object detection [3, 4], and human pose
estimation and recovery [5, 6]. Effective image denoising is crucial for many of these image
processing tasks and has been shown to improve subsequent performance. Therefore, image
denoising is a critical low-level vision problem [7]. In this paper, we aim to restore a high-
quality image x from a noisy image y, where the noise is additive, white, and Gaussian. The
image degradation model is defined as

y = x + v, (1)

where v ∼ N(0, σ²) denotes the additive white Gaussian noise (AWGN) with zero mean and standard deviation σ.
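As a concrete illustration, noisy training pairs under this degradation model can be synthesized as follows. This is a minimal PyTorch sketch; the [0, 1] intensity scaling and the helper name add_awgn are our assumptions rather than the paper's code.

```python
import torch

def add_awgn(x: torch.Tensor, sigma: float) -> torch.Tensor:
    """Corrupt a clean image x (scaled to [0, 1]) with AWGN of level sigma,
    where sigma is given on the usual 0-255 intensity scale."""
    v = torch.randn_like(x) * (sigma / 255.0)  # v ~ N(0, sigma^2)
    return x + v

# Example: y = x + v with noise level sigma = 25
x = torch.rand(1, 1, 50, 50)   # a clean 50 x 50 patch
y = add_awgn(x, sigma=25)
```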
Generally, image denoising algorithms can be classified into two groups: generative
approaches [8–10] and discriminative approaches [11–23]. Discriminative approaches have
become the dominant method for image denoising due to their simpler learning process
and more accurate mapping relationship between inputs and outputs compared to generative
models. Among the discriminative approaches, convolutional neural networks (CNNs) are
becoming a preferred choice due to their powerful representational capabilities. The original
CNN-based methods [11–16, 18] exploit spatially structured features of images in the spatial
domain. These networks enhance performance by enlarging the receptive field or increas-
ing the width and depth of the network. However, their ability to recover complex texture
regions, such as weak textures and high-frequency details, is still insufficient. One approach
to alleviate these issues is to introduce wavelet transform into CNN, which has the ability to
effectively learn time-frequency localized features in the transform domain.
Current methods [19–21] integrate discrete wavelet transform (DWT) into CNN, leading
to an improvement in handling texture regions. However, CNNs with DWT suffer from draw-
backs, such as generating artifacts related to the underlying wavelet structure [22], leading
to inadequate denoising performance. In contrast, CNNs with dual-tree complex wavelet
transform (DTCWT) can solve such issues because of the approximate shift invariance and
directional selectivity of DTCWT. For instance, Zhou et al. [23] proposed a network that uses
seven parallel independent convolutional streams to process the seven subbands obtained from
DTCWT. However, this approach ignores the complementarity of the subbands and severs the
correlations among them after DTCWT, which can adversely affect denoising
performance.
To address these issues, we propose an image denoising network based on subband infor-
mation sharing using dual-tree complex wavelet. Our network combines spatial domain
processing, which learns spatially structured features, with transform domain processing,
which learns time-frequency localized features, to increase the richness of the feature space. In
the transform domain, the Subband Information Sharing Unit (SISU) is designed to establish
complementarity and correlation among the subbands. Sharing information among the high-
frequency subbands in the SISU facilitates the separation of noisy signals from high-frequency
contents, while information exchange between the high- and low-frequency subbands helps
to prevent weak texture regions from becoming too smooth. In addition, different activation
functions are used for different domains. We select the rectified linear unit (ReLU) function
for the spatial domain to account for pixel property and the exponential linear unit (ELU)
function for the transform domain to reflect frequency property. To validate the effectiveness
of the proposed denoising network, we conduct extensive experiments on grayscale and color


images against mainstream denoising methods. The main contributions of this
paper are summarized as follows:

• Technically, we propose an image denoising network based on subband information sharing using dual-tree complex wavelet. The proposed network enriches the feature space by combining spatially structured features and time-frequency localized features via a dual-tree complex wavelet transform.
• The SISU is designed in the transform domain to enhance the recovery of texture
information efficiently through the interplay of subbands. Information sharing across
high-frequency subbands enhances the ability to distinguish noisy signals from high-
frequency contents. Information sharing between high- and low-frequency subbands
prevents weak texture regions from being recovered too smoothly.
• The ReLU is selected for the spatial domain and the ELU is selected for the transform
domain, because of the properties of elements in different domains.
• Quantitative and qualitative experiments clearly show that our network outperforms many
mainstream networks in non-blind and blind Gaussian noise removal.

2 Related Works

In this section, we present a concise review of CNN-based methods for image denoising with
a focus on denoising in either spatial or transform domains.
In the spatial domain, CNN-based methods acquire the spatially structured features of
images to achieve image denoising. Zhang et al. [11] proposed a feedforward CNN for
denoising (DnCNN), integrating residual learning and batch normalization into the network.
However, with the increase in network depth, the network is likely to saturate its performance,
making training difficult. To tackle this issue, Tian et al. [14] proposed a deep CNN with
batch renormalization (BRDNet) and enlarged the network width. Considering that increasing
network depth weakens the effect of shallow features, Tian et al. [15] introduced an attention-
guided CNN (ADNet) to share shallow features with deep features, which helps the network
preserve fine-grained feature information.
(DudeNet) to harness the diversity of features. Acar et al. [17] and Jia et al. [18] respectively
proposed a densely connected dilated residual network and a multi-scale dilated residual
convolution network to exploit multi-scale information. These methods enhance denoising
performance by increasing the network depth and width, adding an attention mechanism, and
enlarging the receptive field. However, in the spatial domain, it remains arduous to distinguish
noise from sharp features and to recover weak textures.
In order to enhance the denoising effect of complex regions, researchers attempt to place
images into the transform domain to solve the problem of image denoising. In the transform
domain, CNNs learn time-frequency localized features from images by embedding wavelet
transforms. Bae et al. [19] implemented an analytic mapping of feature space through DWT
by considering manifold simplification, which makes the feature structure in the topological
space simpler. Instead of down-sampling in encoding and up-sampling in decoding, Liu
et al. [20] adopted DWT and inverse DWT in the U-Net structure. Tian et al. [21] proposed a
multi-stage image denoising CNN with the wavelet transform (MWDCNN) that combines the
spatial and transform domains for enriching feature information. All of the aforementioned
methods are CNNs embedded with DWT, which mitigates blurred edges and unclear details
to a certain extent. Nevertheless, owing to the shift variance and poor directional selectivity
of DWT, checkerboard patterns and artifacts appear during recovery, compromising image


Fig. 1 (a) The original image; (b) the reconstructed image based on the high-frequency subbands of DWT; (c) the reconstructed image based on the high-frequency subbands of DTCWT

structure semantics. Recently, Zhou et al. [23] introduced DTCWT in CNN to compensate
for the limitations of DWT. However, due to the independent treatment of each subband after
DTCWT, the interdependency and correlation among the subbands are ignored.
Based on the above, we propose a network to integrate spatially structured features with
time-frequency localized features via DTCWT to enrich the feature space. To construct
the complementarity and correlation among subbands in the transform domain, the SISU is
designed to enhance noise removal and texture restoration in complex regions through inter-
subband information exchange.

3 Proposed Method

3.1 Dual-Tree Complex Wavelet Transform

DTCWT is an improved version of DWT [24]. DTCWT provides approximate shift invari-
ance and directional selectivity in high-dimensional space, which DWT does not [25]. The
approximate shift invariance of DTCWT means that the same texture will generate the same
response features regardless of the sampling position, whereas DWT may produce different
responses for the same texture at different sampling positions. The directional selectivity of
DTCWT provides robustness to diagonal features compared to DWT, effectively represent-
ing geometric features without generating checkerboard patterns during image denoising.
Figure 1 illustrates that the reconstructed image based on the high-frequency subbands of
DTCWT is free from ringing and aliasing artifacts. As a result, DTCWT is more suitable
than DWT for embedding in a CNN to handle complex texture regions. Therefore, our proposed network embeds DTCWT into a CNN to transform feature maps between the spatial and transform domains.
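For concreteness, a one-level DTCWT of a batch of feature maps can be sketched with the pytorch_wavelets package that accompanies [25]; the package choice and its default filter banks are our assumption, as the paper does not name its DTCWT implementation.

```python
import torch
from pytorch_wavelets import DTCWTForward, DTCWTInverse

xfm = DTCWTForward(J=1)   # one decomposition level, default biort/qshift filters
ifm = DTCWTInverse()

x = torch.randn(1, 64, 50, 50)   # a batch of 64-channel feature maps
yl, yh = xfm(x)                  # yl: the low-frequency subband
# yh[0] holds the six oriented high-frequency subbands as real/imaginary pairs,
# shape (1, 64, 6, 25, 25, 2): one slice per ±15°, ±45°, ±75° direction
x_rec = ifm((yl, yh))            # inverse DTCWT reconstructs the feature maps
```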

3.2 Overall Framework

The proposed denoising network based on subband information sharing using the dual-
tree complex wavelet can intuitively learn features and recover texture regions with higher
accuracy. As shown in Fig. 2, the noisy image undergoes spatial and transform domain
processing stages, followed by another spatial domain processing stage to obtain a clean
image. The spatial domain processing stage extracts spatially structured features, while the
transform domain processing stage extracts time-frequency localized features. The proposed


Fig. 2 The framework of the proposed network

Fig. 3 The architecture of the Processing Block

network learns these two types of features separately in discrete steps, avoiding confusion
from interleaving.
Due to the multi-directional nature of complex texture regions, we exploit the directional
selectivity of DTCWT to individually recover distinct directional features in the transform
domain. In the transform domain processing stage, DTCWT transforms the feature maps from
the spatial domain to the transform domain and generates seven subbands, including one low-
frequency subband and six high-frequency subbands (−75°, −45°, −15°, 15°, 45°, 75°). We
then feed the seven subbands into N (N = 5 in this paper) processing blocks. As illustrated
in Fig. 3, the core of the processing block is the SISU. The SISU effectively enhances noise
removal and recovers contaminated regions by establishing complementarity and correlation
among subbands. In the end, the inverse DTCWT completes the information reconstruction
of feature maps from the transform domain into the spatial domain.
For optimizing the proposed network, we choose the Charbonnier loss [26]:


L(X̂, X) = √(‖X̂ − X‖² + ε²), (2)

where X̂ denotes the output of the proposed network, X denotes the ground-truth image, and ε is a constant empirically set to 10⁻³ that keeps the square-root term positive, preventing the loss from being zero and keeping its gradient well-defined.
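A direct PyTorch rendering of Eq. (2) might read as follows; averaging the element-wise Charbonnier penalty over all elements is our assumption about the reduction.

```python
import torch

def charbonnier_loss(x_hat: torch.Tensor, x: torch.Tensor,
                     eps: float = 1e-3) -> torch.Tensor:
    # sqrt((x_hat - x)^2 + eps^2), averaged over all elements
    return torch.sqrt((x_hat - x) ** 2 + eps ** 2).mean()
```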


Fig. 4 The architecture of the Subband Information Sharing Unit (SISU)

Fig. 5 The architecture of the Dual Attention Unit (DAU)

3.3 Subband Information Sharing Unit

The central component of the processing block, as shown in Fig. 3, is the SISU. The proposed
SISU, which is depicted in Fig. 4, provides the advantage of interconnecting all subbands
to utilize mutual assistance for noise removal and information recovery. The SISU provides
the high-frequency subband with information from all high-frequency subbands to enhance
the differentiation between noisy signals and high-frequency contents. Similarly, the SISU
provides information from high-frequency subbands to the low-frequency subband in order
to facilitate the recovery of weak texture regions. Within the SISU, the dual attention unit
(DAU) shown in Fig. 5 receives subbands to capture the channel and spatial information of
feature maps, allowing the network to focus on sharing essential information. We use the
DAU as presented in [27], but modify the activation function with the ELU function, as
described in Section 3.4. The summed feature map of the six high-frequency subbands after
the DAU is subsequently passed through the H2H and H2L information sharing units for
further processing.

H2H Information Sharing Unit:
The unit, which is inspired by the selective kernel network proposed in [28] but has some
important differences, aims to increase the degree of differentiation between noisy signals and
high-frequency contents by incorporating complementary information from high-frequency


subbands in different directions. Since noise is primarily a high-frequency signal, the high-
frequency subbands contain more noise than the low-frequency subband. The amount of
noisy signal and high-frequency content varies across the channels of a high-frequency sub-
band feature map, so the channels differ in importance. As a result, the
H2H information sharing unit redistributes the channel weights for each high-frequency sub-
band by sharing information from all high-frequency subbands. Specifically, the unit receives
the summed feature and creates a compressed feature representation using Global Average
Pooling (GAP) over the spatial dimension, followed by channel-downscaling convolutional
layers. Six parallel channel-upscaling convolutional layers followed by sigmoid activations
then yield six high-frequency channel feature descriptors. Lastly, high-frequency
channel feature recalibration is implemented by multiplying the original six high-frequency
subbands with the high-frequency channel feature descriptors. The H2H information shar-
ing unit improves the differentiation degree between the noisy signals and high-frequency
contents by redistributing the channel weights of each high-frequency subband.
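A rough sketch of this H2H channel recalibration is given below; the channel width, reduction ratio, and the class name H2HSharing are our own assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class H2HSharing(nn.Module):
    """Sketch of the H2H unit: six high-frequency subbands are recalibrated
    channel-wise from their summed feature map (SK-style, after [28])."""
    def __init__(self, channels: int = 64, reduction: int = 4, bands: int = 6):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)              # GAP over space
        self.down = nn.Sequential(                      # channel-downscaling
            nn.Conv2d(channels, channels // reduction, 1), nn.ELU())
        self.up = nn.ModuleList(                        # six parallel upscalers
            nn.Conv2d(channels // reduction, channels, 1) for _ in range(bands))

    def forward(self, subbands):          # list of six (N, C, H, W) tensors
        summed = torch.stack(subbands, dim=0).sum(dim=0)
        z = self.down(self.gap(summed))   # compressed shared descriptor
        # one sigmoid-gated channel descriptor per subband, then recalibrate
        return [s * torch.sigmoid(u(z)) for s, u in zip(subbands, self.up)]
```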
H2L Information Sharing Unit:
The unit is inspired by the bottleneck attention module presented in [29]; however, it is distinct
in that it transfers information from high-frequency subbands to the low-frequency subband.
The unit allows the low-frequency path to focus on the recovery of texture regions. To preserve
the texture of complex regions from blurring or disappearing due to noise removal, the H2L
information sharing unit redistributes the spatial weights for the low-frequency subband
by sharing information from high-frequency subbands. Specifically, the summed feature
map is passed through a single convolutional layer for channel compression. Next, four
convolutional layers and a sigmoid activation are employed to obtain a low-frequency spatial
feature descriptor with a single channel. Lastly, low-frequency spatial feature recalibration is
executed by multiplying the original low-frequency subband with the low-frequency spatial
feature descriptor. The H2L information sharing unit enables the recovery of complex region
textures and prevents the over-smoothing of weak textures by transferring information from
all high-frequency subbands to the low-frequency subband.
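Similarly, a hedged sketch of the H2L spatial recalibration follows, with the intermediate widths, kernel sizes, and activations guessed rather than taken from the paper.

```python
import torch
import torch.nn as nn

class H2LSharing(nn.Module):
    """Sketch of the H2L unit: a single-channel spatial descriptor derived from
    the summed high-frequency features recalibrates the low-frequency subband
    (BAM-style, after [29])."""
    def __init__(self, channels: int = 64, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.compress = nn.Conv2d(channels, mid, 1)     # channel compression
        self.body = nn.Sequential(                      # four conv layers
            nn.Conv2d(mid, mid, 3, padding=1), nn.ELU(),
            nn.Conv2d(mid, mid, 3, padding=1), nn.ELU(),
            nn.Conv2d(mid, mid, 3, padding=1), nn.ELU(),
            nn.Conv2d(mid, 1, 1))                       # single-channel descriptor

    def forward(self, low, summed_high):   # both (N, C, H, W)
        attn = torch.sigmoid(self.body(self.compress(summed_high)))
        return low * attn                  # spatial feature recalibration
```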

3.4 Selection of Activation Function

The paper selects the ReLU to process feature maps in the spatial domain, while the ELU is
chosen to process feature maps in the transform domain, as illustrated in Figs. 2 and 3.
ReLU [30] is a piecewise linear function, given by the expression:

ReLU(x) = { x, x > 0; 0, x ≤ 0 }. (3)

Its outputs are non-negative, and negative inputs are set to zero. After normalization, each
feature map element in the spatial domain falls in the range 0–1 with no negative values, and
negative elements contribute little to a recovered image, making them redundant. Therefore,
ReLU is chosen to set the negative elements of feature maps to zero in the spatial domain.
ELU [31] outputs contain negative values, which makes it more appropriate to process
the feature maps in the transform domain than ReLU. ELU is defined as:

ELU(x) = { x, x > 0; α(eˣ − 1), x ≤ 0 }. (4)


The inverse DTCWT transforms the feature maps from the transform domain to the spatial
domain, where the negative frequency elements have to participate in the reconstruction of
the feature map. This fact makes it unreasonable to disregard negative values in the transform
domain feature maps. Therefore, ELU is chosen to process feature maps in the transform
domain.

4 Experiments

4.1 Experimental Setup

Datasets: The datasets are divided into two categories: gray-noisy and color-noisy images.
To train the model on grayscale images, 400 grayscale images of 180 × 180 size from the
Berkeley Segmentation dataset (BSD) [32] are used, with training patches of 50 × 50 for
both blind and non-blind image denoising. For testing the gray-noisy images, the paper uses
12 images from Set12 and 68 natural images from BSD68 [11]. For color image denoising,
180 × 180 color images from BSD432 [11] are cut into patches of 50 × 50 for the training
process. The proposed approach is assessed for color image denoising on two public datasets:
CBSD68, which includes 68 images, and Kodak24 [33], which includes 24 images.
Implementation Details:
The experiments are performed in PyTorch on an Nvidia RTX A5000 GPU with Ubuntu
20.04 as the operating system. The clean images are corrupted with white Gaussian noise
with variance σ² to produce the corresponding noisy images. The noise levels employed are
uniform across both grayscale and color images. Non-blind denoising is trained at three noise
levels, namely σ = 15, 25, 50. In the blind image denoising scenario, the noise level σ is
randomly selected from the interval [0, 55] during training. The Adam optimizer is
employed with β = (0.9, 0.999) and an initial learning rate of 2 × 10⁻⁴. The learning rate is
progressively decreased from the initial value to 10⁻⁶ during training using the cosine
annealing strategy [34]. The training process runs for 60 epochs, and the batch size is set
to 64. The code is available at https://github.com/gyp67/Denoising_SISU_DTCWT.
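Put together, the reported settings correspond to a training loop along the following lines; the two-layer stand-in model and the synthetic batch are placeholders for the proposed network and the real patch pipeline.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 1, 3, padding=1))   # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=60, eta_min=1e-6)                  # 2e-4 -> 1e-6 decay [34]

for epoch in range(60):
    clean = torch.rand(64, 1, 50, 50)                   # batch of 64 patches, 50 x 50
    noisy = clean + torch.randn_like(clean) * (25 / 255.0)
    loss = torch.sqrt((model(noisy) - clean) ** 2 + 1e-3 ** 2).mean()  # Eq. (2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                    # cosine annealing per epoch
```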

4.2 Quantitative and Qualitative Evaluations

To ensure the fairness and reliability of the comparison tests, all experiments are performed
under the same experimental conditions. We use peak signal-to-noise ratio (PSNR) and
structural similarity (SSIM) to conduct quantitative evaluation. We select some examples
to visualize the denoised results for qualitative evaluation. The compared methods cover
different kinds of denoisers, including DnCNN [11], BRDNet [14], ADNet [15], DudeNet
[16], MWDCNN [21], and the network proposed by Zhou et al. [23] which we refer to as
DTCWTDN in the following.
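For reference, the PSNR values reported below follow the standard definition, computed from the mean squared error; a minimal helper (assuming images scaled to [0, 1]) is:

```python
import torch

def psnr(denoised: torch.Tensor, clean: torch.Tensor,
         max_val: float = 1.0) -> torch.Tensor:
    # PSNR = 10 * log10(MAX^2 / MSE), in dB
    mse = torch.mean((denoised - clean) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)
```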
Non-blind Removal of AWGN on Grayscale Image Denoising: Table 1 shows the average
PSNR and SSIM values on Set12 and BSD68, respectively. We select a grayscale image
with σ = 50 from Set12 for visual comparison, as shown in Fig. 6. It can be seen that the
proposed method outperforms the compared methods in terms of average PSNR and average
SSIM. The visualization results clearly show that the proposed method is able to produce


Table 1 Average PSNR (dB)/SSIM of different methods on Set12 and BSD68 in non-blind AWGN removal
Datasets Methods σ = 15 σ = 25 σ = 50

Set12 DnCNN [11] 32.75/0.9042 30.23/0.8603 26.93/0.7750
BRDNet [14] 32.77/0.9030 30.37/0.8621 27.17/0.7836
ADNet [15] 32.82/0.9055 30.37/0.8631 27.12/0.7817
DudeNet [16] 32.85/0.9059 30.41/0.8639 27.18/0.7843
MWDCNN [21] 32.92/0.9053 30.54/0.8645 27.38/0.7903
DTCWTDN [23] 32.80/0.9053 30.37/0.8630 27.13/0.7820
Proposed 33.02/0.9081 30.63/0.8682 27.42/0.7924
BSD68 DnCNN [11] 31.45/0.8944 28.89/0.8326 26.05/0.7197
BRDNet [14] 31.65/0.8941 29.17/0.8312 26.24/0.7225
ADNet [15] 31.66/0.8959 29.17/0.8334 26.21/0.7230
DudeNet [16] 31.69/0.8963 29.20/0.8341 26.25/0.7246
MWDCNN [21] 31.74/0.8941 29.24/0.8301 26.32/0.7234
DTCWTDN [23] 31.65/0.8946 29.16/0.8315 26.22/0.7221
Proposed 31.80/0.8980 29.31/0.8371 26.35/0.7287
The best results are bold and the second best results are underlined

Fig. 6 Denoising results of a grayscale image from Set12 with noise level σ = 50 in non-blind AWGN
removal


Table 2 Average PSNR (dB)/SSIM of different methods on Set12 and BSD68 in blind AWGN removal
Datasets Methods σ = 15 σ = 25 σ = 50

Set12 DnCNN [11] 32.44/0.8977 30.13/0.8556 26.96/0.7729
BRDNet [14] 32.60/0.9013 30.22/0.8585 27.00/0.7745
ADNet [15] 32.65/0.9026 30.29/0.8601 27.04/0.7747
DudeNet [16] 32.71/0.9031 30.32/0.8603 27.06/0.7742
MWDCNN [21] 32.88/0.9070 30.55/0.8676 27.43/0.7921
DTCWTDN [23] 32.63/0.9025 30.29/0.8603 27.06/0.7745
Proposed 32.97/0.9071 30.61/0.8673 27.48/0.7920
BSD68 DnCNN [11] 30.87/0.8696 28.32/0.8093 25.63/0.7035
BRDNet [14] 31.51/0.8895 29.05/0.8245 26.11/0.7108
ADNet [15] 31.56/0.8903 29.09/0.8266 26.14/0.7138
DudeNet [16] 31.58/0.8904 29.11/0.8265 26.15/0.7134
MWDCNN [21] 31.68/0.8974 29.26/0.8345 26.34/0.7256
DTCWTDN [23] 31.55/0.8904 29.10/0.8268 26.14/0.7140
Proposed 31.78/0.8970 29.30/0.8345 26.35/0.7257
The best results are bold and the second best results are underlined

Fig. 7 Denoising results of a grayscale image from BSD68 with noise level σ = 25 in blind AWGN removal

clear denoising results when dealing with multiple texture regions and is less prone to blurring
or generating checkerboard patterns than the compared methods.

Blind Removal of AWGN on Grayscale Image Denoising: Table 2 shows the PSNR and
SSIM values on Set12 and BSD68. Figure 7 shows the visual denoising results for a grayscale
image from BSD68 with σ = 25. The average PSNR of our method in blind denoising
outperforms the compared algorithms, while the average SSIM is sometimes slightly lower
than that of MWDCNN, with the difference being within 0.0004. The main reason is that, in
blind grayscale denoising, the limited information available leads the network to learn an
aggressive treatment of abrupt pixels in regions of rapid intensity change, which occasionally
lowers the SSIM slightly. However,


Table 3 Average PSNR (dB)/SSIM of different methods on CBSD68 and Kodak24 in non-blind AWGN removal
Datasets Methods σ = 15 σ = 25 σ = 50

CBSD68 DnCNN [11] 33.94/0.9322 31.06/0.8868 27.24/0.7853
BRDNet [14] 33.99/0.9312 31.29/0.8872 27.99/0.7929
ADNet [15] 34.02/0.9331 31.08/0.8871 27.26/0.7854
DudeNet [16] 34.12/0.9351 31.41/0.8906 28.10/0.8003
MWDCNN [21] 34.18/0.9353 31.48/0.8910 28.21/0.8026
DTCWTDN [23] 34.03/0.9337 31.28/0.8874 28.01/0.7930
Proposed 34.24/0.9371 31.55/0.8940 28.26/0.8062
Kodak24 DnCNN [11] 34.72/0.9234 32.03/0.8808 28.35/0.7857
BRDNet [14] 34.80/0.9233 32.28/0.8811 29.13/0.7943
ADNet [15] 34.85/0.9245 32.06/0.8812 28.44/0.7860
DudeNet [16] 34.98/0.9274 32.45/0.8855 29.26/0.8026
MWDCNN [21] 35.07/0.9275 32.57/0.8865 29.44/0.8060
DTCWTDN [23] 34.81/0.9241 32.27/0.8807 29.13/0.7945
Proposed 35.15/0.9296 32.63/0.8893 29.45/0.8088
The best results are highlighted in bold and the second best results are underlined

Fig. 8 Denoising results of a color image from Kodak24 with noise level σ = 15 in non-blind AWGN removal

this aggressive pixel handling preserves image details well, and our method recovers scene
details effectively. According to Table 2, our proposed method still outperforms the other
techniques. It can be seen from Fig. 7 that the boundaries recovered by the proposed method
are sharper than those of other methods. The proposed method has
a strong ability to recover smooth and textured regions without aliasing.
Non-blind Removal of AWGN on Color Image Denoising: Table 3 shows the PSNR and
SSIM values on CBSD68 and Kodak24. The test image with σ = 15 in Fig. 8 is selected from
Kodak24. The proposed method achieves the best performance in non-blind noise removal
on color images. As can be seen from the enlarged part of Fig. 8, the proposed method not


Table 4 Average PSNR (dB)/SSIM of different methods on CBSD68 and Kodak24 in blind AWGN removal
Datasets Methods σ = 15 σ = 25 σ = 50

CBSD68 DnCNN [11] 33.62/0.9283 31.12/0.8847 27.66/0.7769
BRDNet [14] 33.80/0.9311 31.12/0.8837 27.76/0.7807
ADNet [15] 33.82/0.9314 31.14/0.8847 27.72/0.7772
DudeNet [16] 33.94/0.9328 31.26/0.8871 27.88/0.7842
MWDCNN [21] 34.08/0.9350 31.37/0.8908 28.11/0.7950
DTCWTDN [23] 33.81/0.9315 31.14/0.8850 27.75/0.7779
Proposed 34.07/0.9354 31.37/0.8915 28.21/0.8047
Kodak24 DnCNN [11] 34.49/0.9226 32.07/0.8773 28.69/0.7678
BRDNet [14] 34.58/0.9224 32.09/0.8770 28.81/0.7798
ADNet [15] 34.60/0.9228 32.11/0.8782 28.77/0.7761
DudeNet [16] 34.74/0.9245 32.26/0.8814 28.98/0.7855
MWDCNN [21] 34.95/0.9274 32.38/0.8846 29.30/0.7975
DTCWTDN [23] 34.59/0.9226 32.11/0.8780 28.79/0.7788
Proposed 34.93/0.9275 32.39/0.8846 29.41/0.8071
The best results are highlighted in bold and the second best results are underlined

Table 5 Average computational time (s) of different methods to denoise each image of BSD68
Methods DnCNN BRDNet ADNet DudeNet MWDCNN DTCWTDN Proposed

Time 0.0427 0.0634 0.0416 0.0529 0.0676 0.0835 0.1525

only achieves excellent results in recovering the curtain folds but also recovers more of the
white spots on the casement than any of the mainstream methods.
Blind Removal of AWGN on Color Image Denoising: Table 4 shows the PSNR and SSIM
values on CBSD68 and Kodak24, respectively. Figure 9 visualizes the blind denoising of a
color image with noise level σ = 50. The PSNR of the proposed method is sub-optimal only at
σ = 15. The SSIM of the proposed method outperforms the mainstream methods, indicating
that it recovers structural semantics better. From Fig. 9, the enlarged parts show that the
proposed method is more effective than the mainstream methods at recovering textures such
as the double-folded eyelid region.
According to the above experiments and analysis, the proposed method achieves excellent
denoising performance compared to other methods. Most other methods either sacrifice
structural content and texture details to obtain overly smooth images or introduce chromatic
artifacts and speckle textures. The proposed method can maintain spatial smoothness in
homogeneous regions and sharpness in texture regions.

4.3 Analysis of Computational Time

Table 5 compares the average computational time taken by each network to denoise an
image from BSD68, to evaluate computational efficiency. The computational time of our
network is not optimal compared to other methods, mainly due to the network's specific
treatment of the different subbands for information exchange and guidance. This particular
processing technique gives our network


Fig. 9 Denoising results of a color image from CBSD68 with noise level σ = 50 in blind AWGN removal

a performance advantage. The proposed network is well suited to denoising images with
multiple textures, preserving image details to an extent that other methods do not. Notably,
six of the seven subbands are high-frequency subbands, and the high-frequency subband
tensors have non-zero values only at the feature points of the textures. In other words, the
tensors on the six processing streams are sparse, which means that our network has more
room for optimization than other networks. Based on the
above analysis, we believe that the proposed network combined with existing optimization
methods can significantly reduce the computational time while still maintaining superior
performance. Our future research seeks to extend this work and explore further opportunities
for improvement.


Table 6 Average PSNR (dB)/SSIM of the proposed network with SISU or without SISU on BSD68 in blind
AWGN removal
Datasets Methods σ = 15 σ = 25 σ = 50

BSD68 Without SISU 31.68/0.8929 29.20/0.8288 26.27/0.7193
With SISU 31.78/0.8970 29.30/0.8345 26.35/0.7257
The best results are bold

Table 7 Average PSNR (dB)/SSIM of the proposed network with DWT or DTCWT on BSD68 in blind AWGN
removal
Datasets Methods σ = 15 σ = 25 σ = 50

BSD68 DWT-introduced 31.75/0.8959 29.28/0.8331 26.35/0.7254
DTCWT-introduced 31.78/0.8970 29.30/0.8345 26.35/0.7257
The best results are bold

4.4 Ablation Studies

Effectiveness of SISU: To assess the efficacy of SISU, an experiment is conducted to compare


the proposed method with and without SISU for blind removal of AWGN on grayscale image
denoising. The results of this experiment are presented in Table 6. To elaborate, SISU is a
mechanism that establishes correlations among subbands by sharing information from all
high-frequency subbands with each subband, drawing more of the network's attention to
texture and noise regions. Consequently, the proposed network integrated with
SISU exhibits a greater proficiency in noise removal while preserving sharp texture than the
network without SISU. In this regard, the effectiveness of SISU is clearly shown in Table 6,
which demonstrates a significant improvement in denoising performance.
Benefits of Introducing DTCWT: We evaluate the benefits of using DTCWT in the proposed
network by comparing it with a version of the DWT-introduced network. The comparison is
conducted in blind AWGN removal for grayscale image denoising, as presented in Table 7.
The results indicate that the DTCWT-introduced version recovers texture more accurately at
low noise levels than the DWT-introduced version. However, as the noise level increases, the
perturbation of the texture by noise grows and the confusion between image content and noise
increases, making it difficult to represent the geometric features of the image whether the
CNN embeds DTCWT or DWT. Overall, DTCWT is more suitable than DWT for embedding
in a CNN to handle complex regions, especially at low noise
levels.
Selection of Activation Function: We conduct an experiment on blind AWGN removal for
grayscale image denoising using ReLU only or ELU only to validate the rationale behind the
activation function selection. When ELU is the only activation function, both PSNR and SSIM
improve compared to ReLU alone. The use of ELU in the transform domain and ReLU in
the spatial domain further enhances SSIM. The results are presented in Table 8, underscoring
the validity of our activation function selection: since negative elements in the spatial domain
contribute almost nothing, ReLU is chosen there, while negative elements in the transform
domain must participate in the inverse DTCWT, so ELU is chosen.
Number of Processing Blocks: We evaluate the denoising performance using various num-
bers of processing blocks, as presented in Table 9. Our findings reveal that the denoising


Table 8 Average PSNR (dB)/SSIM of the proposed network with different activation functions on BSD68 in
blind AWGN removal
Datasets Methods σ = 15 σ = 25 σ = 50

BSD68 Only ReLU 31.76/0.8964 29.28/0.8336 26.35/0.7253
Only ELU 31.78/0.8968 29.30/0.8344 26.35/0.7257
Combination of ReLU and ELU 31.78/0.8970 29.30/0.8345 26.35/0.7257
The best results are bold

Table 9 Average PSNR (dB)/SSIM of the proposed network with different numbers of Processing Blocks on
BSD68 in blind AWGN removal
Datasets Number of Processing Blocks σ = 15 σ = 25 σ = 50

BSD68 4 31.76/0.8968 29.28/0.8341 26.33/0.7235
5 31.78/0.8970 29.30/0.8345 26.35/0.7257
6 31.78/0.8976 29.30/0.8362 26.36/0.7274

performance improves consistently with the number of processing blocks, indicating that
the processing block design is effective. From Table 9, it can also be seen that beyond 5
processing blocks the PSNR grows only slowly, so we use 5 processing blocks in the proposed
network.

5 Conclusion

In this paper, we propose an image denoising network based on subband information sharing
using dual-tree complex wavelet. The proposed network combines the spatial and transform
domains of an image, through a dual-tree complex wavelet transform, to learn spatially
structured features and time-frequency localized features that enrich the feature space. To
enhance noise removal and improve texture sharpness in complex regions, we design the SISU
in the transform domain to establish inter-subband interactions that generate complementarity
and correlation among subbands. By sharing information among high-frequency subbands,
the network increases the accuracy of distinguishing noisy signals from high-frequency
contents using complementary directional texture information. Sharing high-frequency subband
information with low-frequency subbands enhances texture restoration in complex regions.
Furthermore, we choose ReLU and ELU as activation functions for the spatial and transform
domains, respectively, due to the properties of the elements in the different domains. Overall,
our proposed network is capable of achieving sharper textures while maintaining smoothness
in homogeneous regions. We test the proposed network on four publicly available datasets,
and our results demonstrate the superiority and effectiveness of the proposed network in
image denoising.
Acknowledgements This work is supported by the Major Special Science and Technology Project of Anhui
Province (No. 201903a06020006) and the Key Project of Education Natural Science Research of Anhui
Province of China (No. KJ2017A353).


Declarations
Conflict of interest The authors declare that there is no conflict of interest regarding the publication of this
paper.

References
1. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of
the IEEE conference on computer vision and pattern recognition, pp 770–778. https://doi.org/10.1109/
cvpr.2016.90
2. Yu J, Tan M, Zhang H, Rui Y, Tao D (2019) Hierarchical deep click feature prediction for fine-grained
image recognition. IEEE Trans Pattern Anal Mach Intell 44(2):563–578. https://doi.org/10.1109/tpami.
2019.2932058
3. Zheng H, Chen J, Chen L, Li Y, Yan Z (2020) Feature enhancement for multi-scale object detection.
Neural Process Lett 51:1907–1919. https://doi.org/10.1007/s11063-019-10182-x
4. Wang C-Y, Bochkovskiy A, Liao H-YM (2023) Yolov7: trainable bag-of-freebies sets new state-of-the-
art for real-time object detectors. In: Proceedings of the IEEE/CVF conference on computer vision and
pattern recognition, pp 7464–7475
5. Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery.
IEEE Trans Image Process 24(12):5659–5670. https://doi.org/10.1109/tip.2015.2487860
6. Hong C, Yu J, Zhang J, Jin X, Lee K-H (2018) Multimodal face-pose estimation with multitask manifold
deep learning. IEEE Trans Ind Inf 15(7):3952–3961. https://doi.org/10.1109/tii.2018.2884211
7. Tian C, Fei L, Zheng W, Xu Y, Zuo W, Lin C-W (2020) Deep learning on image denoising: an overview.
Neural Netw 131:251–275. https://doi.org/10.1016/j.neunet.2020.07.025
8. Zhong Y, Liu L, Zhao D, Li H (2020) A generative adversarial network for image denoising. Multimed
Tools Appl 79:16517–16529
9. Dey R, Bhattacharjee D, Nasipuri M (2020) Image denoising using generative adversarial network.
Intelligent Computing: Image Processing Based Applications, pp 73–90
10. Ma R, Zhang B, Hu H (2020) Gaussian pyramid of conditional generative adversarial network for real-
world noisy image denoising. Neural Process Lett 51:2669–2684. https://doi.org/10.1007/s11063-020-
10215-w
11. Zhang K, Zuo W, Chen Y, Meng D, Zhang L (2017) Beyond a Gaussian denoiser: residual learning of
deep CNN for image denoising. IEEE Trans Image Process 26(7):3142–3155. https://doi.org/10.1109/
tip.2017.2662206
12. Zhang K, Zuo W, Zhang L (2018) Ffdnet: toward a fast and flexible solution for CNN-based image
denoising. IEEE Trans Image Process 27(9):4608–4622. https://doi.org/10.1109/tip.2018.2839891
13. Tian C, Xu Y, Fei L, Wang J, Wen J, Luo N (2019) Enhanced CNN for image denoising. CAAI Tran Intell
Technol 4(1):17–23. https://doi.org/10.1049/trit.2018.1054
14. Tian C, Xu Y, Zuo W (2020) Image denoising using deep CNN with batch renormalization. Neural Netw
121:461–473. https://doi.org/10.1016/j.neunet.2019.08.022
15. Tian C, Xu Y, Li Z, Zuo W, Fei L, Liu H (2020) Attention-guided CNN for image denoising. Neural Netw
124:117–129. https://doi.org/10.1016/j.neunet.2019.12.024
16. Tian C, Xu Y, Zuo W, Du B, Lin C-W, Zhang D (2021) Designing and training of a dual CNN for image
denoising. Knowl Based Syst 226:106949. https://doi.org/10.1016/j.knosys.2021.106949
17. Acar V, Eksioglu EM (2022) Densely connected dilated residual network for image denoising: Ddr-net.
Neural Process Lett. https://doi.org/10.1007/s11063-022-11100-4
18. Jia X, Peng Y, Ge B, Li J, Liu S, Wang W (2022) A multi-scale dilated residual convolution network for
image denoising. Neural Process Lett. https://doi.org/10.1007/s11063-022-10934-2
19. Bae W, Yoo J, Chul Ye J (2017) Beyond deep residual learning for image restoration: persistent homology-
guided manifold simplification. In: Proceedings of the IEEE conference on computer vision and pattern
recognition workshops, pp 145–153. https://doi.org/10.1109/cvprw.2017.152
20. Liu P, Zhang H, Zhang K, Lin L, Zuo W (2018) Multi-level wavelet-cnn for image restoration. In:
Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 773–782.
https://doi.org/10.1109/cvprw.2018.00121
21. Tian C, Zheng M, Zuo W, Zhang B, Zhang Y, Zhang D (2023) Multi-stage image denoising with the
wavelet transform. Pattern Recognit 134:109050. https://doi.org/10.1016/j.patcog.2022.109050
22. Goyal B, Dogra A, Agrawal S, Sohi BS, Sharma A (2020) Image denoising review: from classical to
state-of-the-art approaches. Inf Fus 55:220–244. https://doi.org/10.1016/j.inffus.2019.09.003


23. Zhou Y, Fu X (2021) Image denoising based on dual-tree complex wavelet transform and convolutional
neural network. J Phys Conf Ser 1995:012030. https://doi.org/10.1088/1742-6596/1995/1/012030
24. Kingsbury NG (1998) The dual-tree complex wavelet transform: a new technique for shift invariance and
directional filters. In: IEEE digital signal processing workshop, vol. 86, pp 120–131. Citeseer
25. Cotter F (2020) Uses of complex wavelets in deep convolutional neural networks. Ph.D. thesis, University
of Cambridge. https://doi.org/10.17863/CAM.53748
26. Charbonnier P, Blanc-Feraud L, Aubert G, Barlaud M (1994) Two deterministic half-quadratic regu-
larization algorithms for computed imaging. In: Proceedings of 1st international conference on image
processing, vol. 2, pp 168–172. https://doi.org/10.1109/icip.1994.413553. IEEE
27. Zamir SW, Arora A, Khan S, Hayat M, Khan FS, Yang M-H, Shao L (2020) Learning enriched features for
real image restoration and enhancement. In: Computer vision–ECCV 2020: 16th European conference,
Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pp 492–511. https://doi.org/10.1007/
978-3-030-58595-2_30. Springer
28. Li X, Wang W, Hu X, Yang J (2019) Selective kernel networks. In: Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition, pp 510–519
29. Park J, Woo S, Lee J-Y, Kweon IS (2018) Bam: Bottleneck attention module. arXiv preprint
arXiv:1807.06514
30. Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the four-
teenth international conference on artificial intelligence and statistics, pp 315–323. In: JMLR workshop
and conference proceedings
31. Clevert D-A, Unterthiner T, Hochreiter S (2015) Fast and accurate deep network learning by exponential
linear units (ELUS). arXiv preprint arXiv:1511.07289
32. Arjomand Bigdeli S, Zwicker M, Favaro P, Jin M (2017) Deep mean-shift priors for image restoration.
In: Advances in neural information processing systems 30
33. Franzen R (1999) Kodak lossless true color image suite. http://r0k.us/graphics/kodak/
34. Loshchilov I, Hutter F (2016) Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint
arXiv:1608.03983

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.
