
IET Image Processing

Review Article

State-of-art analysis of image denoising methods using convolutional neural networks

ISSN 1751-9659
Received on 7th February 2019
Revised 29th June 2019
Accepted on 29th July 2019
doi: 10.1049/iet-ipr.2019.0157
www.ietdl.org

Rini Smita Thakur¹, Ram Narayan Yadav¹, Lalita Gupta¹

¹Department of Electronics and Communication Engineering, Maulana Azad National Institute of Technology, Bhopal, MP, India
E-mail: rinithakur66@gmail.com

Abstract: Convolutional neural networks (CNNs) are deep neural networks that can be trained on large databases and show outstanding performance on object classification, segmentation, image denoising etc. In the past few years, several image denoising techniques have been developed to improve the quality of an image. CNN-based image denoising models have shown improved denoising performance compared to non-CNN methods such as block-matching and three-dimensional (3D) filtering, contemporary wavelet and Markov random field approaches, which had remained state-of-the-art for years. This paper provides a comprehensive review of state-of-the-art image denoising methods using CNNs. The literature associated with different CNNs used for image restoration is reviewed, including residual learning based models (DnCNN-S, DnCNN-B, IDCNN), the non-locality reinforced network (NN3D), the fast and flexible network (FFDNet), the deep shrinkage CNN (SCNN), a model for mixed noise reduction, and the denoising prior driven network (PDNN). DnCNN-S and PDNN remove Gaussian noise of fixed level, whereas DnCNN-B, IDCNN, NN3D and SCNN are used for blind Gaussian denoising. FFDNet is used for spatially variant Gaussian noise. The performance of these CNN models is analysed on the BSD-68 and Set-12 datasets. PDNN shows the best result in terms of PSNR for both BSD-68 and Set-12 datasets.

1 Introduction

An image is corrupted by noise during its acquisition and transmission. The noise is introduced by imperfections in the image capturing devices, noise sources present in the vicinity of those devices, faulty memory locations, lossy compression, the camera imaging pipeline (shot noise, amplifier noise and quantisation noise), scattering and other adverse atmospheric conditions [1]. Image denoising plays an important role in daily life applications such as satellite TV, magnetic resonance, computed tomography (CT), remote sensing, astronomical applications etc. The different kinds of noise are described in Table 1 [2, 3].

Image denoising is the process of estimating the latent clean image from its noisy observation. Image denoising, as well as closely related operations such as image inpainting, blur and artefact reduction, and watermark removal, is also recognised as a preprocessing task for branches of computer vision such as image segmentation and pattern recognition. The noise is modelled as Gaussian, Poisson, Gamma etc. based on its probability distribution. It is characterised as white noise or colour noise based on correlation, additive or multiplicative noise based on nature, and quantisation or photon noise based on source [1]. If the image x is corrupted by additive white Gaussian noise (AWGN), the model can be formulated as

$$y = x + \epsilon \qquad (1)$$

where $y \in \mathbb{R}^N$ denotes the observed noisy image, $x \in \mathbb{R}^N$ denotes the input clean image and $\epsilon \sim \mathcal{N}(0, \delta^2 I)$ denotes the Gaussian noise vector with zero mean and covariance matrix $\delta^2 I$ (I is the identity matrix).

A variety of image denoising methods have been developed, such as non-local means (NLM) filter-based methods, wavelet-based methods, diffusion-based methods, total variation based methods, block-matching and three-dimensional filtering (BM3D), sparse representation-based methods, Markov random field models, neural network-based methods etc. [4–16].

The convolutional neural network (CNN) is a kind of deep neural network, which is being used in image segmentation, image super-resolution, image deblurring, image denoising, image deconvolution etc. [17]. The origins of the CNN date to 1962, when research on the visual cortex cells of the cat was carried out. The successful implementation of LeNet 5 indicates the advantage of this network. The CNN has local receptive fields and translation-invariant connection matrices. The concept of weight sharing (or weight replication) in CNN reduces the number of free (trainable) network parameters, so the network complexity is reduced and generalisation is improved as compared to the artificial neural network (ANN) [18]. CNNs are easy to train with backpropagation as compared to other ANNs because they have sparse connectivity in each convolution layer [19]. They are widely used in the area of deep learning due to the availability of application-oriented large databases and efficient parallel computing in graphics processing units (GPUs) [20]. CNNs provide better performance in image resolution as compared to traditional sparse representation because they possess higher representation capability [21]. In sparse representation, sparse dictionaries are constructed by vectorising the image matrices; thus the 2D structural information, that is the dependency of pixels in local neighbouring regions, is lost. On the contrary, a CNN is capable of maintaining 2D structural information both in the training and testing phases because the convolution operation considers the local neighbouring image pixels by using 2D masks [22]. Patch-based methods such as NLM and BM3D require a computationally heavy iterative optimisation algorithm, and the performance of these methods is sub-optimal if the images have a low number of self-similar patches [23]. On the other hand, a CNN optimises the weights of its convolution masks through a gradient-based training scheme, which inherently considers self-similarity in the entire set of patches available in a relatively large number of training images [24]. Moreover, once a CNN is trained, its weights can be reused by another network through transfer learning [25].

The image denoising performance is measured by mean square error (MSE), peak-signal-to-noise ratio (PSNR), structural similarity index (SSIM), multiscale structural similarity index (MSSIM), correlation coefficient (β), weighted distance (WD) (using the L1 norm), normalised cross-correlation (NK), signal-to-noise ratio (SNR), mean absolute error (MAE), image enhancement factor (IEF), universal quality index (QI), average difference (AD), maximum difference (MD), structural content (SC), correlation quality (CQ), image fidelity (IF), Laplacian MSE (LMSE), peak

IET Image Process. 1


© The Institution of Engineering and Technology 2019
Table 1 Types of noise in images
Types of noise | Description | Probability density function
Gaussian | It occurs during film exposure and development of the image. | $p(z) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(z-\mu)^2/(2\sigma^2)}$, where z = image pixel value, μ = mean, σ = standard deviation
Impulse (salt and pepper noise) | It occurs during data transmission and the image pixel value is replaced by a minimum or maximum pixel value of an image. | $p(z) = \begin{cases} P_a & \text{for } z = a \\ P_b & \text{for } z = b \\ 0 & \text{otherwise} \end{cases}$, where a and b are extreme image pixel values.
Periodic | This noise is produced from electronic interferences during image acquisition. | -
Poisson or photon | It occurs due to the statistical nature of electromagnetic waves such as X-rays, visible light and Gamma rays. | $p(z) = \frac{z^k e^{-z}}{k!}$, where k is the Poisson parameter.
Gamma | It is found in laser-based images. It follows Gamma distribution. | $p(z) = \begin{cases} \frac{a^b z^{b-1}}{(b-1)!}\, e^{-az} & z \ge 0 \\ 0 & \text{otherwise} \end{cases}$
Rayleigh | It is found in radar range images. It follows Rayleigh distribution. | $p(z) = \begin{cases} \frac{2}{b}(z-a)\, e^{-(z-a)^2/b} & \text{for } z \ge a \\ 0 & \text{otherwise} \end{cases}$
White | Power spectrum density of white noise is constant. Its autocorrelation is zero for all non-zero lags. | —
Uniform | It is caused by quantising the pixels of the image to a number of pre-defined levels. It follows a uniform distribution. | $p(z) = \begin{cases} \frac{1}{b-a} & \text{if } a \le z \le b \\ 0 & \text{otherwise} \end{cases}$, where the distribution is between image pixel values a and b.

Table 2 Image quality measures
Image quality measures | Mathematical description
correlation coefficient (β) | $\frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} (x(i,j)-\bar{x})(y(i,j)-\bar{y})}{\sqrt{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} (x(i,j)-\bar{x})^2 \times \sum_{i=0}^{M-1}\sum_{j=0}^{N-1} (y(i,j)-\bar{y})^2}}$
NK | $\frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} x(i,j)\, y(i,j)}{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} x(i,j)^2}$
MSE | $\frac{1}{M \times N}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} (x(i,j)-y(i,j))^2$
PSNR | $10\log_{10}\frac{(\max x)^2}{\mathrm{MSE}}$
MAE | $\frac{1}{M \times N}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} \lvert x(i,j)-y(i,j)\rvert$
IEF | $\frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} (n(i,j)-x(i,j))^2}{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} (y(i,j)-x(i,j))^2}$
SSIM | $\frac{(2\bar{x}\bar{y}+C_1)(2\sigma_{xy}+C_2)}{(\bar{x}^2+\bar{y}^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}$
QI | $\frac{4\sigma_{xy}\,\bar{x}\,\bar{y}}{(\sigma_x^2+\sigma_y^2)(\bar{x}^2+\bar{y}^2)}$
AD | $\frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N} (x(i,j)-y(i,j))$
MD | $\max \lvert x(i,j)-y(i,j)\rvert$
SC | $\frac{\sum_{i=1}^{M}\sum_{j=1}^{N} y(i,j)^2}{\sum_{i=1}^{M}\sum_{j=1}^{N} x(i,j)^2}$
CQ | $\frac{\sum_{i=1}^{M}\sum_{j=1}^{N} x(i,j)\, y(i,j)}{\sum_{i=1}^{M}\sum_{j=1}^{N} x(i,j)}$
IF | $1 - \frac{\sum_{i=1}^{M}\sum_{j=1}^{N} (x(i,j)-y(i,j))^2}{\sum_{i=1}^{M}\sum_{j=1}^{N} x(i,j)^2}$
LMSE | $\frac{\sum_{i=1}^{M-1}\sum_{j=2}^{N-1} (F\{x(i,j)\}-F\{y(i,j)\})^2}{\sum_{i=1}^{M-1}\sum_{j=2}^{N-1} F\{x(i,j)\}^2}$
PMSE | $\frac{1}{M \times N}\,\frac{\sum_{i=1}^{M}\sum_{j=1}^{N} (x(i,j)-y(i,j))^2}{(\max x(i,j))^2}$
NAE | $\frac{\sum_{i=1}^{M}\sum_{j=1}^{N} \lvert F\{x(i,j)\}-F\{y(i,j)\}\rvert}{\sum_{i=1}^{M}\sum_{j=1}^{N} \lvert F\{x(i,j)\}\rvert}$
NMSE | $\left(\frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N} \lvert x(i,j)-y(i,j)\rvert^{p}\right)^{1/p}$
MSSIM | $[l_M(x,y)]^{\alpha_M}\prod_{j=1}^{M} [c_j(x,y)]^{\beta_j}\,[s_j(x,y)]^{\Upsilon_j}$
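As a minimal sketch of how a few of the simpler measures in Table 2 can be computed, the following NumPy functions evaluate MSE, PSNR, MAE and IEF for a clean image x, a denoised image y and a noisy image n. The function names and the choice of NumPy are illustrative assumptions and are not taken from the reviewed papers; PSNR uses the maximum value of x as the peak, as written in Table 2.

```python
import numpy as np

def mse(x, y):
    """Mean square error between original x and denoised y."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean((x - y) ** 2))

def psnr(x, y):
    """PSNR in dB, using max(x)^2 as the peak power (Table 2 convention)."""
    err = mse(x, y)
    if err == 0.0:
        return float("inf")  # identical images
    peak = float(np.max(np.asarray(x, dtype=np.float64)))
    return float(10.0 * np.log10(peak ** 2 / err))

def mae(x, y):
    """Mean absolute error between original x and denoised y."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean(np.abs(x - y)))

def ief(x, y, n):
    """Image enhancement factor: noisy-image error energy over denoised-image error energy."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = np.asarray(n, dtype=np.float64)
    return float(np.sum((n - x) ** 2) / np.sum((y - x) ** 2))
```

An IEF greater than 1 indicates that the denoised image is closer to the original than the noisy input was.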

MSE (PMSE), normalised absolute error (NAE), normalised MSE (NMSE), and LP-norm [26–28]. The quality of a given image is evaluated jointly by several image quality assessment approaches in which joint models are optimised and the image indices are given by the Spearman Rank order Correlation Coefficient (SRCC), Kendall Rank order Correlation Coefficient (KRCC) and Pearson linear Correlation Coefficient (PCC) [29].

Table 2 shows some image quality measures of 2D images. In all of these equations, x represents the original image, y represents the denoised image, n is the noisy image, $\bar{x}$ is the mean value of the original image, $\bar{y}$ is the mean value of the denoised image, M × N are the image dimensions, $\sigma_x$ is the standard deviation of the original image, $\sigma_y$ is the standard deviation of the denoised image, $\sigma_x^2$ and $\sigma_y^2$ are the variances of x and y, respectively, $\sigma_{xy}$ is the covariance of x and y, and i, j are pixel indices. In the SSIM formula, $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$ are two variables that stabilise the division when the denominator is weak, L is the dynamic range of pixel values, $K_1 = 0.01$ and $K_2 = 0.03$. The function $F\{x(i,j)\}$ is defined in three ways: $x(i,j)$, or $x(i,j)^{1/3}$, or $H\{(u^2+v^2)^{0.5}\}\,x(u,v)$ (in the cosine transform domain). In the MSSIM formula, $l_M(x,y)$ denotes the luminance comparison at scale M, $c_j(x,y)$ is the contrast comparison at scale j and $s_j(x,y)$ is the structure comparison at scale j. The exponents $\alpha_M$, $\beta_j$, $\Upsilon_j$ are constants which are used to adjust the relative importance of the different components.

The denoising CNN models discussed in this paper are DnCNN [30], IDCNN [24], NN3D [22], deep shrinkage CNN (SCNN) [31], CNN for mixed noise removal [32], fast and flexible denoising CNN (FFDNet) [23] and denoising prior driven deep neural network (PDNN) [33]. DnCNN [30] uses residual learning along with batch normalisation, but it suffers from gradient explosion and fails to converge quickly. IDCNN [24] overcomes this disadvantage by introducing gradient clipping into the network architecture of [30], which clips the individual gradients to a pre-defined range in order to increase the convergence speed. In order to improve the denoising performance, a CNN can be used in cascade with other conventional denoising methods. In NN3D [22], a CNN is used in cascade with the non-local filter (NLF). Some CNN models are designed for adaptive noise, where a single model is used to remove noise with a wide range of standard deviations. SCNN [31] uses a soft shrinkage activation function for adaptive noise reduction. FFDNet [23] is designed to handle spatially variant noise with a wide range of noise levels (i.e. [0, 75]) effectively with a single network. In PDNN [33], model-based methods which rely on the denoising prior are merged with the learning-based CNN model, which maps

low-quality images to desirable high-quality images with deep learning.

The remaining part of this paper is organised as follows. In Section 2, CNN architecture is discussed. Section 3 presents learning for denoising operation on CNN. Section 4 gives the architecture of different CNNs along with the dataset used for denoising. Section 5 briefly describes some other CNN models with their application. Section 6 gives a description of the datasets used in this review. Section 7 compares the denoising performance in terms of PSNR on the BSD-68 and Set-12 datasets. Section 8 concludes the paper.

2 CNN architecture

The CNN consists of an input layer, an output layer and a number of hidden layers of neurons which perform an alternating sequence of linear and non-linear transformations. The output of each hidden layer is called a feature map. In each layer, the convolution kernel is convolved with the output of the previous layer. The result is further processed by the activation function. The lth layer output is given by [24]

$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * w_{ji}^{l} + b_j^{l}\right) \qquad (2)$$

where $M_j$ represents the selection of the input feature maps, $x_i^{l-1}$ is the output of the previous feature map, $w_{ji}^{l}$ is the weight of the convolution kernel of the lth layer, f is the activation function, which can be a rectified linear unit, sigmoid function etc., and $b_j^{l}$ is the bias of the lth layer. In the network in network model [34], the convolution operation is replaced by a shallow multilayer perceptron which can represent arbitrarily complex functions. After the convolution operation and transformation with activation functions, the size of the feature map is reduced by a pooling operation. The output size after the convolution operation is given by [35]

$$\text{Output Size} = \frac{N - K + 2P}{s} + 1 \qquad (3)$$

where N × N is the dimension of the input image, P is the padding depth, K × K is the dimension of the convolutional filter and s is the stride, which specifies how many pixels the filter is translated horizontally and vertically. A typical CNN architecture provided in [17] is given in Fig. 1.

Fig. 1  Illustration of a typical CNN architecture for 256 × 256 pixel RGB images, including the objective function used for training

3 Learning for denoising operation on CNN

The different algorithms used for training the CNN are first-order hybrid training methods, conjugate gradient methods, quasi-Newton methods, the Levenberg–Marquardt method and its variants and the least squares method [36]. The training algorithm uses an objective function approach and a learning function approach. In the objective function approach, the minimisation of the reconstruction function is the solution of the problem. In the learning function approach, the solution of a regularised minimisation problem is a parametric function that is used to solve the denoising problem. The loss function or cost function is minimised to find the optimal parameters of the neural network. In supervised learning, this loss function is taken as the MSE:

$$E_k = \frac{1}{P \times M}\sum_{j=1}^{P}\sum_{i=1}^{M} (z_{ij} - t_{ij})^2, \quad k = 1, \ldots, S_k \qquad (4)$$

where M is the number of output neurons, P is the number of training patterns, $S_k$ is the number of training iterations, and $z_{ij}$ and $t_{ij}$ are the actual and desired responses of the ith output neuron due to the jth input pattern, respectively.

The stochastic gradient descent method is used for optimising this MSE. The gradient descent method iteratively updates the weight w, replacing $w(t)$ by $w(t+1)$ using the following update equation:

$$w(t+1) = w(t) - \eta\,\frac{\partial E_k}{\partial w(t)} \qquad (5)$$

where η is the learning parameter.

The general learning equation of a CNN to solve the inverse problem in imaging is given in [17] as

$$R_{learn} = \arg\min_{R_\Theta,\,\Theta} \sum_{j=1}^{P} C\left(x_j, R_\Theta(y_j)\right) + g(\Theta) \qquad (6)$$

where a training data set of ground-truth images and their corresponding measurements $\{(x_j, y_j)\}_{j=1}^{P}$ is known. The cost function is denoted by C, $R_\Theta$ is the function designed to extract the denoised image $x_j$, and the regularisation function is denoted by g, which promotes a solution that matches the prior knowledge of the input image $x_j$. Θ is the set of all possible parameters in the CNN.

A differentiable and tractable loss function is required to train the network with the back-propagation algorithm. The choices of the loss function for solving the denoising problem include the l1 norm, l2 norm and Frobenius norm [32]. The Frobenius norm is defined for matrices, so it is used when the image is the input of the CNN. The noisy images X are fed as the input of the CNN and the corresponding noise-free images $\hat{X}$ are estimated; the loss function is given by

$$C = \lVert X - \hat{X} \rVert_F^2 \qquad (7)$$

where $\lVert \cdot \rVert_F$ is the Frobenius norm. The weights are the elements of the convolution filter, which are updated using the gradient of the loss function C with respect to w. The update is calculated in terms of a weighted average of the square of the gradient, denoted mean square (MS), given by [32]

$$MS(w_\eta) = \Upsilon\, MS(w_{\eta-1}) + (1 - \Upsilon)\left(\frac{\partial C}{\partial w_\eta}\right)^2 \qquad (8)$$

$$w_{\eta+1} = w_\eta - \frac{\lambda}{\sqrt{MS(w_\eta)} + \epsilon}\,\frac{\partial C}{\partial w_\eta} \qquad (9)$$

In the above equations, λ (λ > 0) and Υ (0 < Υ < 1) are hyper-parameters known as learning and decaying rates, respectively, η is the iteration number, C is the loss function given by (7) and ϵ is the numerical stability factor.
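The update rule of (8) and (9) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the training code of [32]: it assumes the square root of the running average in the denominator (as in the standard RMSProp rule), and the toy problem, function names and default hyper-parameter values are hypothetical. The toy loss is the Frobenius loss of (7) for a single scalar weight w applied to a fixed input.

```python
import numpy as np

def rmsprop_step(w, grad, ms, lr=0.01, decay=0.9, eps=1e-8):
    """One update: lr plays the role of lambda, decay the role of Upsilon."""
    ms = decay * ms + (1.0 - decay) * grad ** 2   # eq. (8): running mean of squared gradient
    w = w - lr * grad / (np.sqrt(ms) + eps)       # eq. (9): square root assumed
    return w, ms

def fit_scalar_filter(x_in, x_target, steps=1000):
    """Toy problem: find scalar w minimising ||w * x_in - x_target||_F^2."""
    w, ms = 0.0, 0.0
    for _ in range(steps):
        residual = w * x_in - x_target
        grad = 2.0 * np.sum(residual * x_in)      # dC/dw for the Frobenius loss
        w, ms = rmsprop_step(w, grad, ms)
    return w

x_in = np.array([[1.0, 2.0], [3.0, 4.0]])
w_fit = fit_scalar_filter(x_in, 2.0 * x_in)       # the minimiser is w = 2
```

Because the gradient is divided by its own running magnitude, the step size stays close to the learning rate regardless of the gradient scale, which is the practical motivation for this family of updates.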

In the model-based denoising methods [37–41], the optimisation problem is constructed from a Bayesian perspective by maximising the posterior probability $P(x \mid y)$, which is formulated as

$$\hat{x} = \arg\max_x \log P(x \mid y) = \arg\max_x \left[\log P(y \mid x) + \log P(x)\right] \qquad (10)$$

where $\log P(y \mid x)$ and $\log P(x)$ denote the data likelihood and prior terms, respectively. Equation (10) is reframed for AWGN, in which $\log P(y \mid x)$ corresponds to the l2-norm data fidelity term, and the prior term $P(x)$ characterises the prior knowledge of x in a probability setting. So, (10) becomes

$$\hat{x} = \arg\min_x \lVert y - Ax \rVert_2^2 + \lambda J(x) \qquad (11)$$

where $J(x)$ denotes the regulariser associated with the prior term $P(x)$ and the image restoration problem is given by $y = Ax + n$, where y and x denote the degraded image and the original image, respectively, A denotes the degradation matrix relating to an imaging/degradation system, and n denotes the AWGN. The desirable solution is then the one that minimises both the l2-norm data fidelity term and the regularisation term weighted by the parameter λ.

The learning of the CNN requires a training dataset of images, i.e. the input and desired output images. In some datasets, training, testing and validation images are pre-defined separately, whereas in other datasets the user can select images for training, testing and validation as per the application. The popular image datasets are given in Table 3.

Table 3 Some benchmark image denoising datasets and their description
Datasets | Image description
MNIST [42] | 70,000 images of handwritten digits of size 28 × 28.
RENOIR [43] | Colour images corrupted by natural noise due to low-light conditions. The dataset contains over 100 scenes and more than 400 images, including both RAW formatted images and 8 bit BMP pixel and intensity aligned images.
SIDD [44] | Smart phone image denoising dataset of 30,000 images (10 scenes × 5 cameras × 4 conditions × 150 images).
DND [45] | Darmstadt Noise Dataset. It consists of 50 pairs of real noisy images and corresponding ground truth images that were captured with consumer-grade cameras of differing sensor sizes.
BSD-300 [46] | Berkeley Segmentation Dataset consists of 300 greyscale and colour segmentation images of size 256 × 256. Training set = 200, Test set = 100.
CIFAR [47] | It has 60,000 natural images of size 32 × 32.
BRATS-2013 [48] | 65 MRI of Glioma patients.
ILSVRC [20] | 1.2 million high-resolution colour images of variable size.
LSIP [49] | Laser Stripe Image patch Dataset consists of 520 pairs of image patches, each of which is composed of an 8-bit noisy greyscale image and an uncorrupted image.

4 Different CNN models for image denoising
4.1 Deep CNN using residual learning

A few authors have treated image denoising as a plain discriminative learning problem in which noise is separated from the noisy image using a feed-forward CNN. In [30], residual learning and batch normalisation are used to speed up the training process and boost the denoising performance. Residual learning was initially designed to solve the performance degradation problem, i.e. as the depth of the network increases, training accuracy begins to degrade. In residual learning, the output is the noise instead of the denoised image. The training efficiency of the mini-batch stochastic gradient method is largely reduced by internal covariate shift, i.e. changes in the distributions of the inputs to internal non-linearities during training. Batch normalisation is used to improve the training efficiency; it alleviates the internal covariate shift by incorporating a normalisation step and a scale and shift step before the non-linearity in each layer [50]. The DnCNN model in [30] is used for blind Gaussian denoising, that is, it can remove Gaussian noise of unknown noise level. Fig. 2 shows the architecture of DnCNN.

Fig. 2  Architecture of deep DnCNN [30]

This single CNN model can solve three general image denoising tasks, i.e. blind Gaussian denoising, single image super-resolution with multiple upscaling factors, and JPEG image deblocking with different quality factors. There are three kinds of layers in a DnCNN of depth D.
(i) Conv + ReLU: for the first layer, 64 filters of size 3 × 3 × c are used to generate 64 feature maps, and rectified linear units (ReLU, max(0,·)) are then utilised for non-linearity. Here c represents the number of image channels, i.e. c = 1 for a grey image and c = 3 for a colour image.
(ii) Conv + BN + ReLU: for layers 2 to (D − 1), 64 filters of size 3 × 3 × 64 are used, and batch normalisation [50] is added between convolution and ReLU.
(iii) Conv: for the last layer, c filters of size 3 × 3 × 64 are used to reconstruct the output.

The training algorithms used in this deep CNN are stochastic gradient descent and the Adam algorithm [51]. The averaged MSE between the desired residual images and the estimated ones from the noisy input is taken as the loss function to learn the trainable parameters Θ.

The CNN presented in [24] also uses residual learning, but without batch normalisation. It consists of the input layer, hidden layers, and the output layer as depicted in Fig. 3. At each layer, a convolution operation is performed, which is linear filtering, followed by a non-linear activation function, which is a ReLU. The ReLU activation function allows faster and more effective training of deep neural architectures on large and complex datasets as compared to the sigmoid function or other activation functions. In this network, fixed and non-fixed noise masks are used. In the fixed noise mask, the noise matrix is the same for training contaminated images with the same noise levels, while in the non-fixed noise mask the noise matrix is different for each noise level.

Fig. 3  Architecture of IDCNN [24]

The function R is modelled to predict $R(y) = v$, where v is the residual noise. Here $\{(y_i, x_i)\}_{i=1}^{N}$ represents N noisy-clean training image (patch) pairs. Then we have $\hat{x} = y - R(y)$. So, the loss function $C(\Theta)$ is given by $\frac{1}{2}\lVert x - \hat{x} \rVert^2$, where Θ = [W, b] is the
network parameter of the CNN. The non-convex optimisation problem is solved to minimise $C(\Theta)$. The stochastic gradient descent method is used as the learning method to optimise $C(\Theta)$. The CNN cannot converge if the learning rate is greater than 0.01, because it suffers from gradient explosion. Therefore, gradient clipping is used to clip the individual gradients to the predefined range $[-\beta, \beta]$. The gradient clipping scheme is applied within a given threshold [52, 53]. The given threshold value is denoted by β and the current value of the gradient is denoted by g. If $\lVert g \rVert \ge \beta$, then

$$g = \frac{\beta}{\lVert g \rVert}\, g \qquad (12)$$

Thus, when the norm of the current gradient exceeds the given threshold during training, the gradient is rescaled to $(\beta / \lVert g \rVert)\, g$, whose norm is β.

The CNN converges quickly as the gradient lies in the fixed range. Some popular CNNs using residual learning are listed with their data specifications in Table 4.

Table 4 List of CNN with data and noise specifications using residual learning
CNN | Training dataset | Testing dataset | Type of noise
DnCNN-S [30] | 400 images of size 180 × 180 from BSD-500 | 68 images of the BSD-68 and 12 natural images | specific Gaussian noise level
DnCNN-B [30] | 400 images of size 180 × 180 from BSD-500 | 68 images of the BSD-68 and 12 natural images | blind Gaussian denoising
CDnCNN-B [30] | 432 images of size 180 × 180 from BSD-500 | Remaining 68 images of the BSD-500 | blind Gaussian denoising
Model_15, Model_20, Model_25 [24] | 91 images from BSD-500 | Cameraman, House, Lena | specific Gaussian noise level
Model_mix [24] | 91 images from BSD-500 | Cameraman, House, Lena | blind Gaussian denoising
IDCNN-1 [24] | 91 images from BSD-500 | 18 natural images | specific Gaussian noise level
IDCNN-2 [24] | 91 images from BSD-500 | 18 natural images | blind Gaussian denoising

4.1.1 Algorithm for DnCNN-S: The algorithm for DnCNN-S (see Fig. 4).

Fig. 4  Algorithm for DnCNN-S

4.2 Non-locality-reinforced CNN for image denoising

The non-locality-reinforced CNN for image denoising is the combination of a CNN-based denoiser and a non-local denoiser based on the NLF [22]. The NLF exploits the mutual similarities between groups of patches. The NLF results in superior noise removal in those parts of the image that exhibit strong self-similarity, such as on edges or on regular texture. The performance of the NLF is inferior on pseudorandom textures or singular features, i.e. where the image exhibits weak self-similarity. The image denoising methods based on NLF are BM3D [12], NLBayes [54], and weighted nuclear norm minimisation (WNNM) [55]. The advantages of CNN and NLF are combined through a simple iterative modular framework, as the CNN is biased by the learned mapping of the local features and the NLF is biased towards non-local self-similarity.

The CNN cascaded with BM3D is termed NN3D [22] (see Fig. 5). In NN3D, the CNNs used for AWGN removal are FFDNet [23], DnCNN [30], and WDnCNN [56].

Fig. 5  Architecture of NN3D [22]

At high noise levels, the local nature of the CNN, coupled with the training on external examples, often leads to hallucination. In this case, the NLF plays an important role as it can smooth out hallucinations that fail to meet the self-similarity prior. The disadvantage of the NLF is that it introduces excessive spatial smoothing. NN3D counteracts this unwanted smoothing by proceeding iteratively. Specifically, at each iteration k the input to NN3D is a convex combination of the original noisy signal y and of the previous estimate $\hat{x}_{k-1}$, given by

$$\bar{y}_k = \lambda_k\, y + (1 - \lambda_k)\, \hat{x}_{k-1} \qquad (13)$$

In NN3D, this convex combination of the input noisy image y and the previous estimate of the clean image $\hat{x}_{k-1}$ is fed into the CNN filter followed by the NLF, which is a 3D collaborative filter. Here k denotes the iteration index. The NLF, i.e. BM3D, requires weaker regularisation, i.e. a smaller threshold $\tau_k$, as a fraction of the noise is already attenuated by the CNN. The rate of progress of the iterative procedure is controlled by the step parameter $\lambda_k$. Both the threshold $\tau_k$ and the step parameter $\lambda_k$ are positive and decrease monotonically with every iteration.

After the estimation of the image $\tilde{x}_k$ by the CNN, block matching is done, in which groups of similar blocks are identified. The similar blocks are compiled into a look-up table of group coordinates $S = \{S_1, \ldots, S_N\}$. Each $S_j$ has the dimension $N_1 \times N_1 \times N_2$, in which the coordinates of $N_2$ mutually similar blocks of size $N_1 \times N_1$ are compiled. Each pixel in the image is covered by at least one block of S. Non-local self-similarity is enforced by group-wise processing of $\tilde{x}_k$ based on the look-up table S. Specifically, at each iteration k and for each $S_j \in S$, a group $\tilde{g}_k^{\,j}$ of size $N_1 \times N_1 \times N_2$ is filtered by 3D transform-domain shrinkage. The shrinkage operation is performed along the third dimension of the group $\tilde{g}_k^{\,j}$, i.e. with a 1D transform $T_{1D}$ of length $N_2$. The filtered group is given by

$$\hat{g}_k^{\,j} = T_{1D}^{-1}\left(\Upsilon\left(T_{1D}\left(\tilde{g}_k^{\,j}\right), \tau_k\right)\right) \qquad (14)$$

where Υ is the shrinkage operator. The image estimate is obtained by returning the block estimates from the filtered group $\hat{g}_k^{\,j}$ to their original locations $S_j$, where they are aggregated with group-wise weights reciprocal to the energy of the shrinkage factor.

4.2.1 Algorithm of NN3D: The algorithm of NN3D (see Fig. 6).

4.3 Deep SCNN for adaptive noise reduction

Deep SCNN can reduce time-varying noise, unlike other CNNs, which are designed for specific noise levels. If the noise level of the training data set is different from that of the testing data set, then most of the CNNs which do not have adjustable
∂C ∂Fsh sl, i p, c , T l, c, i ∂T l, c, i
∂αl, c
= ∑ δl p c i
, , ,
∂T l, c, i ∂αl, c (20)
p

∂Ci
δl, p, c, i = (21)
∂al, i p, c

is a gradient propagated from convolution layer where al, i is the


output of the feature conversion layer. According to the definition
of soft shrinkage

−σi (sl, i p, c > αl, cσi)


∂Fsh sl, i p, c , αl, c, σi
= +σi (sl, i p, c > αl, cσi) (22)
∂T l, c, i
0 otherwise
Fig. 6  Algorithm of NN3D
In order to propagate gradient to the next layers ∂Ci /(∂sl, i(p, c)) is
parameters fail to restore images effectively. In deep SCNN [31], calculated
the soft shrinkage activation function is used. Soft shrinkage
activation function has a threshold which is adjustable to the noise ∂Ci ∂Fsh sl, i p, c , T l, c, i
= δl, p, c, i
level and is given by ∂sl, i p, c ∂sl, i p, c
(23)
∂Ci δl, p, c, i sl, i p, c > T l, c, i
x−τ (x > τ) =
Fsh x, τ = (15) ∂sl, i p, c 0 otherwise
x+τ (x < τ)
αl, c is clipped between 0 to αmax after updating. So, deep SCNN is
where x is the pixel intensity of the given co-ordinate in the image and τ is the threshold.

4.3.1 Architecture of deep SCNN: SCNN architecture is the same as that of DnCNN [30]. It has D layers and l is the identifier of a layer. The first layer (l = 0) is the feature extraction layer, which consists of Nc convolution filters of size w × w and an activation function. The next layers, from l = 1 to D − 2, are the feature conversion layers, each of which has Nc convolution filters of size w × w × Nc, followed by batch normalisation [50] and an activation layer. The last layer (l = D − 1) is the residual image generation layer, which consists of a convolution layer whose filter size is w × w × Nc. The loss function C(Θ) of SCNN is given by

$$C(\Theta) = \frac{1}{2N}\sum_{i=1}^{N} \left\| R(y_i; \Theta) - (y_i - x_i) \right\|^2 \tag{16}$$

Here $\{(y_i, x_i)\}_{i=1}^{N}$ represents the N noisy-clean training image (patch) pairs, R represents the residual noise function and Θ denotes the learnable parameters. $F_{sh}$ is the soft shrinkage activation function, which is applied to each element $s_{p,c}^{l,i}$ of the feature map, where p is the position in the image and c = 0, 1, …, Nc − 1 is an identifier of a channel, as follows:

$$F_{sh}\left(s_{p,c}^{l,i}, T_{l,c,i}\right) = \begin{cases} s_{p,c}^{l,i} - T_{l,c,i} & \left(s_{p,c}^{l,i} > T_{l,c,i}\right) \\ s_{p,c}^{l,i} + T_{l,c,i} & \left(s_{p,c}^{l,i} < -T_{l,c,i}\right) \\ 0 & \text{otherwise} \end{cases} \tag{17}$$

where $T_{l,c,i} = \alpha_{l,c}\sigma_i$ is the soft shrinkage threshold for the cth channel of the feature map in the lth layer, under the condition that the standard deviation of the input image $y_i$ is $\sigma_i$, and $\alpha_{l,c}$ is a coefficient to be optimised by the gradient method as follows:

$$\alpha_{l,c}^{(t+1)} = \alpha_{l,c}^{(t)} - \epsilon \frac{\partial C}{\partial \alpha_{l,c}} \tag{18}$$

where t is the number of updates. According to the loss function given in (16), the gradient with respect to $\alpha_{l,c}$ in (18) is given by

$$\frac{\partial C}{\partial \alpha_{l,c}} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial C_i}{\partial \alpha_{l,c}} \tag{19}$$

where $C_i = \frac{1}{2}\left\| R(y_i; \Theta) - (y_i - x_i) \right\|^2$ is the contribution of the ith training pair to the loss in (16).

simultaneously optimised for various noise levels and does not depend on specific network architecture; various network architectures are used by adjusting activation function (see Table 5).

4.3.2 Algorithm for SCNN: The algorithm for SCNN is given in Fig. 7.

4.4 CNN for mixed Gaussian–impulse noise reduction

The estimation of mixed Gaussian–impulse noise is a challenging problem because the distributions of Gaussian and impulse noise differ significantly. In an image corrupted by AWGN, samples of a zero-mean Gaussian distribution are added to the pixels, and the additive components are assumed to be independent and identically distributed (i.i.d.). Impulse noise results in abrupt changes in fixed-level intensities with a given probability in a certain portion of the image. Salt and pepper impulse noise (SPIN) and random valued impulse noise (RVIN) are widely encountered in image processing. In the case of SPIN, corrupted pixels take the extreme values of the dynamic range of image pixels, whereas in RVIN corrupted pixels take any random value within the dynamic range. Additive filters are generally used to remove Gaussian noise and order statistics filters are used to remove impulse noise.

The four-layer feed-forward CNN is used to remove (i) mixed AWGN and SPIN and (ii) mixed AWGN, SPIN and RVIN [32]. If the image is corrupted only by AWGN, then a noisy pixel is given by

$$y_n(m_v, m_h) = x(m_v, m_h) + v(m_v, m_h) \tag{24}$$

where the pixel location is denoted by $(m_v, m_h)$, each element of the clean image is denoted by $x(m_v, m_h)$ and $v(m_v, m_h)$ is a sample of an i.i.d. zero-mean Gaussian distribution with standard deviation σ. The maximum and minimum values of an image pixel within the dynamic range are $d_{\max}$ and $d_{\min}$. If the image is corrupted by SPIN, then $y_n(m_v, m_h)$ takes either $d_{\max}$ or $d_{\min}$, each with probability 0.5p (p ≤ 1). If the image is corrupted by mixed AWGN + SPIN, then a noisy pixel is given by

$$y_n(m_v, m_h) = \begin{cases} d_{\min} & \text{with probability } p/2 \\ d_{\max} & \text{with probability } p/2 \\ x(m_v, m_h) + v(m_v, m_h) & \text{with probability } 1 - p \end{cases} \tag{25}$$
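The mixed AWGN + SPIN model of (25) can be simulated to generate synthetic noisy images. The following NumPy sketch is illustrative only and is not taken from [32]; the function name add_awgn_spin, the default dynamic range [0, 255] and the use of one uniform draw per pixel to select the branch of (25) are assumptions made here. No clipping is applied to the AWGN branch, since (25) does not specify any.

```python
import numpy as np

def add_awgn_spin(x, sigma, p, d_min=0.0, d_max=255.0, rng=None):
    """Corrupt a clean image x following (25): SPIN with probability p, AWGN otherwise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    # AWGN branch: x + v, with v drawn i.i.d. from N(0, sigma^2) for every pixel.
    y = x + rng.normal(0.0, sigma, size=x.shape)
    # One uniform draw per pixel selects which branch of (25) applies.
    u = rng.random(x.shape)
    y[u < p / 2] = d_min                   # minimum value with probability p/2
    y[(u >= p / 2) & (u < p)] = d_max      # maximum value with probability p/2
    return y
```

With p = 0 the function reduces to the pure AWGN model of (24), and with p = 1 every pixel is replaced by one of the two extreme values.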

6 IET Image Process.


© The Institution of Engineering and Technology 2019
Table 5 Advantages and disadvantages of different CNN models
(Columns: CNN model, types of noises, advantages, disadvantages)

DnCNN-S [30]. Types of noises: Gaussian
Advantages:
• DnCNN deals with Gaussian noise of unknown noise level, i.e. blind Gaussian denoising.
• It can handle three general image denoising tasks, i.e. blind Gaussian denoising, single image super-resolution with multiple upscaling factors, and JPEG image deblocking with different quality factors.
Disadvantages:
• It fails to achieve the best results for images which have repeated textures, e.g. House and Barbara.
• It cannot be used for spatially variant Gaussian noise, which is predominant in practical, real-world noisy images.

NN3D [22]. Types of noises: Gaussian
Advantages:
• NN3D framework is modular in nature as it is cascade combination of standard pre-trained CNN with standard NLF whereas other neural networks which embed NLFs within the layers are much more complicated than NN3D.
Disadvantages:
• NN3D model often leads to hallucinations, i.e. the introduction of patterns that do not exist in the input noisy signal, when operating at high noise levels.

SCNN [31]. Types of noises: Gaussian
Advantages:
• SCNN is adjustable to the noise level of the input image as it uses soft shrinkage activation function whose threshold is proportional to the noise level given by the user.
• The soft shrinkage activation function used in the SCNN model can be used with various CNN architectures.
Disadvantages:
• At higher deviation (σ > 50), it shows better results; otherwise DnCNN blind denoising model shows better PSNR values.
• It does not work on real-world noisy images.

CNN for Mixed Noise [32]. Types of noises: mixed noise (Gaussian and impulse)
Advantages:
• Faster training is achieved by adopting the mechanism of transfer learning.
• CNN for mixed noise removal has a lightweight structure that can play a key role in many applications where the low-complexity and robust denoising is required.
Disadvantages:
• Various pre-processing steps are required, which are specified according to noise type.

FFDNet [23]. Types of noises: spatially variant Gaussian noise
Advantages:
• Performance of FFDNet in terms of the running time is superior to that of DnCNN and BM3D.
• It removes spatially variant Gaussian noise, which is predominant in real-world noisy images.
Disadvantages:
• Use of both noise level map M and noisy image at the input increases the difficulty to train the model.
• Noise level map M at the input creates a trade-off between noise reduction and detail preservation.

PDNN [33]. Types of noises: Gaussian
Advantages:
• PDNN model combines the advantages of both the optimisation and discriminative learning-based image restoration methods.
• It shows the best results in terms of PSNR on BSD-68 dataset as compared to DnCNN, SCNN and FFDNet.
Disadvantages:
• Development of an efficient algorithm for solving image denoising problem requires several mathematical approximations which are solely trial and error based because this algorithm is designed in such a way that it can be easily unfolded into the CNN.

If the image is corrupted by RVIN, then the noisy pixel $y_n(m_v, m_h)$ attains a random value $d(m_v, m_h)$ with probability r (r ≤ 1). The value of $d(m_v, m_h)$ is uniformly distributed within the dynamic range $[d_{\min}, d_{\max}]$. When the image is corrupted by mixed AWGN + SPIN + RVIN, then a noisy pixel is given by

$$y_n(m_v, m_h) = \begin{cases} d_{\min} & \text{with probability } p/2 \\ d_{\max} & \text{with probability } p/2 \\ d(m_v, m_h) & \text{with probability } r(1 - p) \\ x(m_v, m_h) + v(m_v, m_h) & \text{with probability } (1 - p)(1 - r) \end{cases} \tag{26}$$

Fig. 7 Algorithm for SCNN

4.4.1 Architecture of CNN for mixed Gaussian–impulse noise reduction: The CNN for mixed Gaussian–impulse noise reduction performs pre-processing followed by four-stage convolution filtering [32]. Pre-processing includes rank-order filtering and an up-sampling operation. The rank-order filter can be a median filter, adaptive median filter, centre weighted median filter, adaptive centre weighted median filter or Cai's method [57]. After rank-order filtering, the up-sampling operation is done using bicubic interpolation. The frequency response of interpolation is low pass in nature. Therefore, some high-frequency components that arise from the rank-order filtering on the Gaussian noise are suppressed. A convolution stage consists of a convolution layer, ReLU layer and max-pooling layer. A four-stage CNN is used because experiments have shown that including a further convolution layer after the fourth increases the PSNR by less than 0.5% [32]. Further, it is observed that increasing the number of convolution layers increases the computational load significantly. In each of the four convolution stages, the convolution layer may be followed by a ReLU or max-pool layer. In the pre-processing step, the spatial dimensions of the noisy image are increased due to interpolation. So, at least one down-sampling operation using the max-pool layer is required in the entire CNN model.
In this CNN [32], in the first convolution stage the convolution filter is followed by ReLU and a max-pool layer. The second and third stages consist of a convolution filter and ReLU; the last stage only consists of the convolution filter. This stage provides the estimate of a noise-free image with the same dimensions as the input. The architecture of such CNN for mixed noise reduction is given in Fig. 8.

Fig. 8 Architecture of CNN model for mixed noise [32]

Fig. 9 Algorithm of mixed noise CNN

Fig. 10 Architecture of FFDNet [23]

Fig. 11 Algorithm of FFDNet
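The remark that at least one max-pool layer is needed to undo the size increase from pre-processing can be checked with a small shape experiment. The NumPy sketch below is a simplified illustration, not the implementation of [32]: it assumes a plain 3 × 3 median filter as the rank-order filter, an up-sampling factor of two (the factor is not stated above), nearest-neighbour replication as a stand-in for the interpolation step, and it omits the convolution layers entirely. The helper names median3x3, upsample2x and maxpool2x2 are hypothetical.

```python
import numpy as np

def median3x3(img):
    """3x3 median filter with edge replication (one possible rank-order filter)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted views of the image and take the per-pixel median.
    windows = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.median(np.stack(windows), axis=0)

def upsample2x(img):
    """Nearest-neighbour 2x up-sampling (stand-in for the interpolation step)."""
    return np.kron(img, np.ones((2, 2)))

def maxpool2x2(img):
    """2x2 max-pooling with stride 2; assumes even height and width."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

noisy = np.full((6, 6), 50.0)
noisy[2, 3] = 255.0                     # a single impulse ("salt") pixel
filtered = median3x3(noisy)             # rank-order filtering removes the impulse
upsampled = upsample2x(filtered)        # spatial size doubles: 6x6 -> 12x12
restored_size = maxpool2x2(upsampled)   # one max-pool returns to 6x6
```

Under these assumptions a single 2 × 2 max-pool is enough for the network output to match the input dimensions, which is consistent with placing the only max-pool layer in the first convolution stage.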
The mathematical equations used at each stage are the following:

$$\begin{aligned} X_1 &= \mathrm{ROF}(Y_0), \\ X_2 &= \mathrm{BI}(X_1), \\ X_3 &= \mathrm{MP}_{K_1}\left(\max(0, W_1 * X_2 + b_1)\right), \\ X_4 &= \max(0, W_2 * X_3 + b_2), \\ X_5 &= \max(0, W_3 * X_4 + b_3), \\ \hat{X} &= \max(0, W_4 * X_5 + b_4). \end{aligned} \tag{27}$$

where $Y_0$ is the noisy input image, ROF is the rank-order filter, BI is the bilinear interpolation and $\mathrm{MP}_{K_1}$ denotes the max-pooling operation.

4.4.2 Algorithm of mixed-noise CNN: The algorithm of mixed-noise CNN is given in Fig. 9.

4.5 Fast and flexible denoising CNN (FFDNet)

Fast and flexible CNN is used for spatially variant noise. FFDNet [23] handles a wide range of noise levels ([0, 75]) effectively with a single network. It removes the spatially variant noise by specifying a non-uniform noise level map. It is faster than the benchmark BM3D without compromising on denoising performance. Discriminative learning CNN based methods [22, 24, 30] are limited in flexibility, and the learned model is usually tailored to a specific noise level. Although a single CNN model, DnCNN-B [24], is trained for Gaussian denoising, it covers only the noise level range [0, 55]. Moreover, DnCNN-B lacks the flexibility to deal with spatially variant noise. FFDNet is used to overcome the disadvantages of existing CNN based denoising methods and is formulated as $x = F(y, M; \Theta)$, where x is the de-noised image, y is the noisy image, M is a noise level map and Θ denotes the model parameters. CNN models [22, 24, 30] are formulated as $x = F(y; \Theta_\sigma)$, where the parameters $\Theta_\sigma$ vary with the noise level σ, whereas in the FFDNet model the model parameters are invariant to the noise level. FFDNet introduces a noise level map at the input, which controls the trade-off between noise reduction and detail preservation. It works on down-sampled images, which accelerates the training and testing speed and enlarges the receptive field. It produces good denoising performance on both synthetic noisy images corrupted by AWGN and real-world noisy images, demonstrating its potential for practical image denoising.

4.5.1 Architecture of FFDNet: The architecture of FFDNet is shown in Fig. 10. The first layer performs down-sampling and concatenates a tunable noise level map M with the down-sampled sub-images. A reversible down-sampling operator is introduced to reshape the input image of size W × H × c into four down-sampled sub-images of size (W/2) × (H/2) × 4c. Here c is the number of channels, i.e. c = 1 for a greyscale image and c = 3 for a colour image. The tensor y of size (W/2) × (H/2) × (4c + 1) is the input of the CNN. M is a uniform map with all elements being σ for spatially invariant AWGN with noise level σ. FFDNet [23] also consists of a series of 3 × 3 convolution layers. In the first convolution layer, ‘Conv + ReLU’ is adopted, in the middle layers ‘Conv + BN + ReLU’ is used and in the last layer, only ‘Conv’ is used. Zero padding is used to keep the size of feature maps identical after each convolution. After the last convolution layer, an upscaling operation is applied as the reverse operator of the down-sampling operator applied in the input stage to produce the estimated clean image $\hat{x}$ of size W × H × c. Dilated convolutions are not required to further increase the receptive field as FFDNet operates on down-sampled images.

4.5.2 Algorithm of FFDNet: The algorithm of FFDNet is given in Fig. 11.

4.6 Denoising prior driven deep neural network (PDNN)

In denoising PDNN [33], an observation model which characterises the image degradation process is used as a prior with a deep CNN. So, this model exploits the powerful denoising capability of the deep neural network and also leverages the prior of the observation model. In the first step, a denoising based image restoration algorithm is formulated whose iterative steps can be computed efficiently. Then, this iterative process is unfolded into a deep neural network, which is composed of multiple denoiser modules interleaved with back-projection (BP) modules that ensure the observation consistencies. Iterative algorithms have also been unfolded into deep neural networks in some earlier works. Wang et al. developed a deep neural network based on the learned iterative shrinkage/threshold algorithm (LISTA) [58]. In [59], the deep network is implemented by the iterative non-linear reaction-diffusion method.

Fig. 12 Architecture of denoising PDNN [33]

The denoising based image restoration methods [60] do not use an explicitly expressed regulariser; instead they split the optimisation problem of (28) into two sub-problems, one for the data likelihood term and the other for the prior term. The optimisation problem is split into two sub-problems by introducing an auxiliary variable v with the half-quadratic splitting method. So, the non-constrained optimisation problem is

$$(x, v) = \underset{x, v}{\operatorname{argmin}}\ \frac{1}{2}\| y - Ax \|_2^2 + \eta \| x - v \|_2^2 + \lambda J(v) \tag{28}$$
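To make the roles of the terms in (28) concrete, the sketch below alternates a single gradient step on x, which is the update written out in (31), with a denoising step on the auxiliary variable v. It is a toy illustration under stated assumptions and not the PDNN implementation of [33]: the signal is one-dimensional, the denoiser is a hypothetical 3-tap moving average standing in for the deep CNN, the initial estimate is taken as A^T y, and the values of delta, eta and K are arbitrary.

```python
import numpy as np

def x_step(x, v, y, A, delta, eta):
    """One gradient-descent step on the x sub-problem (first line of (31))."""
    return x - delta * (A.T @ (A @ x - y) + eta * (x - v))

def smooth(x):
    """Hypothetical stand-in denoiser f(.): 3-tap moving average."""
    padded = np.pad(x, 1, mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

def unfolded_restoration(y, A, delta=0.5, eta=0.5, K=6):
    """Alternate the x update and the denoising update K times."""
    x = A.T @ y           # assumed initial estimate from the observation
    v = x.copy()
    for _ in range(K):
        x = x_step(x, v, y, A, delta, eta)
        v = smooth(x)     # v update: v = f(x)
    return x
```

For pure denoising the degradation matrix A can be taken as the identity, e.g. unfolded_restoration(y, np.eye(len(y))). In an unfolded network each of the K iterations becomes one stage with its own learnable parameters, whereas here the same fixed delta and eta are reused in every iteration.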

The above optimisation problem can be solved by alternately solving two sub-problems, which are stated as

$$x^{(t+1)} = \underset{x}{\operatorname{argmin}}\ \| y - Ax \|_2^2 + \eta \| x - v^{(t)} \|_2^2 \tag{29}$$

$$v^{(t+1)} = \underset{v}{\operatorname{argmin}}\ \eta \| x^{(t+1)} - v \|_2^2 + \lambda J(v) \tag{30}$$

The x sub-problem is solved by a single step of gradient descent and an inexact solution is obtained as

$$x^{(t+1)} = x^{(t)} - \delta\left[ A^T\left(Ax^{(t)} - y\right) + \eta\left(x^{(t)} - v^{(t)}\right) \right] = \bar{A}x^{(t)} + \delta A^T y + \delta\eta v^{(t)} \tag{31}$$

where $\bar{A} = (1 - \delta\eta)I - \delta A^T A$ and δ is the step size controlling parameter. The v sub-problem is a proximity operator of the regulariser J(v) computed at the point $x^{(t+1)}$, whose solution is obtained by a denoising function, i.e. $v^{(t+1)} = f(x^{(t+1)})$, where f(·) denotes a denoising function. A deep neural network is unfolded with the algorithm given by (31) to reduce computational complexity and the number of iterations. This iterative denoising method has shown improvement over the conventional denoising method, which performs the denoising operation only once.

4.6.1 Architecture of denoising PDNN: The initial estimate $x^{(0)}$ is obtained when the input degraded image y is passed through a linear layer parameterised by the degradation matrix A. The next linear layer is parameterised by $\bar{A}$. The output of the second layer is added to $x^{(0)}$ weighted by $\delta_1$. The updated signal is fed into the deep CNN and the denoised signal $v^{(1)}$ is obtained. The denoised signal $v^{(1)}$ is fed into the linear layer parameterised by $\bar{A}$, whose output is further added to $x^{(0)}$ and $v^{(1)}$ via two shortcut connections to give the updated $x^{(2)}$. This process is repeated K times (see Fig. 12).
The DCNN module consists of feature encoding and decoding parts. The feature encoding part consists of four feature extraction blocks with convolution and pooling layers. Each feature extraction block consists of convolution layers with a 3 × 3 convolution kernel followed by ReLU non-linearity and pooling layers. The first three layers produce a 64-channel feature map and the last layer produces a 128-channel feature map. The feature decoding part is also grouped into four blocks and consists of a series of convolution layers followed by an up-sampling layer to increase the spatial resolution of the feature maps. In each reconstruction block, the first three layers of 3 × 3 convolution kernel produce 128-channel feature maps and the fourth layer generates 512-channel feature maps. The feature maps are up-sampled by a factor of two to increase the spatial resolution. These feature maps of the decoding part are fused with the feature maps from the encoding part with the same spatial resolution.

Fig. 13 Algorithm of PDNN

4.6.2 Algorithm of PDNN: The algorithm of PDNN is given in Fig. 13.

5 Other useful CNN models

The CNN based model for CT image reconstruction uses the projected gradient descent method and shows a considerable improvement in PSNR values over dictionary learning, total variation-based regularisation and a state-of-the-art deep learning-based direct reconstruction technique [61]. In remote sensing applications, change detection plays a major role in sensing change information about the large-scale Earth surface. General end-to-end 2D CNN (GETNET) provides a framework for hyperspectral change detection [62]. Multi-scale CNN (MS-CNN) is used for hyperspectral image classification [63]. Deep metrics are usually used with MS-CNN to improve the representation ability of the hyperspectral image. A Siamesed fully-convolutional network is used for road detection; it considers RGB-channel images, semantic contours and location simultaneously to segment the road region elaborately and correctly [64].

6 Datasets used

For the comparative analysis of performances of the various CNN based image denoising methods, we have used BSD-68 and Set-12 datasets (see Figs. 14–16).
Berkeley Segmentation Dataset (BSD-500, BSD-300, BSD-68) is developed by the Computer Vision Group of the University of California, Berkeley. BSD-300 dataset consists of 300 images (200 images for training and 100 images for testing) which have been collected from 12,000 hand-labelled segmentations of Corel dataset images from 30 subjects. BSD-500 dataset is an extended version of BSD-300 dataset which consists of 200 more test images. BSD-68 dataset is derived from BSD-300 dataset; it consists of 68 images from a separate test set of BSD-300 [65]. Initially, it was used for image segmentation and boundary detection, but now it is used for image denoising too.
The Set-12 dataset is the collection of 12 widely used test images, namely Cameraman, House, Peppers, Starfish, Monarch, Airplane, Parrot, Lena, Barbara, Boat, Man and Couple. These images are widely used in image processing applications.

7 Performance analysis of CNN models for image denoising

The advantages and disadvantages of different CNN models are given in Table 5. The comparison of denoising performance of different CNN models is possible when the testing dataset is identical for all. In most of the recent works [22, 23, 30, 31, 33], Berkeley Segmentation Dataset (BSD-68) is used as a benchmark dataset for testing. FFDNet, DnCNN-B and SCNN are the single neural networks trained for a wide range of noise levels. Tables 6–8
compare the performance of different CNN models and popular state-of-the-art methods like BM3D [9], WNNM [55], expected patch log-likelihood (EPLL) [66], multi-layer perceptron (MLP) [67], cascade of shrinkage fields (CSF) [68] and trainable non-linear reaction-diffusion (TNRD) [59] on Set-12 and BSD-68 datasets, respectively. FFDNet is slightly inferior to DnCNN-S and DnCNN-B when the noise level is low (e.g. σ ≤ 25) but gradually outperforms DnCNN-S and DnCNN-B as the noise level increases (e.g. σ > 25). This is because of the trade-off between the receptive field size and modelling capacity. FFDNet has a larger receptive field than DnCNN, which favours the removal of strong noise, while DnCNN has better modelling capacity, which is beneficial for denoising images with a lower noise level. SCNN is trained in the same way as DnCNN-B, but it uses the soft shrinkage activation function with $\alpha_{\max} = 3$. The use of the soft shrinkage activation function reduces the computational cost as compared to DnCNN, with about 0.2–0.5 dB degradation in PSNR values.
It is clear from Tables 6 and 7 that PDNN outperforms all the methods for both BSD-68 and Set-12 dataset images at noise levels σ = 25 and 50. In other CNN models, the observation model which characterises the image degradation process is not considered, whereas in PDNN the observation model is framed into an iterative denoising algorithm which is unfolded into a deep network with back-projection modules. Therefore, PDNN provides the powerful denoising capability of both CNN and observation models.
From Table 8, it is observed that NN3D outperforms FFDNet and DnCNN-S as it incorporates the advantages of non-local group-wise filtering in an iterative modular framework along with CNN. The NN3D outperforms the state-of-the-art methods for stronger noise. The datasets which have strong self-similarity, such as ‘Urban-100’, produce maximal gain. In NN3D, out of DnCNN, FFDNet and WDnCNN, the best denoising performance is shown by WDnCNN when it is used as the CNN.
Table 9 shows the running time in seconds of different methods on single-threaded CPU, multi-threaded CPU and GPU for different image sizes. The fastest running time performance is achieved by FFDNet while running on GPU.

Fig. 14 Denoising performance (PSNR in dB) for house image with noise level 50
(a) Original image, (b) BM3D (29.69 dB), (c) WNNM (30.33 dB), (d) TNRD (29.48 dB), (e) DnCNN-S (30.02 dB), (f) PDNN (31.04 dB)

8 Real-life applications

CNNs can be applied in various real-life image denoising applications including bio-medical images, remote sensing images, hazy images, hyper-spectral images, encrypted images for data security etc. In medical imaging, machine learning techniques are used as a pre-processing step to remove noise from ultrasound images and magnetic resonance images (MRI), and in computer aided diagnosis of breast cancer [69]. CT and MRI images are corrupted by Gaussian noise during image acquisition and transmission, such as sensor noise caused by low light, high temperature and electronic circuit noise, and can be cleaned using various machine learning techniques [70]. Residual learning is used as a part of natural image dehazing and it can help in the smooth operation of all kinds of traffic in the foggy season. The DnCNN, NN3D, SCNN and PDNN remove Gaussian noise and they can be used on RESIDE dehazing datasets [71], mini-MIAS database of mammograms, dental radiography database etc. FFDNet model deals with spatially variant Gaussian noise which occurs during JPEG image compression. FFDNet is the fastest model in terms of implementation time; therefore, it can be used in real-time applications like surveillance where live images are being monitored for security purposes. Mixed noise occurs when there are faulty sensors due to temperature fluctuations or during the transmission process. The CNN model [32] for mixed noise reduction can be used for remote sensing images including real desert seismic records, which are prone to mixed noise as camera sensors are placed in harsh geographical conditions. Moreover, real-world noisy datasets, e.g. RENOIR, NAM, DND and Xu, can be used on all models for practical applications of image denoising [72].

Fig. 15  Denoising performance for Lena image with noise level 50. The PSNR results
(a) Original image, (b) BM3D (29.05 dB), (c) WNNM (29.25 dB), (d) TNRD (28.93 dB), (e) DnCNN-S (29.37 dB), (f) PDNN (29.85 dB)

Fig. 16  Denoising performance for one image of BSD-68 dataset with noise level 50. The PSNR results
(a) Noisy image, (b) BM3D (26.21 dB), (c) WNNM (26.51 dB), (d) MLP (26.54 dB), (e) TNRD (26.59 dB), (f) DnCNN-S (26.90 dB)

Table 6 Average PSNR (dB) results of different CNN based denoising methods on Set-12 Dataset
Method C.Man House Peppers Starfish Monarch Airplane Parrot Lena Barbara Boat Man Couple Average
Gaussian noise level σ = 15
BM3D [9] 31.91 34.93 32.69 31.14 31.85 31.07 31.37 34.26 33.10 32.13 31.92 32.10 32.37
WNNM [55] 32.17 35.13 32.99 31.82 32.71 31.39 31.62 34.27 33.60 32.27 32.11 32.17 32.70
EPLL [66] 31.85 34.17 32.64 31.13 32.10 31.19 31.42 33.92 31.38 31.93 32.00 31.93 32.14
CSF [68] 31.95 34.39 32.85 31.55 32.33 31.33 31.37 34.06 31.92 32.01 32.08 31.98 32.32
TNRD [59] 32.19 34.53 33.04 31.75 32.56 31.46 31.63 34.24 32.13 32.14 32.23 32.11 32.50
DnCNN [30] 32.61 34.97 33.30 32.20 33.09 31.70 31.83 34.62 32.64 32.42 32.46 32.47 32.86
IDCNN1 [24] 32.54 34.87 33.24 — 35.49 32.79 — 33.75 33.15 31.81 — — —
IDCNN2 [24] 32.24 34.83 33.11 — 35.38 32.68 — 33.70 32.98 31.73 — — —
FFDNet [23] 32.42 35.01 33.10 32.02 32.77 31.58 31.77 34.63 32.50 32.35 32.40 32.45 32.75
PDNN [33] 32.44 35.40 33.19 32.08 33.33 31.78 31.48 34.80 32.84 32.55 32.53 32.51 32.91
Gaussian noise level σ = 25
BM3D [9] 29.45 32.85 30.16 28.59 29.25 28.42 28.93 32.07 30.71 29.90 29.61 29.71 29.97
WNNM [55] 29.64 33.22 30.42 29.03 29.84 28.69 29.15 32.24 31.24 30.03 29.76 29.82 30.26
EPLL [66] 29.26 32.17 30.17 28.51 29.39 28.61 28.95 31.73 28.61 29.74 29.66 29.53 29.69
CSF [68] 29.48 32.39 30.32 28.80 29.62 28.72 28.90 31.79 29.03 29.76 29.71 29.53 29.83
TNRD [59] 29.72 32.53 30.57 29.02 29.85 28.88 29.18 32.00 29.41 29.91 29.87 29.71 30.06
DnCNN [30] 30.18 33.06 30.87 29.41 30.28 29.13 29.43 32.44 30.00 30.21 30.10 30.12 30.43
IDCNN1 [24] 30.06 32.94 30.79 — 32.85 30.06 — 31.13 30.45 29.18 — — —
IDCNN2 [24] 29.98 32.87 30.71 — 32.77 30.00 — 31.08 30.18 29.12 — — —
FFDNet [23] 30.06 33.27 30.79 29.33 30.14 29.05 29.43 32.59 29.98 30.23 30.10 30.18 30.43
PDNN [33] 30.12 33.54 30.90 29.43 30.31 29.14 29.28 32.69 30.30 30.34 30.15 30.24 30.54
Gaussian noise level σ = 50
BM3D [9] 26.13 29.69 26.68 25.04 25.82 25.10 25.90 29.05 27.22 26.78 26.81 26.46 26.72
WNNM [55] 26.45 30.33 26.95 25.44 26.32 25.42 26.14 29.25 27.79 26.97 26.94 26.64 27.05
EPLL [66] 26.10 29.12 26.80 25.12 25.94 25.31 25.95 28.68 24.83 26.74 26.79 26.30 26.47
TNRD [59] 26.62 29.48 27.10 25.42 26.31 25.59 26.16 28.93 25.70 26.94 26.98 26.50 26.81
DnCNN [30] 27.03 30.00 27.32 25.70 26.78 25.87 26.48 29.39 26.22 27.20 27.24 26.90 27.18
IDCNN1 [24] 26.80 29.89 27.22 — 29.23 26.52 — 27.69 26.50 25.87 — — —
IDCNN2 [24] 26.82 29.76 27.20 — 29.18 26.45 — 27.68 26.51 25.87 — — —
FFDNet [23] 27.03 30.43 27.43 25.77 26.88 25.90 26.58 29.68 26.48 27.32 27.30 27.07 27.32
PDNN [33] 27.12 31.04 27.44 25.95 27.00 25.97 26.42 29.85 27.21 27.42 27.32 27.23 27.50

Table 7 Average PSNR (dB) of different CNN based denoising methods on BSD-68 dataset
Gaussian noise level  BM3D [9]  WNNM [55]  EPLL [66]  MLP [67]  CSF [68]  TNRD [59]  DnCNN-S [30]  DnCNN-B [30]  SCNN [31]  FFDNet [23]  PDNN [33]
σ = 15                31.07     31.37      31.21      —         31.24     31.42      31.73         31.61         31.48      31.63        32.29
σ = 25                28.57     28.83      28.68      28.96     28.74     28.92      29.23         29.16         29.03      29.19        29.88
σ = 50                25.62     25.87      25.67      26.03     —         25.97      26.23         26.23         26.08      26.29        —

Table 8 Average PSNR (dB) of different CNN based denoising methods on BSD-68 dataset
Gaussian noise level  BM3D [9]  WNNM [55]  MLP [67]  TNRD [59]  DnCNN-S [30]  FFDNet [23]  WDnCNN [50]  NN3D (DnCNN) [22]  NN3D (FFDNet) [22]  NN3D (WDnCNN) [22]
σ = 30                27.75     —          —         —          28.36         28.38        28.56        28.41              28.37               28.56
σ = 50                25.63     25.87      26.03     25.97      26.23         26.29        26.39        26.27              26.29               26.42
σ = 75                24.22     24.40      24.59     —          24.64         24.79        24.85        24.71              24.80               24.91
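For reference, the PSNR values reported in the tables above follow the standard definition PSNR = 10 log10(peak² / MSE). The NumPy sketch below is a minimal illustration; the function names psnr and average_psnr, the default peak value of 255 for 8-bit images, and the assumption that a dataset average is the mean of the per-image PSNR values are choices made here and are not taken from the compared papers.

```python
import numpy as np

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    clean = np.asarray(clean, dtype=np.float64)
    denoised = np.asarray(denoised, dtype=np.float64)
    mse = np.mean((clean - denoised) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def average_psnr(pairs, peak=255.0):
    """Mean of the per-image PSNR over (clean, denoised) pairs."""
    return float(np.mean([psnr(c, d, peak) for c, d in pairs]))
```

Because PSNR is logarithmic, a gain of 0.1 dB corresponds to a reduction of roughly 2.3% in mean squared error, which helps in reading the small differences between the best methods.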

9 Conclusions

In this paper, different CNN models for image denoising are presented. The architecture of CNN models with application oriented noise specifications is explained. The performance of each of the CNN models and of other state-of-the-art methods is compared in terms of PSNR on BSD-68 and Set-12. It has been observed that CNN in combination with denoising filters and other iterative optimisation methods improves the denoising performance. PDNN shows the best results among all the networks for both BSD-68 and Set-12 as it unfolds the observation model of image degradation into the CNN with an iterative optimisation algorithm and back-projection modules. FFDNet model is the fastest in terms of average running time while running on GPU. It is observed that DnCNN-S, SCNN, FFDNet, NN3D and PDNN are more useful in denoising synthetic images corrupted manually by users. However, these models can be modified or combined with different image dehazing techniques for denoising naturally corrupted images. CNN can also be used with the Wiener filter, bilateral filter and fuzzy based filters like switching filter, gradient filter, similarity based filter etc. This survey allows researchers to ascertain which CNN denoising model they could use and which metrics they would contemplate for their research as a pre-processing step when dealing with noisy images.

Table 9 Running time in seconds of different methods for denoising images with different sizes
Method                Device     256 × 256 Grey  256 × 256 Colour  512 × 512 Grey  512 × 512 Colour  1024 × 1024 Grey  1024 × 1024 Colour
BM3D [9]              CPU(ST)    0.59            0.98              2.52            3.57              10.77             20.15
DnCNN-S [30]          CPU(ST)    2.14            2.44              8.63            9.85              32.82             38.11
DnCNN-S [30]          CPU(MT)    0.74            0.98              3.41            4.10              12.10             15.48
DnCNN-S [30]          GPU        0.011           0.014             0.033           0.040             0.124             0.167
FFDNet [23]           CPU(ST)    0.44            0.62              1.81            2.51              7.24              10.17
FFDNet [23]           CPU(MT)    0.18            0.21              0.73            0.98              2.96              3.95
FFDNet [23]           GPU        0.006           0.008             0.012           0.017             0.038             0.057
NN3D [22] using [23]  CPU + GPU  0.27            —                 —               —                 —                 —
NN3D [22] using [30]  CPU + GPU  0.25            —                 —               —                 —                 —
NN3D [22] using [56]  CPU + GPU  0.63            —                 —               —                 —                 —

10 References

[1] Sridhar, S.: ‘Digital image processing’ (Oxford Publications, New Delhi, India, 2016, 2nd edn.), pp. 1–7
[2] Boyat, A., Joshi, B.: ‘A review paper: noise models in digital image processing’, Signal Image Process., Int. J., 2015, 6, (2), pp. 63–75
[3] Sontakke, M., Kulkarni, M.: ‘Different types of noises in images and noise removing technique’, Int. J. Adv. Technol. Eng. Sci., 2015, 3, (1), pp. 102–115
[4] Tomasi, C., Manduchi, R.: ‘Bilateral filtering for gray and color images’. IEEE 6th Int. Conf. on Computer Vision, Mumbai, India, 1998, pp. 839–846
[5] Michael, E.: ‘On the origin of the bilateral filter and ways to improve it’, IEEE Trans. Image Process., 2002, 11, (10), pp. 1141–1151
[6] Perona, P., Malik, J.: ‘Scale-space and edge detection using anisotropic diffusion’, IEEE Trans. Pattern Anal. Mach. Intell., 1990, 12, (7), pp. 629–639
[7] Buades, A., Bartomeu, C., Morel, J., et al.: ‘A review of image denoising algorithms with a new one’, Multiscale. Model. Simul., 2005, 4, (2), pp. 490–530
[8] Awate, P., Whitaker, R.: ‘Unsupervised, information theoretic, adaptive image filtering for restoration’, IEEE Trans. Pattern Anal. Mach. Intell., 2006, 41, (10), pp. 2305–2318
[9] Kostadin, D., Foi, A., Katkovnik, V., et al.: ‘Image denoising by sparse 3-D transform-domain collaborative filtering’, IEEE Trans. Image Process., 2007, 16, (8), pp. 2080–2095
[10] Milanfar, P.: ‘A tour of modern image filtering: new insights and methods, both practical and theoretical’, IEEE Signal Process. Mag., 2012, 30, (1), pp. 106–128
[11] Barash, D.: ‘Fundamental relationship between bilateral filtering, adaptive smoothing, and the nonlinear diffusion equation’, IEEE Trans. Pattern Anal. Mach. Intell., 2002, 24, (6), pp. 844–847
[12] Danielyan, A., Katkovnik, V., Egiazarian, K.: ‘BM3D frames and variational image deblurring’, IEEE Trans. Image Process., 2012, 21, (4), pp. 1715–1728
[13] Dautov, C., Ozerdem, M.: ‘Wavelet transform and signal denoising using wavelet method’. 26th Signal Processing and Communications Applications Conf. (SIU), Izmir, 2018, pp. 1–4
[14] Zhang, M., Desrosiers, C.: ‘Image denoising based on sparse representation and gradient histogram’, IET Image Process., 2017, 11, (1), pp. 54–63
[15] Li, M.: ‘An improved non-local filter for image denoising’. Int. Conf. on Information Engineering and Computer Science, Wuhan, 2009, pp. 1–4
[16] Malfait, M., Roose, D.: ‘Wavelet-based image denoising using a Markov random field a priori model’, IEEE Trans. Image Process., 1997, 6, (4), pp. 549–565
[17] McCann, M., Jin, K., Unser, M.: ‘Convolutional neural networks for inverse problems in imaging: a review’, IEEE Signal Process. Mag., 2017, 34, (6), pp. 85–95
[18] Haykin, S.: ‘Neural networks: a comprehensive foundation’ (Prentice-Hall, Singapore, 1999, 2nd edn.)
[19] Bengio, Y.: ‘Learning deep architectures for AI’, Found. Trends Mach. Learn., 2009, 2, (1), pp. 127–131
[20] Krizhevsky, A., Sutskever, I., Hinton, G.: ‘Image net classification with deep convolutional neural networks’. Proc. of Int. Conf. of Neural Information Processing Systems, LakeTahoe, NV, 2012, pp. 1097–1105
[27] Wang, Z., Bovik, A.: ‘A universal image quality index’, IEEE Signal Process. Lett., 2002, 9, (3), pp. 81–84
[28] Wang, Z., Simoncelli, E., Bovik, A.: ‘Multiscale structural similarity for image quality assessment’. The Thirty-Seventh Asilomar Conf. on Signals, Systems & Computers, Pacific Grove, CA, USA, 2003, pp. 1398–1402
[29] Oszust, M.: ‘Full-reference image quality assessment with linear combination of genetically selected quality measures’, PLoS ONE, 2016, 11, (6), pp. 1–17
[30] Zhang, K., Zuo, W., Chen, Y., et al.: ‘Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising’, IEEE Trans. Image Process., 2017, 26, (7), pp. 3142–3155
[31] Isogawa, K., Ida, T., Shiodera, T., et al.: ‘Deep shrinkage convolutional neural network for adaptive noise reduction’, IEEE Signal Process. Lett., 2018, 25, (2), pp. 224–228
[32] Islam, M., Rahman, S., Ahmad, M., et al.: ‘Mixed Gaussian-impulse noise reduction from images using convolutional neural network’, Signal Process., Image Commun., 2018, 68, pp. 26–41
[33] Dong, W., Wang, P., Yin, W., et al.: ‘Denoising prior driven deep neural network for image restoration’, IEEE Trans. Pattern Anal. Mach. Intell., 2018, doi: 10.1109/TPAMI.2018.2873610
[34] Pang, Y., Sun, M., Jiang, X., et al.: ‘Convolution in convolution for network in network’, IEEE Trans. Neural Netw. Learn. Syst., 2018, 29, (5), pp. 1587–1597
[35] Murphy, J.: ‘An overview of convolution neural network architectures for deep learning’ (Microway Inc., Fall, Plymouth, USA, 2016)
[36] Tivive, F., Bouzerdoum, A.: ‘Efficient training algorithms for a class of shunting inhibitory convolutional neural networks’, IEEE Trans. Neural Netw., 2005, 16, (3), pp. 541–556
[37] Elad, M., Aharon, M.: ‘Image denoising via sparse and redundant representation over learned dictionaries’, IEEE Trans. Image Process., 2006, 15, (12), pp. 3736–3745
[38] Dong, W., Zhang, L., Shi, G., et al.: ‘Nonlocally centralized sparse representation for image restoration’, IEEE Trans. Image Process., 2013, 22, (4), pp. 1620–1630
[39] Dong, W., Shi, G., Ma, Y., et al.: ‘Image restoration via simultaneous sparse coding: where structured sparsity meets Gaussian scale mixture’, Int. J. Comput. Vis., 2015, 114, (2-3), pp. 217–232
[40] Osher, S., Burger, M., Goldfarb, D., et al.: ‘An iterative regularization method for total variation-based image restoration’, Multiscale Model. Simul., 2005, 4, (2), pp. 460–489
[41] Yu, G., Sapiro, G., Mallat, S.: ‘Solving inverse problems with piecewise linear estimators: from Gaussian mixture models to structured sparsity’, IEEE Trans. Image Process., 2012, 21, (5), pp. 2481–2499
[42] LeCun, Y., Jackel, L., Bottou, L., et al.: ‘Learning algorithms for classification: a comparison on handwritten digit recognition’. Neural networks: the statistical mechanics perspective, 1995, pp. 261–276
[43] Anaya, J., Barbu, A.: ‘RENOIR–a dataset for real low-light image noise reduction’, J. Visual Commun. Image Represent., 2018, 51, (2), pp. 144–154
[44] Abdelhamed, A., Lin, S., Brown, M.: ‘A high-quality denoising dataset for smartphone cameras’. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 1692–1700
[45] Plötz, T., Roth, S.: ‘Benchmarking denoising algorithms with real photographs’. 2017 IEEE Conf. on Computer Vision and Pattern Recognition
[21] Dong, C., Loy, C., He, K., et al.: ‘Image super-resolution using deep (CVPR), Honolulu, HI, 2017, pp. 2750–2759
convolutional networks’, IEEE Trans. Pattern Anal. Mach. Intell., 2016, 38, [46] Martin, D., Fowlkes, C., Tal, D., et al.: ‘A database of human segmented
(2), pp. 295–307 natural images and its application to evaluating segmentation algorithms and
[22] Cruz, C., Foi, A., Katkovnik, V., et al.: ‘Nonlocality-reinforced convolutional measuring ecological statistics’. Proc. Eighth IEEE Int. Conf. on Computer
neural networks for image denoising’, IEEE Signal Process. Lett., 2018, 25, Vision, Vancouver, Canada, 2001, pp. 416–423
(8), pp. 1216–1220 [47] Krizhevsky, A.: ‘Learning multiple layers of features from tiny images’. PhD
[23] Zhang, K., Zuo, W., Zhang, L.: ‘FFDNet: toward a fast and flexible solution thesis, University of Toronto, 2012
for CNN based image denoising’, IEEE Trans. Image Process., 2018, 27, (9), [48] Menze, B., Jakab, A., Bauer, S., et al.: ‘The multimodal brain tumor image
pp. 4608–4622 segmentation benchmark (BRATS)’, IEEE Trans. Med. Imaging, 2015, 34,
[24] Zhang, F., Cai, N., Wu, J., et al.: ‘Image denoising method based on a deep (10), pp. 1993–2024
convolution neural network’, IET Image Process., 2018, 12, (4), pp. 485–493 [49] Fanga, Z., Jiaa, T., Chena, Q., et al.: ‘Laser stripe image denoising using
[25] Yosinski, J., Clune, J., Bengio, Y., et al.: ‘How transferable are features in convolutional autoencoder’, Results in Phys., 2018, 11, pp. 96–104
deep neural networks?’. Proc. Advances in Neural Information Processing [50] Ioffe, S., Szegedy, C.: ‘Batch normalization: accelerating deep network
Systems, Montréal, Canada, 2014, pp. 3320–3328 training by reducing internal covariate shift’. Proc. of the 32nd Int. Conf. on
[26] Mafi, M., Martin, H., Cabrerizo, M., et al.: ‘A comprehensive survey on Machine Learning, Lille, France, 2015, pp. 448–456
impulse and Gaussian denoising filters for digital images’, Signal Process., [51] Kingma, D.P., Ba, J.L.: ‘Adam: a method for stochastic optimization’. 3rd Int.
2018, 157, pp. 236–260 Conf. for Learning Representations, San-Diego, USA, 2015, pp. 1–15

[52] Pascanu, R., Mikolov, T., Bengio, Y.: ‘Understanding the exploding gradient problem’. Tech. Rep., Université De Montréal, 2012, arXiv:1211.5063
[53] Mikolov, T.: ‘Statistical language models based on neural networks’. PhD thesis, Brno University of Technology, 2012
[54] Lebrun, M., Buades, A., Morel, J.: ‘A nonlocal Bayesian image denoising algorithm’, SIAM J. Imaging Sci., 2013, 6, (3), pp. 1665–1668
[55] Gu, S., Xie, Q., Meng, D., et al.: ‘Weighted nuclear norm minimization and its applications to low level vision’, Int. J. Comput. Vis., 2017, 121, (2), pp. 183–208
[56] Bae, W., Yoo, J., Ye, J.C.: ‘Beyond deep residual learning for image restoration: persistent homology-guided manifold simplification’. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, 2017, pp. 1141–1149
[57] Salembier, P., Kunt, M.: ‘Size-sensitive multiresolution decomposition of images with rank order based filters’, Signal Process., 1992, 27, (2), pp. 205–241
[58] Gregor, K., LeCun, Y.: ‘Learning fast approximations of sparse coding’. Proc. of Int. Conf. of Machine Learning, Haifa, Israel, 2010, pp. 399–406
[59] Chen, Y., Pock, T.: ‘Trainable nonlinear reaction diffusion: a flexible framework for fast and effective image restoration’, IEEE Trans. Pattern Anal. Mach. Intell., 2017, 39, (6), pp. 1256–1272
[60] Venkatakrishnan, S., Bouman, C., Chu, E., et al.: ‘Plug-and-play priors for model based reconstruction’. Proc. of IEEE Global Conf. on Signal and Information Processing, Austin, USA, 2013, pp. 945–948
[61] Gupta, H., Jin, K., Nguyen, H., et al.: ‘CNN-based projected gradient descent for consistent CT image reconstruction’, IEEE Trans. Med. Imaging, 2018, 37, (6), pp. 1440–1453
[62] Wang, Q., Yuan, Z., Du, Q., et al.: ‘GETNET: a general end-to-end 2-D CNN framework for hyperspectral image change detection’, IEEE Trans. Geosci. Remote Sens., 2019, 57, (1), pp. 3–13
[63] Gong, Z., Zhong, P., Yu, Y., et al.: ‘A CNN with multiscale convolution and diversified metric for hyperspectral image classification’, IEEE Trans. Geosci. Remote Sens., 2019, 57, (6), pp. 3599–3618
[64] Wang, Q., Gao, J., Yuan, Y.: ‘Embedding structured contour and location prior in siamesed fully convolutional networks for road detection’, IEEE Trans. Intell. Transp. Syst., 2018, 19, (1), pp. 230–241
[65] Roth, S., Black, M.: ‘Fields of experts: a framework for learning image priors’, IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, 2005, vol. 2, pp. 860–867
[66] Zoran, D., Weiss, Y.: ‘From learning models of natural image patches to whole image restoration’. IEEE Int. Conf. on Computer Vision, Barcelona, Spain, November 2011, pp. 479–486
[67] Burger, H., Schuler, C., Harmeling, S.: ‘Image denoising: can plain neural networks compete with BM3D?’. IEEE Conf. on Computer Vision and Pattern Recognition, Providence, RI, June 2012, pp. 2392–2399
[68] Schmidt, U., Roth, S.: ‘Shrinkage fields for effective image restoration’. IEEE Conf. on Computer Vision and Pattern Recognition, Columbus, USA, June 2014, pp. 2774–2781
[69] Kaur, P., Singh, G., Kaur, P.: ‘A review of denoising medical images using machine learning’, Curr. Med. Imaging. Rev., 2018, 14, pp. 675–685
[70] Ali, H.: ‘MRI medical image denoising by fundamental filters’ in High-Resolution Neuroimaging - Basic Physical Principles and Clinical Applications, (Intechopen, 2018), pp. 111–124
[71] Li, B., Ren, W., Fu, D., et al.: ‘Benchmarking single image dehazing and beyond’, J. Latex Class Files, 2015, 14, (8), pp. 1–13
[72] Kong, Z., Yang, X.: ‘A brief review of real-world color image denoising’, September 2018, arXiv:1809.03298v1
