
Optics and Lasers in Engineering 149 (2022) 106829


U-Net based neural network for fringe pattern denoising

Javier Gurrola-Ramos a, Oscar Dalmau a,∗, Teresa Alarcón b

a Mathematics Research Center A.C., Jalisco S/N, Col. Valenciana, CP 36023, Guanajuato, Guanajuato, Mexico
b Department of Computer Science and Engineering, Centro Universitario de los Valles, Carretera Ameca-Guadalajara Km. 45.5, CP 46600, Ameca, Jalisco, Mexico

∗ Corresponding author. E-mail addresses: francisco.gurrola@cimat.mx (J. Gurrola-Ramos), dalmau@cimat.mx (O. Dalmau), teresa.alarcon@academicos.udg.mx (T. Alarcón).

https://doi.org/10.1016/j.optlaseng.2021.106829
Received 23 May 2021; Received in revised form 14 September 2021; Accepted 28 September 2021; Available online 7 October 2021
0143-8166/© 2021 Elsevier Ltd. All rights reserved.

Keywords: Fringe patterns; Speckle noise; Neural network

Abstract: Fringe patterns from different optical measurement systems are widely used in scientific and engineering applications. However, fringe patterns are often corrupted by speckle noise, which must be removed to accurately recover the information encoded in the phase of the fringe pattern. In this paper we propose a lightweight residual dense neural network based on the U-net model (LRDUNet) for fringe pattern denoising. The encoding and decoding layers of the LRDUNet consist of grouped, densely connected convolutional layers for the sake of reusing the feature maps and reducing the number of trainable parameters. Additionally, local residual learning is used to avoid the vanishing gradient problem and speed up the learning process. We compare the proposed method versus state-of-the-art methods and present a parameter study in which we demonstrate that computationally simpler versions of the proposed model are still quite competitive. Experiments on simulated and real fringe patterns show that the proposed method outperforms state-of-the-art methods by restoring the main features of the fringe patterns, achieving an average of 41 dB of PSNR on simulated images.

1. Introduction

Optical interferometric techniques have been used in scientific research and engineering as important methods for non-contact measurements [1]. Optical interferometry outputs a fringe pattern or a sequence of fringe patterns. In general, the information of a measured object is encoded in the phase of the fringe pattern. A fringe pattern can be modeled as

𝑓(𝒑; 𝑡) = 𝑎(𝒑; 𝑡) + 𝑏(𝒑; 𝑡) cos[𝜙(𝒑; 𝑡)] + 𝑛(𝒑; 𝑡)    (1)

where 𝒑 and 𝑡 represent the spatial and temporal coordinates, and 𝑎(𝒑; 𝑡), 𝑏(𝒑; 𝑡), 𝜙(𝒑; 𝑡) and 𝑛(𝒑; 𝑡) are the background intensity, the fringe amplitude, the phase distribution, and the noise, respectively [2]. Eq. (1) corresponds to the general mathematical model of the optical technique called electronic speckle pattern interferometry (ESPI) [3]. Fringe pattern images are typically corrupted by multiplicative and additive noise. Multiplicative noise is introduced by speckles, while additive noise is produced by electronics and environmental noise [2,4]. Therefore, removing or reducing the noise is preferable before applying other processes to analyze fringe patterns, such as phase recovery [5] or phase shift estimation [6].

Different methods have been proposed to reduce the speckle noise in fringe pattern images. Kemao [3] proposed a windowed Fourier filtering (WFF) algorithm based on the windowed Fourier transform (WFT). This method takes an approach similar to wavelet thresholding-based denoising, thresholding the WFT coefficients to remove the spectral contribution of the noise. Tang et al. [7] proposed a method based on second-order oriented partial differential equations that diffuse along the fringe orientation. Wang et al. [8] proposed an application of coherence-enhancing diffusion (CED) to fringe pattern denoising. This method smooths a fringe pattern along the directions parallel and perpendicular to the fringe orientation with suitable diffusion speeds.

In recent years, deep learning techniques [9] have improved the state of the art in visual object detection [10] and image classification [11]. In [12], Ronneberger et al. proposed a convolutional network for biomedical image segmentation (the U-net model). Although the U-net model was initially designed for segmentation tasks in medical imaging, it is now widely used for general image segmentation and has spread to other applications [13]. In the field of optical interferometry, deep learning techniques have been used for wavefront sensing [14] and phase unwrapping [15].

Deep learning techniques have also been successfully applied to image denoising problems. Zuo et al. [16] presented a convolutional neural network for image denoising and restoration. Other natural image denoising models are the denoising convolutional neural network (DnCNN) [17] and the fast and flexible denoising convolutional neural network (FFDNet) [18]. Several state-of-the-art denoising neural networks have an architecture based on the U-net model, such as the densely connected hierarchical network (DHDN) [19], the deep iterative down-up convolutional neural network (DIDN) [20], and the residual
dense U-net neural network (RDUNet) [21]. However, the high-quality results of these models are achieved at a high computational cost in terms of the number of parameters and the computation required to process the images.

More recently, deep learning-based methods have been applied to speckle noise reduction in digital speckle pattern interferometry (FPD-CNN) [22] and to fringe pattern filtering and normalization (V-net) [23].

In this paper, we present a residual dense convolutional neural network for fringe pattern image denoising. The proposed model adopts the basic U-net auto-encoder and specialized denoising blocks, but with lower computational cost and better performance. Our proposal is a lightweight residual dense U-net (LRDUNet) that combines densely connected grouped convolutional blocks, which allow the reuse of the feature maps within convolutional blocks; local residual learning, to address the vanishing gradient problem; and global residual learning, to estimate the noise of the image instead of the denoised image directly. An ablation study shows that more simplified models based on our proposal also obtain competitive results, and it reveals the impact of the building blocks on the performance of the model. The main contribution of this work is summarized as follows: we propose a denoising block that uses densely connected grouped convolutions. Using this denoising block, we propose a U-net based model in which we replace the standard convolutional blocks with our denoising blocks, and we also add global residual learning. Our modification reduces the computational complexity of the original U-net model while improving its performance.

Additionally, we carry out an intensive comparison of our proposal with other methods using a synthetic dataset. For this purpose, we consider classical methods, trained neural networks designed for natural image denoising, and neural networks designed for fringe pattern denoising. In this comparison, we also include the original U-net, which was also trained for this task. During the denoising or evaluation step we apply a self-ensemble strategy to all the compared methods except the classical ones, which are invariant to the applied transformations, in order to obtain a better estimation.

The rest of this paper is organized as follows. In Section 2, we present the fringe pattern modeling used in this work, introduce the proposed model, and provide details of its main components. Section 3 describes the simulation process of the fringe pattern datasets used for training and evaluating the models; in this section, we also validate the performance of the compared models. Section 4 provides the conclusions of this paper.

2. Materials and methods

From the model in Eq. (1), a speckle correlation fringe pattern is modeled as follows, see details in [2,3,24]:

𝐼𝑐(𝒑; 𝑡) = 4𝑎𝑜(𝒑; 𝑡)²𝑎𝑟(𝒑)² + 4𝑎𝑜(𝒑; 𝑡)²𝑎𝑟(𝒑)² cos(Δ𝜙𝑜(𝒑; 𝑡) + 𝜋) + 𝜂(𝒑; 𝑡₀, 𝑡)    (2)

where 𝑎𝑜(𝒑; 𝑡) and 𝑎𝑟(𝒑) represent the amplitudes of the object and reference beams respectively, Δ𝜙𝑜(𝒑; 𝑡) = 𝜙𝑜(𝒑; 𝑡) − 𝜙𝑜(𝒑; 𝑡₀) is the difference of the phase of the object beam, 𝜙𝑜(𝒑; 𝑡), between two speckle fields at time instances 𝑡 and 𝑡₀, and the noise term 𝜂(𝒑; 𝑡₀, 𝑡) is defined by

𝜂(𝒑; 𝑡₀, 𝑡) = −4𝑎𝑜(𝒑; 𝑡)²𝑎𝑟(𝒑)² (1 − cos(Δ𝜙𝑜(𝒑; 𝑡₀, 𝑡))) cos(𝜙𝑜(𝒑; 𝑡₀) + 𝜙𝑜(𝒑; 𝑡) − 2𝜙𝑟(𝒑; 𝑡₀))    (3)

where 𝜙𝑟(𝒑; 𝑡₀) is the phase of the reference beam. Eqs. (2)-(3) are used in this work to generate the training and testing datasets, see Section 3.

The problem of recovering an image 𝒙 from a noisy image 𝒚 can be formulated as finding the parametric function 𝐹(⋅; Θ) such that

𝒙̂ = 𝐹(𝒚; Θ)    (4)

where 𝒙̂ represents an estimation of the clean image 𝒙, and Θ are the parameters of the function. In order to completely define a given family of parametric mappings 𝐹(⋅; Θ), see Section 2.1, we need to estimate the parameters Θ. For this purpose, we follow a standard machine learning approach and provide a dataset, also called the training dataset, {(𝒙ᵢ, 𝒚ᵢ)}ᴺᵢ₌₁, with 𝑁 pairs of clean image 𝒙ᵢ and noisy image 𝒚ᵢ; see details in Section 3.

Consequently, in order to estimate the parameters Θ, we solve the following optimization problem:

Θ∗ = arg min_Θ (1/𝑁) ∑ᵢ₌₁ᴺ ℒ(𝐹(𝒚ᵢ; Θ), 𝒙ᵢ) + 𝜆Ω(Θ)    (5)

where the first term corresponds to the fidelity term, the second term is the regularization term, and the hyperparameter 𝜆 > 0 controls the trade-off between these two terms. In the experiments we use the 𝐿1-norm and the 𝐿2-norm for the fidelity and regularization terms respectively, i.e.,

ℒ(𝒚, 𝒙) = ‖vec(𝒚 − 𝒙)‖₁    (6)

Ω(Θ) = (1/2) ‖vec(Θ)‖₂²    (7)

where vec(⋅) is the vectorization operator that transforms its input argument into the corresponding vectorized version. In practice, both the 𝐿1 and 𝐿2 norms can be used as fidelity terms in Eq. (5); however, the 𝐿1-norm is more robust to outliers and generates fewer artifacts than the 𝐿2-norm [25]. In Eq. (7), the regularization term is based on Ridge regression, where the 𝐿2-norm is used to stabilize the model and promote parameters whose absolute value is small [26].

2.1. Model architecture

For defining the family of parametric mappings 𝐹(⋅; Θ) in Eq. (4) we use a convolutional neural network based on the U-net model. First, we take into account that the observed corrupted image 𝒚 contains most of the main structure of the clean image; in the case of fringe pattern images, this structure typically corresponds to the low and mid frequencies. Based on the additive observation model

𝒚 = 𝒙 − 𝜼    (8)

where 𝒙 is the clean image, 𝒚 is the observed image and 𝜼 is the noise, we can write

𝒙 = 𝜼 + 𝒚    (9)

which suggests modeling the clean image 𝒙 by modeling the noise, i.e.,

𝜼 = ℛ(𝒚; Θ)    (10)

Therefore, the denoising parametric model 𝐹(⋅; Θ) can be written as follows:

𝐹(𝒚; Θ) = ℛ(𝒚; Θ) + 𝒚    (11)

which is called residual modeling. The architecture of the proposed model is shown in Fig. 1 and adopts the global residual modeling given in Eq. (11).

The main structure of the mapping 𝐹(𝒚; Θ) is the function ℛ(𝒚; Θ), which is based on a U-net model. The mapping ℛ(𝒚; Θ) consists of three encoding/decoding levels, a bottleneck level, and a shortcut between the encoding and decoding layers at the same level. Unlike the standard U-net model, we use specialized denoising blocks based on the RDUNet, see Fig. 4 and Section 2.2, instead of the convolution and max-pooling blocks used in the original U-net.

In Fig. 2, we detail the notation of the basic building blocks of the proposed model. Each convolution in the proposed model has the appropriate padding to keep the spatial dimension of the images, and a parametric ReLU (PReLU) activation function with as many trainable parameters as the number of feature maps generated by the convolution operation, see Fig. 2a. This activation function increases the model flexibility without introducing a large number of extra parameters [27].
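To make Eq. (11) and the PReLU convention concrete, the following PyTorch sketch shows a convolution + PReLU building block and a wrapper that adds the global residual connection. It is an illustration under our own naming, not the authors' released implementation:

```python
import torch.nn as nn

class ConvPReLU(nn.Module):
    """3x3 convolution with padding that preserves the spatial size,
    followed by a PReLU with one trainable slope per feature map."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.PReLU(num_parameters=out_ch)

    def forward(self, x):
        return self.act(self.conv(x))

class GlobalResidualDenoiser(nn.Module):
    """F(y; Theta) = R(y; Theta) + y, as in Eq. (11)."""
    def __init__(self, body):
        super().__init__()
        self.body = body  # the noise-estimating mapping R(y; Theta)

    def forward(self, y):
        return self.body(y) + y  # global residual learning
```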


Fig. 1. Architecture of the proposed neural network.

Fig. 2. Basic building blocks of the LRDUNet.

Additionally, for the denoising block, we use grouped convolutions [11]. The original motivation of grouped convolutions was to distribute the model over multiple GPUs as an engineering compromise, but it was later shown that this module can be used to improve classification accuracy and reduce the number of trainable parameters [28].

For an input image, the LRDUNet first performs a 3 × 3 convolution for feature extraction from the noisy image 𝒚. This first convolution generates 64 feature maps, which are the input to the first denoising block. At each level of the model, there are two denoising blocks in both the encoding and decoding sections. The outputs of all the encoding levels are downsampled by a factor of 2 using convolutions with kernel size 2 × 2 and stride 2 that generate twice as many feature maps as the input image, i.e., these downsampling 2 × 2 convolutions halve the spatial dimension of the image and double the number of feature maps to reduce the loss of information from one level to another. The number of feature maps generated at each level is 64, 128, 256, and 512, denoted as 𝑓 in Fig. 1, for levels 1, 2, 3, and 4, respectively. The upsampling in the decoding section is performed by transposed convolutions with kernel size 2 × 2 and stride 2. In contrast with the downsampling 2 × 2 convolutions, the transposed 2 × 2 convolutions double the spatial dimension of the image, but they generate as many feature maps as their input images have. Additionally, there are shortcuts between the encoding and decoding layers at the same level, which are handled through concatenations. After each upsampling and concatenation, a 1 × 1 convolution is performed to reduce the number of feature maps and keep the most important information from the upsampling and shortcut sources. Finally, the 3 × 3 output convolution reduces the number of feature maps to 1 and generates an estimation of the residual noise.

2.2. Denoising block

The denoising block is based on the reuse of feature maps from the DenseNet model [29] and the bottleneck block of the ResNet50 model [30], see Figs. 3–4. For the dense convolutions, we use 3 × 3 grouped convolutions and 1 × 1 convolutions. At the beginning of the denoising block, we perform a dense 1 × 1 convolution to reduce the number of feature maps (𝑓) by half, and the generated feature maps are concatenated to the input image. Afterward, two grouped dense convolutions with kernel size 3 × 3 are performed. These convolutions use all their preceding feature maps in the denoising block as input information. Then, a final 1 × 1 convolution is applied to merge the input image and all the feature maps generated within the denoising block. This last convolution generates an image with the same number of feature maps as the input of the denoising block. Finally, we apply local residual learning to address the vanishing gradient problem and speed up the training process.

The purpose of dense grouped convolutions is to reduce the number of trainable parameters of the model and its complexity. Consider 𝑓 and ℎ the number of input and output feature maps of a convolution, respectively, and 𝑔 the number of groups. The number of parameters of a normal convolution with a 𝑘 × 𝑘 kernel is 𝑘 × 𝑘 × 𝑓 × ℎ + ℎ, where the additive term is the bias of the convolutional kernels. On the other hand, the number of parameters of a grouped convolution is 𝑘 × 𝑘 × (𝑓/𝑔) × ℎ + ℎ. Therefore, a grouped convolution reduces the number of parameters to (𝑘² × 𝑓 + 𝑔) / ((𝑘² × 𝑓 + 1) × 𝑔) times that of the normal convolution (see Fig. 2b). Additionally, the downsampling 2 × 2 convolutions and the upsampling 2 × 2 transposed convolutions are also grouped, with the same number of groups as the grouped convolutions in the denoising block.
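These parameter counts are easy to verify numerically; a small check with hypothetical sizes (𝑘 = 3, 𝑓 = ℎ = 64, 𝑔 = 8):

```python
import torch.nn as nn

k, f, h, g = 3, 64, 64, 8  # kernel size, input maps, output maps, groups

normal = nn.Conv2d(f, h, kernel_size=k, padding=1)
grouped = nn.Conv2d(f, h, kernel_size=k, padding=1, groups=g)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(normal))   # k*k*f*h + h = 36928
print(count(grouped))  # k*k*(f/g)*h + h = 4672, about 0.127 of the normal count
```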


Fig. 3. Dense convolutions of the denoising block.

Fig. 4. Denoising block of the LRDUNet: The first 1 × 1 dense convolution reduces the number of feature maps. Then, two 3 × 3 grouped dense convolutions are applied, each one processing all the information before them. Finally, a 1 × 1 convolution merges all the information within the denoising block and generates the same number of feature maps as the block input to apply local residual learning.
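A compact PyTorch sketch of the structure that Fig. 4 describes is given below. The 𝑓/2 growth rate and the module names are our reading of the figure, not the authors' code:

```python
import torch
import torch.nn as nn

class DenoisingBlock(nn.Module):
    """Dense 1x1 reduction, two grouped dense 3x3 convolutions, a 1x1 merge,
    and a local residual connection (see Fig. 4)."""
    def __init__(self, f, groups=8):
        super().__init__()
        self.reduce = nn.Sequential(nn.Conv2d(f, f // 2, 1), nn.PReLU(f // 2))
        # Each grouped dense convolution consumes all preceding feature maps.
        self.dense1 = nn.Sequential(
            nn.Conv2d(f + f // 2, f // 2, 3, padding=1, groups=groups),
            nn.PReLU(f // 2))
        self.dense2 = nn.Sequential(
            nn.Conv2d(f + 2 * (f // 2), f // 2, 3, padding=1, groups=groups),
            nn.PReLU(f // 2))
        # Merge back to f maps so the residual sum is well defined.
        self.merge = nn.Conv2d(f + 3 * (f // 2), f, 1)

    def forward(self, x):
        t = torch.cat([x, self.reduce(x)], dim=1)   # f + f/2 maps
        t = torch.cat([t, self.dense1(t)], dim=1)   # f + 2(f/2) maps
        t = torch.cat([t, self.dense2(t)], dim=1)   # f + 3(f/2) maps
        return self.merge(t) + x                    # local residual learning
```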

Despite the similarities of the proposed model with the DHDN and RDUNet models for natural image denoising, there are several significant differences. Those models were trained to remove additive white Gaussian noise at several noise levels; because of the high variability of natural images and their capability to handle multiple noise levels, they require a considerable number of trainable parameters. In contrast, the LRDUNet requires only one 3 × 3 convolution to encode the input image before the first denoising block and one 3 × 3 convolution to decode the residual noise. The first and last convolutions in the denoising block have a kernel size of 1 × 1 instead of 3 × 3, reducing the number of parameters and the overfitting of the model. Moreover, the LRDUNet uses grouped convolutions to reduce the number of parameters in the denoising block, while the other models use normal convolutions only. It should be noted that both the DHDN and RDUNet models could be trained for the fringe pattern image denoising task. However, given their very high number of parameters, it could be complicated to address the overfitting problem. Therefore, their generalization capability and performance on the simulated test dataset or on experimental fringe patterns could be much lower than their results on the training dataset.

2.3. Ablation study

In this section, we measure the impact of the main features of the proposed model on its performance. The features considered are: the number of groups in the grouped convolutions, the number of dense grouped convolutions in the denoising blocks, the subsampling method, and the use of global residual learning, see Table 1. The variants of the proposed model are denoted as LRDUNet𝑣𝑗, where 𝑗 = 0, …, 14.

We consider groups of size 8 and 16 for both the inner 3 × 3 dense convolutions in the denoising block and the downsampling/upsampling 2 × 2 convolutions. The rest of the convolutions in the model are not grouped. Concerning the inner dense convolutions in the denoising block, denoted as DC3×3, we study the effect of using 1 or 2 dense convolutions in our proposal.

Two subsampling methods are evaluated: a strided 2 × 2 convolution (see Section 2), denoted as Conv2×2, and the combination of a 1 × 1 convolution and max-pooling, denoted as Conv1×1+MP. The 1 × 1 convolution doubles the number of input feature maps, whereas the max-pooling halves the spatial dimension of the image.

Table 1. Ablation study of the main components of the proposed model: number of groups in dense convolutions, number of 3 × 3 dense convolutions (DC3×3) per denoising block, 1 × 1 convolution and max-pooling subsampling (Conv1×1), 2 × 2 convolution with stride = 2 subsampling (Conv2×2), and global residual learning (GRL). We also include the number of trainable parameters, multiply-accumulate (MAC) operations on 64 × 64 images, and the PSNR value on the testing dataset.

| Model       | Groups | DC3×3 | Conv1×1 | Conv2×2 | GRL | #Param. | MACs   | PSNR (dB) |
|-------------|--------|-------|---------|---------|-----|---------|--------|-----------|
| LRDUNet𝑣0  | 16     | 1     | ✓       | ✗       | ✗   | 3.07 M  | 1.01 G | 38.70     |
| LRDUNet𝑣1  | 16     | 1     | ✓       | ✗       | ✓   | 3.07 M  | 1.01 G | 39.14     |
| LRDUNet𝑣2  | 16     | 1     | ✗       | ✓       | ✗   | 2.94 M  | 0.91 G | 39.33     |
| LRDUNet𝑣3  | 16     | 1     | ✗       | ✓       | ✓   | 2.94 M  | 0.91 G | 39.20     |
| LRDUNet𝑣4  | 16     | 2     | ✓       | ✗       | ✗   | 4.00 M  | 1.26 G | 38.34     |
| LRDUNet𝑣5  | 16     | 2     | ✓       | ✗       | ✓   | 4.00 M  | 1.26 G | 38.62     |
| LRDUNet𝑣6  | 16     | 2     | ✗       | ✓       | ✗   | 3.87 M  | 1.16 G | 39.67     |
| LRDUNet𝑣7  | 16     | 2     | ✗       | ✓       | ✓   | 3.87 M  | 1.16 G | 39.83     |
| LRDUNet𝑣8  | 8      | 1     | ✓       | ✗       | ✗   | 3.25 M  | 1.13 G | 38.45     |
| LRDUNet𝑣9  | 8      | 1     | ✓       | ✗       | ✓   | 3.25 M  | 1.13 G | 38.77     |
| LRDUNet𝑣10 | 8      | 1     | ✗       | ✓       | ✗   | 3.16 M  | 1.04 G | 39.35     |
| LRDUNet𝑣11 | 8      | 1     | ✗       | ✓       | ✓   | 3.16 M  | 1.04 G | 39.35     |
| LRDUNet𝑣12 | 8      | 2     | ✓       | ✗       | ✗   | 4.29 M  | 1.47 G | 39.29     |
| LRDUNet𝑣13 | 8      | 2     | ✓       | ✗       | ✓   | 4.29 M  | 1.47 G | 39.15     |
| LRDUNet𝑣14 | 8      | 2     | ✗       | ✓       | ✗   | 4.21 M  | 1.38 G | 39.78     |
| LRDUNet     | 8      | 2     | ✗       | ✓       | ✓   | 4.21 M  | 1.38 G | 40.14     |
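The #Param. column of Table 1 corresponds to the usual count of trainable tensors; the MAC counts are typically obtained with a profiling tool. A generic helper for the former:

```python
def count_parameters(model):
    """Number of trainable parameters, as reported in the #Param. column."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```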


Additionally, we study the use of global residual learning, denoted as GRL. If the model does not use GRL, the denoised image is estimated directly.

As shown in Table 1, depending on the model configuration, the number of parameters and multiply-accumulate (MAC) operations vary from 2.94 M to 4.29 M and from 0.91 G to 1.47 G, respectively. When the number of groups is set to 16, the number of parameters and MACs is lower than when the number of groups is 8. On the other hand, the performance with 16 groups is lower than with 8 groups in most cases. When DC3×3 = 2, the number of parameters increases by about 1 M compared to DC3×3 = 1. However, the results in Table 1 demonstrate that the best performance of the model is achieved with DC3×3 = 2 and 8 groups.

With regard to the subsampling methods, we notice that Conv1×1+MP increases the number of parameters in comparison to Conv2×2, with which the best performance is achieved. The learned Conv2×2 parameters can combine positive and negative values, and therefore it is possible to measure contrast and homogeneity, unlike max-pooling, which measures uniformity and only considers information within each feature map independently.

The use of GRL increases the performance of the model in most cases without using additional parameters. This fact supports the model described in Eq. (11).

3. Experiments

3.1. Data simulation

In order to train and evaluate the proposed model, we generate training, validation and testing datasets. These datasets are created based on the ESPI model in Eqs. (2)-(3). For creating the datasets, we first simulate the phase, and then we obtain the clean fringe pattern image and the corresponding noisy image. Below we give details about the generation of the phase, the corresponding noisy images, and the datasets.

Phase generation: For the phase function Δ𝜙𝑜(𝒑), we use radial basis functions with a Gaussian kernel:

Δ𝜙𝑜(𝒑) = 𝜅 ∑ᵢ₌₁ⁿ 𝑤ᵢ 𝐺(𝒑; 𝜇ᵢ, Σᵢ)    (12)

where 𝑛 is the number of basis functions, 𝜅 is a scale factor that controls the fringe density, and 𝑤ᵢ, 𝜇ᵢ and Σᵢ are the weight, mean vector and covariance matrix of the 𝑖-th basis function, respectively. The Gaussian kernel is defined as

𝐺(𝒑; 𝜇, Σ) = exp(−(1/2)(𝒑 − 𝜇) Σ⁻¹ (𝒑 − 𝜇)ᵀ)    (13)

Clean images: A clean image 𝒙 is obtained from the model of Eq. (2) without the noise term:

𝒙(𝒑) = 4𝑎𝑜(𝒑)²𝑎𝑟(𝒑)² (1 + cos(Δ𝜙𝑜(𝒑) + 𝜋))    (14)

where the phase Δ𝜙𝑜(𝒑) is computed according to Eq. (12). Moreover, we use 4𝑎𝑜(𝒑)²𝑎𝑟(𝒑)² = 1, and the intensity of 𝒙 is finally normalized to the range [0, 1]. Since constant amplitudes and phase are used for all 𝑡, we omit the time argument from the equations for the simulation of the images.
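The phase and clean-image generation of Eqs. (12)-(14) can be sketched in NumPy as follows; the sampling ranges are those listed under Datasets below, and the function names are ours:

```python
import numpy as np

def gaussian_kernel(px, py, mu, sigma):
    """Eq. (13) for a diagonal covariance matrix (sigma_xy = sigma_yx = 0)."""
    return np.exp(-0.5 * ((px - mu[0]) ** 2 / sigma[0]
                          + (py - mu[1]) ** 2 / sigma[1]))

def simulate_phase(size=256, rng=np.random.default_rng()):
    """Radial basis phase of Eq. (12)."""
    x = np.linspace(-3, 3, size)
    px, py = np.meshgrid(x, x)
    n = rng.integers(1, 9)               # number of basis functions, n in [1, 8]
    kappa = rng.uniform(1, 5)            # fringe-density scale factor
    phase = np.zeros((size, size))
    for _ in range(n):
        w = rng.uniform(1, 15) * rng.choice([-1, 1])  # w_i in [-15,-1] U [1,15]
        mu = rng.normal(0, 1, size=2)
        sigma = rng.uniform(0.5, 3, size=2)
        phase += w * gaussian_kernel(px, py, mu, sigma)
    return kappa * phase

def clean_fringe(phase):
    """Eq. (14) with 4*a_o^2*a_r^2 = 1, normalized to [0, 1]."""
    img = 1 + np.cos(phase + np.pi)
    return (img - img.min()) / (img.max() - img.min())
```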

Table 2. Performance of different denoising methods on the test dataset. The first block of methods corresponds to classical methods, the second block corresponds to neural network methods, and the third block corresponds to neural network methods with the self-ensemble strategy.

| Method   | PSNR (dB) | MAE    | SSIM   | 𝑄      |
|----------|-----------|--------|--------|--------|
| WFF      | 15.6245   | 0.1239 | 0.7798 | 0.9092 |
| CED      | 15.4512   | 0.1338 | 0.8052 | 0.9145 |
| FPD-CNN  | 33.7919   | 0.0135 | 0.9903 | 0.9935 |
| DnCNN    | 34.3749   | 0.0127 | 0.9895 | 0.9925 |
| FFDNet   | 35.2269   | 0.0117 | 0.9932 | 0.9921 |
| V-net    | 36.8157   | 0.0100 | 0.9958 | 0.9966 |
| U-net    | 38.4253   | 0.0084 | 0.9973 | 0.9949 |
| LRDUNet  | 40.1427   | 0.0067 | 0.9980 | 0.9974 |
| FPD-CNN+ | 34.6145   | 0.0125 | 0.9922 | 0.9937 |
| DnCNN+   | 35.0073   | 0.0118 | 0.9918 | 0.9929 |
| FFDNet+  | 36.1969   | 0.0104 | 0.9949 | 0.9932 |
| V-net+   | 37.5800   | 0.0092 | 0.9964 | 0.9968 |
| U-net+   | 39.6721   | 0.0073 | 0.9980 | 0.9953 |
| LRDUNet+ | 41.0231   | 0.0060 | 0.9984 | 0.9983 |

Table 3. Number of trainable parameters, multiply-accumulate (MAC) operations, and running time of the deep learning-based models on 64 × 64 image patches.

| Model                | # Parameters | MACs   | Run time GPU (ms) | Run time CPU (ms) |
|----------------------|--------------|--------|-------------------|-------------------|
| DnCNN                | 688.23 k     | 2.74 G | 1.52              | 12.35             |
| FFDNet               | 486.98 k     | 0.50 G | 1.30              | 3.66              |
| FPD-CNN              | 1.86 M       | 7.61 G | 4.85              | 29.58             |
| V-net                | 3.72 M       | 5.21 G | 1.87              | 28.60             |
| U-net                | 31.00 M      | 3.41 G | 3.10              | 27.82             |
| LRDUNet              | 4.21 M       | 1.38 G | 6.65              | 51.39             |
| LRDUNet (no grouped) | 8.49 M       | 4.49 G | 5.16              | 52.87             |

Fig. 5. Example of simulated data: a) noisy image according to Eq. (2), b) noisy image after applying brightness shift, Gaussian blur, and additive white Gaussian noise as data augmentation, c) ground-truth image.


Noise images: We simulate the speckle noise according to Eq. (3) with the following settings: 𝜙𝑜(𝒑) ∼ 𝑈(−𝜋, 𝜋) is a random variable with uniform distribution, 𝜙𝑟(𝒑) = 0, 𝑎𝑟(𝒑)² = 1, and 𝑎𝑜(𝒑)² ∼ Exp(𝛽) is a random variable with exponential distribution and scale parameter 𝛽 = 0.5. The sampled values of 𝑎𝑜(𝒑)² are clipped to the range [0, 1]. To obtain the noisy image, we compute

𝒚(𝒑) = 𝒙(𝒑) + 𝜂(𝒑)    (15)

where 𝒙(𝒑) is the clean image, Eq. (14), and 𝜂(𝒑) is the added noise, Eq. (3). In this particular case, for computing the noisy image 𝒚(𝒑), the factors 𝑎𝑜(𝒑)² and 𝑎𝑟(𝒑)² in Eq. (14) are the same as the ones used to generate the noise 𝜂(𝒑).

Datasets: First, we simulate 1700 pairs (𝒚, 𝒙) of noisy-clean images of size 256 × 256, Eqs. (14)-(15). From this set of image pairs, we use 1500 as the training set, and from the rest we take 100 pairs as the validation set and 100 pairs as the test set. For every simulated image pair, the parameters 𝑛, 𝜅 and 𝑤ᵢ are chosen randomly in the ranges 𝑛 ∈ [1, 8], 𝜅 ∈ [1, 5], and 𝑤ᵢ ∈ [−15, −1] ∪ [1, 15]. The basis functions 𝐺(𝒑; 𝜇, Σ) are sampled in the plane 𝒳 × 𝒴 = {(𝑥, 𝑦) | 𝑥 ∈ [−3, 3], 𝑦 ∈ [−3, 3]}. The components of the mean vector, 𝜇𝑥 and 𝜇𝑦, are generated from a normal distribution 𝒩(0, 1). The components 𝜎𝑥 and 𝜎𝑦 of the covariance matrix are randomly drawn from a uniform distribution in the range [0.5, 3], whereas the off-diagonal components 𝜎𝑥𝑦 = 𝜎𝑦𝑥 are set to 0.
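Under the settings above, the noisy image of Eq. (15) can be sketched as follows. The paper specifies a single distribution 𝜙𝑜(𝒑) ∼ 𝑈(−𝜋, 𝜋); sampling the two speckle phases in Eq. (3) independently is our assumption:

```python
import numpy as np

def noisy_fringe(dphase, rng=np.random.default_rng()):
    """Eq. (15) with shared amplitude factors in Eqs. (14) and (3):
    phi_r = 0, a_r^2 = 1, a_o^2 ~ Exp(beta=0.5) clipped to [0, 1]."""
    shape = dphase.shape
    a_o2 = np.clip(rng.exponential(scale=0.5, size=shape), 0.0, 1.0)
    phi0 = rng.uniform(-np.pi, np.pi, size=shape)   # phi_o(p; t0), assumption
    phi1 = rng.uniform(-np.pi, np.pi, size=shape)   # phi_o(p; t), assumption
    clean = 4 * a_o2 * (1 + np.cos(dphase + np.pi))                  # Eq. (14)
    noise = -4 * a_o2 * (1 - np.cos(dphase)) * np.cos(phi0 + phi1)   # Eq. (3)
    return clean + noise
```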

Fig. 6. Comparison of denoising methods using a simulated fringe pattern. On the left side of each image appears a zoomed region and its corresponding PSNR value
obtained for each method. The PSNR(dB)/MAE/𝑄 values of the denoised image are in the subtitle.


Fig. 7. Comparison of denoising methods using a simulated fringe pattern. On the left side of each image appears a zoomed region and its corresponding PSNR value obtained for each method. The PSNR (dB)/MAE/𝑄 values of the denoised image are in the subtitle.

In Fig. 5 we show some examples of simulated fringe pattern images used during the training process.

3.2. Model training

For training the model, the training images are cropped into patches of size 64 × 64. For data augmentation, we apply vertical and horizontal flips, and 90° rotations. The cropped images and the data augmentation produce a training dataset of 192,000 noisy-clean patch pairs. Additionally, we apply data augmentation during the training. In particular, we consider the following sequence of transformations: brightness shift, Gaussian blur, and additive white Gaussian noise. Each transformation is applied with a probability of 0.5. The latter data augmentation was applied to avoid model overfitting.

In order to optimize Eq. (5) we use the AdamW algorithm [31]. This optimization algorithm allows us to decouple the first and second terms of the loss function in Eq. (5) during the training. The regularization parameter in Eq. (5) is 𝜆 = 10⁻². The parameters of the AdamW algorithm are 𝛽₁ = 0.9, 𝛽₂ = 0.999, and 𝜖 = 10⁻⁸. The initial learning rate is 𝛼₀ = 10⁻³, and it is halved every 5 epochs. The LRDUNet model was trained with a batch size of 16 for 20 epochs. Our model was implemented in Python 3.6 using the PyTorch framework. The training time was 8 hours on an Nvidia RTX Titan GPU. The source code, pretrained model and dataset are available on GitHub (coming soon).
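The optimization setup of this section amounts to the following PyTorch sketch, where `model` and `loader` stand for the LRDUNet and the 64 × 64 patch dataset; this mirrors the description above rather than the released training script:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=20, device="cuda"):
    model = model.to(device)
    criterion = nn.L1Loss()  # L1 fidelity term, Eq. (6)
    # AdamW decouples the L2 regularization (weight_decay = lambda) from the loss.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                                  betas=(0.9, 0.999), eps=1e-8,
                                  weight_decay=1e-2)
    # Halve the learning rate every 5 epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
    for _ in range(epochs):
        for noisy, clean in loader:
            noisy, clean = noisy.to(device), clean.to(device)
            optimizer.zero_grad()
            loss = criterion(model(noisy), clean)
            loss.backward()
            optimizer.step()
        scheduler.step()
```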


Fig. 8. Comparison of denoising methods using a simulated fringe pattern. On the left side of each image appears a zoomed region and its corresponding PSNR value obtained for each method. The PSNR (dB)/MAE/𝑄 values of the denoised image are in the subtitle.

3.3. Model comparison

We compare the proposed model with the windowed Fourier filtering (WFF) algorithm [3], the coherence-enhancing diffusion model (CED) [8], and the fringe pattern denoising convolutional neural network (FPD-CNN) [22]. Moreover, we compare the proposed LRDUNet with the neural models for natural image denoising, namely the denoising convolutional neural network (DnCNN) [17] and the fast and flexible denoising convolutional neural network (FFDNet) [18], and with the V-net neural network model for fringe pattern filtering and normalization [23]. We adjust the parameters of WFF and CED to maximize their performance, and we train every neural network model from scratch according to its reported setup, using the same training dataset as for the LRDUNet. We also train from scratch and compare results with the original U-net model [12]. Additionally, we apply the self-ensemble prediction method to boost the performance of the deep learning models [32]. In the self-ensemble method, the results from the eight flipped/rotated versions of the estimated image are averaged. The self-ensemble results of the deep learning models are denoted as FPD-CNN+, DnCNN+, FFDNet+, V-net+, U-net+, and LRDUNet+. Although the proposed model was trained with patches of size 64 × 64, during the test the denoised images were estimated using the whole noisy image, without dividing it into patches, for all the models. For assessing the compared models, we use the average peak signal-to-noise ratio (PSNR) [33], the average structural similarity index (SSIM) [34], the mean absolute error (MAE), and the quality index 𝑄 [35].
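For reference, PSNR and MAE on images normalized to [0, 1] reduce to the following (SSIM [34] and the quality index 𝑄 [35] are usually taken from an image-quality library):

```python
import numpy as np

def psnr(x_hat, x, peak=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((x_hat - x) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def mae(x_hat, x):
    """Mean absolute error."""
    return np.mean(np.abs(x_hat - x))
```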


Fig. 9. Comparison of denoising methods using a simulated fringe pattern. On the left side of each image appears a zoomed region and its corresponding PSNR value obtained for each method. The PSNR (dB)/MAE/𝑄 values of the denoised image are in the subtitle.

3.4. Results on simulated fringe patterns

This section presents the results on simulated fringe pattern images generated according to Section 3.1. The results of the comparison on the test dataset are reported in Table 2, with the best results highlighted in bold. Additionally, Figs. 6–10 depict some results of the compared denoising methods and their corresponding skeletons on five simulated fringe patterns from the test dataset.
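As a side note on the "+" rows of Table 2, the ×8 self-ensemble described in Section 3.3 can be sketched as follows: the input is transformed by the eight flip/rotation combinations, each output is mapped back, and the results are averaged (our sketch of the strategy of [32]):

```python
import torch

def self_ensemble(model, y):
    """x8 geometric self-ensemble for an image tensor y of shape (B, C, H, W)."""
    outs = []
    for k in range(4):                       # rotations by 0, 90, 180, 270 degrees
        for flip in (False, True):
            t = torch.rot90(y, k, dims=(-2, -1))
            if flip:
                t = torch.flip(t, dims=(-1,))
            out = model(t)
            if flip:                         # undo the transform on the output
                out = torch.flip(out, dims=(-1,))
            outs.append(torch.rot90(out, -k, dims=(-2, -1)))
    return torch.stack(outs).mean(dim=0)
```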


Fig. 10. Comparison of denoising methods using a simulated fringe pattern. On the left side of each image appears a zoomed region and its corresponding PSNR value obtained for each method. The PSNR (dB)/MAE/𝑄 values of the denoised image are in the subtitle.

As shown in Table 2, there is a considerable performance improvement of the neural network models over the classical methods WFF and CED, with a difference of at least 18 dB in PSNR and an MAE value ten times lower. On the other hand, the neural networks with self-ensemble estimation improve by about 1 dB in PSNR with respect to the corresponding models without self-ensemble, and they also slightly improve the other metrics. The LRDUNet outperforms the other neural network models in both straightforward and self-ensemble estimations, with the U-net+ being the closest model, at about 0.47 dB below the simple LRDUNet and about 1.35 dB below the LRDUNet+. Based on Figs. 6–10, the results obtained by the WFF and CED methods are not satisfactory. The WFF produces blurring in regions with high frequency; note also that the corresponding skeleton is broken in those areas. The CED reduces some noise, but it generates artifacts mostly aligned along the fringes. The results obtained with the CNN-based models are visually very similar, but their PSNR values vary from 19.67 dB to 42.44 dB, and the proposed model achieves the best results in most cases. Note that regions with high frequencies are very difficult for all the methods. In these regions, the WFF and CED methods obtain blurry results, while they partially remove the noise in the low-frequency regions of the image. In most cases, the skeletons of the images generated by the WFF and CED methods are broken and do not resemble the clean image skeleton. In general, the CNN-based models achieve better results. The U-net+ and LRDUNet+ models obtain the best results, both visual and numerical, in comparison with the other methods.


Fig. 11. Comparison of denoising methods using an experimentally obtained optical fringe pattern.

Fig. 12. Comparison of denoising methods using an experimentally obtained optical fringe pattern.

3.5. Results on experimentally obtained fringe patterns

In this section we present a visual comparison of the models studied in Section 3.4 using real experimental ESPI images. Figs. 11–15 show the denoising results achieved by our proposal and the methods considered in the comparison. Similar to the results discussed in Section 3.4, the CNN-based methods achieve better visual results than the classic WFF and CED methods. However, in this case the results among the CNN-based methods are visually very different. FPD-CNN+ and V-net+ yield blurred regions, and they are not able to retrieve the details of the fringe patterns. DnCNN+ and FFDNet+ denoise the fringes in more detail and obtain a better contrast than FPD-CNN+ and V-net+.


Fig. 13. Comparison of denoising methods using an experimentally obtained optical fringe pattern.

Fig. 14. Comparison of denoising methods using an experimentally obtained optical fringe pattern.

Notwithstanding, they still produce blurred results in some regions of the images. The U-net+ produces images with less blurring than the previous models, but the fringes are still not well defined in some regions. Finally, the LRDUNet+ achieves more detailed images, with less blurring and more contrast in the fringe patterns.

In comparison with the simulated fringe patterns, the experimentally obtained fringe pattern images do not have homogeneous brightness. The center regions of the experimental images are more consistent with the simulated images, so there the proposed model achieves its best performance.


Fig. 15. Comparison of denoising methods using an experimentally obtained optical fringe pattern.

On the other hand, the edges of the experimental images have less brightness, and it is in these regions that the performance of the model is lower.

The experimental images can be obtained under different contexts. The underlying mathematical models of the experimental images can be more complex than the model used to generate the simulated images, mainly with regard to the generation of the phase and the uniform brightness of the training dataset. Training the model with experimental images in addition to simulated images could improve its performance. However, to the best of our knowledge, there are no techniques that would provide clean experimental images with which to train the model properly.

3.6. Model complexity

In Table 3 we analyze the model complexity considering the number of multiply-accumulate (MAC) operations, the number of parameters, and the running time of the CNN-based models.

The first block corresponds to models designed for natural image denoising. The DnCNN performs a large number of MAC operations with respect to its number of parameters. This is mainly because this model does not use subsampling and upsampling operations, keeping the image at the same spatial dimension throughout the denoising process. On the other hand, the FFDNet performs a subsampling and an upsampling operation, which reduces the required computation. The execution times of these models are very similar on GPU, but not on CPU, where the FFDNet is more than 3 times faster than the DnCNN.

The second block in Table 3 shows the complexity of the models designed for fringe pattern denoising. Notice that the number of parameters and MAC operations increases in both the FPD-CNN and V-net models in comparison to the models designed for natural image denoising. The FPD-CNN, by not using subsampling and upsampling operations, performs a large number of MAC operations, whereas the V-net model reflects a balance between the number of parameters and MAC operations. The V-net is more than twice as fast as the FPD-CNN when running on GPU, but the execution times on CPU are very similar between the two models.

The last block in Table 3 considers the U-net model, the proposed LRDUNet, and additionally the LRDUNet without grouped convolutions. The U-net is the model with the largest number of parameters among all the compared models. However, this model has a moderate number of MAC operations in contrast with its number of parameters. In the case of the LRDUNet, the number of parameters is much lower than that of the U-net, mainly because of the use of grouped convolutions, the reuse of feature maps in the dense convolutional blocks, and the use of four encoding and decoding levels. Notice that the bottleneck is the section with the most parameters and the one requiring the most computation; in this part, the LRDUNet model produces 512 feature maps, whereas the U-net model produces 1024 feature maps. Nevertheless, despite the low computational cost in terms of the number of operations and parameters, the run time of the LRDUNet is longer than that of the other models. Note in the last row of Table 3 that the number of parameters and MAC operations grows noticeably when the LRDUNet does not use grouped convolutions, while the computational time on GPU is lower. Thus, part of the increase in the run time may be due to the overhead generated by the framework when it performs the grouped convolutions. Additionally, another factor that may influence the running time is the number of convolutions carried out by the proposed model and the number of image concatenations in the denoising blocks.

4. Conclusion

This research addressed the fringe pattern image denoising problem. To solve this problem, we proposed a model based on deep convolutional neural networks. In particular, the proposed model was applied to the electronic speckle pattern interferometry denoising problem.

Some drawbacks of deep neural networks are the vanishing and exploding gradient problems. For these reasons, we incorporated local and global residual learning. At the same time, the global residual learning allowed us to take advantage of the structure of the noisy image; therefore, our proposal basically models the residual noise of the image. Additionally, our model incorporates densely connected grouped convolutions in order to reuse the feature maps in the denoising blocks.


The use of this strategy reduced the complexity of the model. Experiments on simulated fringe patterns demonstrated that our proposed network outperformed the state-of-the-art algorithms, with the U-net+ being the closest model, at about 0.47 dB below the simple LRDUNet and about 1.35 dB below the LRDUNet+ in PSNR. Furthermore, the LRDUNet+ achieved better visual results when applied to real ESPI images, with less blurring and more contrast in the fringe patterns.

As future work, we consider extending our proposal to other optical interferometry tasks such as wrapped phase denoising and normalization, and to models that combine speckle and Gaussian noise.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Javier Gurrola-Ramos: Conceptualization, Software, Visualization, Writing – review & editing. Oscar Dalmau: Conceptualization, Methodology, Writing – review & editing, Visualization, Supervision. Teresa Alarcón: Methodology, Writing – review & editing, Visualization.

Acknowledgment

This work was supported by CONACYT (Mexico) under Grant 258033 and the project “Laboratorio de Supercómputo del Bajío (No. 300832)”.

References

[1] Wykes C. Use of electronic speckle pattern interferometry (ESPI) in the measurement of static and dynamic surface displacements. Opt Eng 1982;21(3):213400.
[2] Kemao Q. Windowed fringe pattern analysis. EBL-Schweitzer, SPIE Press; 2013. ISBN 9780819496430. https://books.google.com.mx/books?id=k_sNnQEACAAJ
[3] Kemao Q. Windowed Fourier transform for fringe pattern analysis. Appl Opt 2004;43(13):2695–702. doi:10.1364/AO.43.002695.
[4] Kulkarni R, Rastogi P. Fringe denoising algorithms: a review. Opt Lasers Eng 2020;135:106190. doi:10.1016/j.optlaseng.2020.106190.
[5] Dalmau O, Rivera M, Legarda-Saenz R. Fast phase recovery from a single closed-fringe pattern. J Opt Soc Am A 2008;25(6):1361–70. doi:10.1364/JOSAA.25.001361.
[6] Dalmau O, Rivera M, Gonzalez A. Phase shift estimation in interferograms with unknown phase step. Opt Commun 2016;372:37–43. doi:10.1016/j.optcom.2016.03.063.
[7] Tang C, Han L, Ren H, Zhou D, Chang Y, Wang X, Cui X. Second-order oriented partial-differential equations for denoising in electronic-speckle-pattern interferometry fringes. Opt Lett 2008;33(19):2179–81. doi:10.1364/OL.33.002179.
[8] Wang H, Kemao Q, Gao W, Lin F, Seah HS. Fringe pattern denoising using coherence-enhancing diffusion. Opt Lett 2009;34(8):1141–3. doi:10.1364/OL.34.001141.
[9] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
[10] Zhao Z, Zheng P, Xu S, Wu X. Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 2019;30(11):3212–32. doi:10.1109/TNNLS.2018.2876865.
[11] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 2012;25:1097–105.
[12] Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer; 2015. p. 234–41. doi:10.1007/978-3-319-24574-4_28.
[13] Liu L, Cheng J, Quan Q, Wu F-X, Wang Y-P, Wang J. A survey on U-shaped networks in medical image segmentations. Neurocomputing 2020;409:244–58.
[14] Nishizaki Y, Valdivia M, Horisaki R, Kitaguchi K, Saito M, Tanida J, Vera E. Deep learning wavefront sensing. Opt Express 2019;27(1):240–51. doi:10.1364/OE.27.000240.
[15] Spoorthi GE, Gorthi S, Gorthi RKSS. PhaseNet: a deep convolutional neural network for two-dimensional phase unwrapping. IEEE Signal Process Lett 2019;26(1):54–8. doi:10.1109/LSP.2018.2879184.
[16] Zuo W, Zhang K, Zhang L. Convolutional neural networks for image denoising and restoration. Cham: Springer International Publishing; 2018. p. 93–123. ISBN 978-3-319-96029-6. Chap. 4.
[17] Zhang K, Zuo W, Chen Y, Meng D, Zhang L. Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans Image Process 2017;26(7):3142–55. doi:10.1109/TIP.2017.2662206.
[18] Zhang K, Zuo W, Zhang L. FFDNet: toward a fast and flexible solution for CNN-based image denoising. IEEE Trans Image Process 2018;27(9):4608–22. doi:10.1109/TIP.2018.2839891.
[19] Park B, Yu S, Jeong J. Densely connected hierarchical network for image denoising. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops; 2019.
[20] Yu S, Park B, Jeong J. Deep iterative down-up CNN for image denoising. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops; 2019.
[21] Gurrola-Ramos J, Dalmau O, Alarcón TE. A residual dense U-net neural network for image denoising. IEEE Access 2021;9:31742–54. doi:10.1109/ACCESS.2021.3061062.
[22] Lin B, Fu S, Zhang C, Wang F, Li Y. Optical fringe patterns filtering based on multi-stage convolution neural network. Opt Lasers Eng 2020;126:105853.
[23] Reyes-Figueroa A, Flores VH, Rivera M. Deep neural network for fringe pattern filtering and normalization. Appl Opt 2021;60(7):2022–36.
[24] Jones R, Wykes C. Holographic and speckle interferometry. Cambridge studies in modern optics, 2nd ed. Cambridge University Press; 1989. doi:10.1017/CBO9780511622465.
[25] Zhao H, Gallo O, Frosio I, Kautz J. Loss functions for image restoration with neural networks. IEEE Trans Comput Imaging 2016;3(1):47–57.
[26] Bishop CM. Training with noise is equivalent to Tikhonov regularization. Neural Comput 1995;7(1):108–16. doi:10.1162/neco.1995.7.1.108.
[27] He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE international conference on computer vision (ICCV), Santiago; 2015. p. 1026–34.
[28] Xie S, Girshick R, Dollár P, Tu Z, He K. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR); 2017.
[29] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA; 2017. p. 2261–9.
[30] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA; 2016. p. 770–8.
[31] Loshchilov I, Hutter F. Decoupled weight decay regularization. In: 7th International conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019. OpenReview.net; 2019. https://openreview.net/forum?id=Bkg6RiCqY7
[32] Timofte R, Rothe R, Van Gool L. Seven ways to improve example-based single image super resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA; 2016. p. 1865–73.
[33] Huynh-Thu Q, Ghanbari M. Scope of validity of PSNR in image/video quality assessment. Electron Lett 2008;44(13):800–1.
[34] Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 2004;13(4):600–12. doi:10.1109/TIP.2003.819861.
[35] Wang Z, Bovik AC. A universal image quality index. IEEE Signal Process Lett 2002;9(3):81–4. doi:10.1109/97.995823.