Abstract
Because noise degrades the accuracy and precision of DNA capillary electrophoresis (CE)
analysis, signal denoising is important to facilitate the postprocessing of CE data. In this
paper, a new denoising algorithm based on the dyadic wavelet transform using multiscale products
is applied to remove the noise in the DNA CE signal. The adjacent scale wavelet
coefficients are first multiplied to amplify the significant features of the CE signal while
diluting noise. Then, noise is suppressed by applying a multiscale threshold to the multiscale
products instead of directly to the wavelet coefficients. Finally, the noise-free CE signal is
recovered from the thresholded coefficients using the inverse dyadic wavelet transform. We
compare the performance of the proposed algorithm with other denoising methods applied to
synthetic and real CE signals. Experimental results show that the new scheme achieves
better removal of noise while preserving the shape of peaks corresponding to the analytes in
the sample.
Keywords: capillary electrophoresis, signal denoising, multiscale products, dyadic wavelet
transform
0957-0233/13/065004+09$33.00 1 © 2013 IOP Publishing Ltd Printed in the UK & the USA
Meas. Sci. Technol. 24 (2013) 065004 Q Gao et al
soft or hard threshold to remove noise [9, 10]. It has been shown in [7] that the wavelet thresholding algorithms can provide a better reduction of noise as compared with traditional filters. However, because the peaks of CE signals are very sharp and their frequencies are very high, the peaks and noise cannot be distinguished efficiently through the application of a general threshold. Moreover, these wavelet-based denoising methods fail to take into account the spatial dependence of wavelet coefficients, and thus the obtained results usually exhibit visual artifacts and pseudo-Gibbs phenomena in the neighborhood of the discontinuities.
In this paper, we propose a CE signal denoising algorithm based on the combination of dyadic WT [11] and multiscale products [12], which can achieve both noise reduction and peak preservation. The products of the adjacent scale subbands, which represent the interscale dependences, amplify the significant features of the underlying signal while diluting noise. Consequently, we apply an appropriate multiscale product threshold to distinguish the signal from noise directly. Moreover, a new threshold incorporating the signal-to-noise ratio (SNR) is also designed in this work.

it less attractive for the analysis of non-stationary signals. In order to analyze the CE signal, we employ the dyadic WT proposed by Mallat and Zhong [13] to decompose the signal.
By sampling the translation parameter with the same sampling period as the input function to the discrete WT, the dyadic WT of the function $f(x)$ can be expressed as a sequence of functions:
$\{W_{2^j} f(x)\}_{j \in \mathbb{Z}}$, (4)
where $W_{2^j} f(x) = f(x) * \psi_{2^j}(x)$, and $\psi_{2^j}(x) = 2^{-j}\psi(2^{-j}x)$ is the dilation of the wavelet $\psi(x)$ by a scaling factor $2^j$. The function $f(x)$ can also be reconstructed from its dyadic WT with the summation
$f(x) = \sum_{j=-\infty}^{+\infty} W_{2^j} f(x) * \chi_{2^j}(x)$, (5)
where $\chi(x)$ is any reconstructing wavelet whose FT satisfies [13]
$\sum_{j=-\infty}^{+\infty} \hat{\psi}(2^j\omega)\,\hat{\chi}(2^j\omega) = 1$, (6)
Figure 1. The discrete decomposition and reconstruction algorithms of the 1D dyadic WT (two levels shown); ∗ denotes the conjugation operator.
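As an illustration of the decomposition branch in figure 1, the following minimal Python sketch computes an undecimated dyadic decomposition with the "à trous" scheme, in which the filters used at level j are dilated by inserting 2^j − 1 zeros between taps. The filter taps (a three-tap smoothing filter and a two-tap difference filter), the periodic boundary handling and all function names are assumptions made for this sketch; they are not claimed to be the filters of [13], and the reconstruction branch is omitted.

```python
import numpy as np

def dilate_filter(taps, level):
    """Insert 2**level - 1 zeros between taps (the 'holes' of the a trous scheme)."""
    step = 2 ** level
    dilated = np.zeros((len(taps) - 1) * step + 1)
    dilated[::step] = taps
    return dilated

def circular_filter(signal, taps):
    """Filter with periodic boundaries; the output has the same length as the input."""
    centre = len(taps) // 2
    out = np.zeros(len(signal))
    for k, tap in enumerate(taps):
        if tap != 0.0:
            # np.roll(signal, s)[n] == signal[n - s]
            out += tap * np.roll(signal, k - centre)
    return out

def dyadic_decompose(signal, levels,
                     lowpass=(0.25, 0.5, 0.25),   # illustrative smoothing filter (assumption)
                     highpass=(-1.0, 1.0)):       # illustrative difference filter (assumption)
    """Return ([W_1, ..., W_levels], final approximation), all of the input length."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for level in range(levels):
        details.append(circular_filter(approx, dilate_filter(highpass, level)))
        approx = circular_filter(approx, dilate_filter(lowpass, level))
    return details, approx

# Example: a step edge gives detail coefficients of the same sign at adjacent scales.
step_signal = np.zeros(64)
step_signal[32:] = 1.0
details, approx = dyadic_decompose(step_signal, levels=3)
```

Because no subsampling is performed, every subband keeps the length of the input, which is what allows coefficients at adjacent scales to be compared sample by sample.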
have opposite Lipschitz regularity. Thus, the multiplication of the DWT coefficients at adjacent scales can lead to the enhancement of edge structures while weakening noise. In this paper, the multiscale products are defined as
$P_j f(x) = \prod_{i=k_1}^{k_2} W_{j+i} f(x)$, (8)
where $k_1$ and $k_2$ are non-negative integers.
It is sufficient to implement the multiplication at two adjacent scales in practice [17], so if we let $k_1 = 0$ and $k_2 = 1$, then the multiscale products are
$P_j f(x) = W_j f(x) \cdot W_{j+1} f(x)$, (9)
where $j$ is the scale.

3. Threshold estimation and denoising algorithm

3.1. Threshold estimation

In the wavelet-based threshold denoising scheme, an appropriate threshold should be estimated to distinguish the signal from noise. However, finding a good threshold is not an easy task. The commonly used thresholds such as the universal threshold, the Sure threshold and the SureShrink threshold are not well suited to our denoising scheme. On the one hand, the signal of DNA CE is very complicated due to sophisticated DNA sequencing and the DNA separation process. Consequently, peaks in the CE signal are very dense, and there are many discontinuities. These threshold schemes fail to accurately estimate the noise level and establish a threshold. On the other hand, these thresholds were originally designed for single-scale wavelet coefficients and not for the products of the adjacent scales. To achieve the purpose of noise removal, we propose in this paper a new threshold to suppress the noise of the CE signal on the basis of the studies of Bao and Zhang [12, 17].
To establish the multiscale products threshold, a noise level denoted by $\sigma_j$ for each subband is first estimated. According to the preceding discussion, we know that the estimate $\sigma = \mathrm{Median}(|d(x)|)/0.6745$, $d(x) \in H_1$, involved in the universal threshold is inaccurate for the CE signal. To compute $\sigma_j$, the model of the signal is assumed as
$f = g + \varepsilon$, (10)
where $f$, $g$ and $\varepsilon$ are the observed CE signal, the noise-free CE signal and the noise with Gaussian distribution, respectively. Since the noise $\varepsilon$ is independent of the noise-free CE signal $g$, it can be derived that $\sigma_f^2 = \sigma_g^2 + \sigma^2$. We divide the finest scale coefficients $W$ of the dyadic WT into two parts: the first part $W_a$ consists of points $|W(\cdot)| > \sigma_f$, and the second part $W_b$ consists of points $|W(\cdot)| \le \sigma_f$. Let $\sigma_{W_a} = \sqrt{E[W_a^2]}$ and $\sigma_{W_b} = \sqrt{E[W_b^2]}$. Generally, the noise energy is concentrated on $W_b$ and $\sigma_{W_b}$ can be considered as an approximation of the noise standard deviation $\sigma$; then the standard deviation of the noise-free CE signal is estimated as [17]
$\hat{\sigma}_g = \sqrt{\sigma_{W_a}^2 - 2.9\,\sigma_{W_b}^2}$. (11)
Obviously, $\hat{\sigma}_g$ will be equal to zero if $W$ is generated totally by noise. Conversely, $\hat{\sigma}_g$ will be greater if the signal contains more details in $W_a$. In this case, $\hat{\sigma}_g$ can be seen as an approximate estimate of $\sigma_g$. Finally, the noise standard deviation can be estimated as
$\hat{\sigma}_j = \|\psi_j\|\,\sigma_f \big/ \sqrt{1 + (\hat{\sigma}_g/\sigma_{W_b})^2}$, (12)
where $\|\psi_j\| = \sqrt{\int \psi_j^2(x)\,dx}$.
Since the dyadic WT is a linear transform, after its operation, formula (10) becomes $W_j f = W_j g + W_j \varepsilon$. Let $P_j g = W_j g \cdot W_{j+1} g$ and $P_j \varepsilon = W_j \varepsilon \cdot W_{j+1} \varepsilon$ denote the multiscale products of the noise-free signal and the noise at scale $j$, respectively; then the means of $P_j \varepsilon$ and $P_j g$ are [17]
$\mu_{j\varepsilon} = \rho_{j+1,j}\,\sigma_j \sigma_{j+1}$ (13)
$\mu_{jg} = E[P_j f] - E[P_j \varepsilon] = \mu_{jf} - \mu_{j\varepsilon}$, (14)
where
$\rho_{j+1,j} = \int \psi_j(x)\,\psi_{j+1}(x)\,dx \Big/ \sqrt{\int \psi_j^2(x)\,dx \cdot \int \psi_{j+1}^2(x)\,dx}$
is the correlation coefficient of $W_j \varepsilon$ and $W_{j+1} \varepsilon$.
We set the multiscale product threshold as
$t_{jp} = c \cdot \sigma_j \sigma_{j+1}\,(1 + \mu_{j\varepsilon}/\mu_{jg})$, (15)
where $c$ is a positive constant chosen by trial runs. In our experiments, $c \approx 15$ yields the best results. The ratio $\mu_{j\varepsilon}/\mu_{jg}$ in (15) can be used to adjust the threshold $t_{jp}$ applied to the multiscale products $P_j f$. For example, when the noise is much stronger than the signal at fine scales, the value of the ratio $\mu_{j\varepsilon}/\mu_{jg}$ is higher too. Thus, the threshold $t_{jp}$ will be large enough to suppress the noise. Conversely, when the noise is weaker, the threshold $t_{jp}$ will be at an appropriate level to remove the corresponding noise while preserving signal structures.
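The threshold of equations (13) to (15) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the subband noise levels and the correlation coefficient are taken as inputs rather than estimated, E[P_j f] is replaced by the sample mean of the products, the handling of a non-positive μ_jg is an assumption, and the hard keep-or-zero rule applied to the coefficients is an assumed reading of "thresholding the multiscale products", since the exact rule is not shown in this section.

```python
import numpy as np

def multiscale_product(w_j, w_next):
    """Equation (9): pointwise product of the coefficients at scales j and j+1."""
    return np.asarray(w_j, dtype=float) * np.asarray(w_next, dtype=float)

def product_threshold(products, sigma_j, sigma_next, rho, c=15.0):
    """Equations (13)-(15); c = 15 is the value reported in the text."""
    mu_eps = rho * sigma_j * sigma_next              # (13) mean of the noise products
    mu_g = float(np.mean(products)) - mu_eps         # (14) with E[P_j f] as a sample mean
    if mu_g <= 0.0:
        # Assumption of this sketch: no signal contribution is detected,
        # so the threshold is made infinite and everything is suppressed.
        return float("inf")
    return c * sigma_j * sigma_next * (1.0 + mu_eps / mu_g)   # (15)

def threshold_by_products(w_j, products, threshold):
    """Assumed hard rule: keep W_j where P_j >= threshold, set it to zero elsewhere."""
    w_j = np.asarray(w_j, dtype=float)
    return np.where(np.asarray(products) >= threshold, w_j, 0.0)

# Toy example: one strong feature present at both scales, small values elsewhere.
w_j = np.array([0.1, -0.2, 3.0, 0.05])
w_next = np.array([-0.1, 0.1, 2.5, 0.02])
products = multiscale_product(w_j, w_next)
t = product_threshold(products, sigma_j=0.2, sigma_next=0.15, rho=0.5)
kept = threshold_by_products(w_j, products, t)
```

In the toy example only the third coefficient survives: its product is large because the feature persists across both scales, whereas the small coefficients have products far below the threshold.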
[Figure 2: five panels (a) to (e); x-axis: Time (0 to 2000); y-axis: Amplitude (0 to 25).]
Figure 2. Denoising of the simulated CE signal (SNR = 15). (a) Pure CE signal. (b) Noisy CE signal. (c) Denoised CE signal using SG.
(d) Denoised CE signal using SWT-Bayes. (e) Denoised CE signal using the proposed method.
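The evaluation measures used for the simulated signal can be sketched as follows. Formulas (18) and (19) are not reproduced in this section, so the RMSE and SNR below use their common textbook definitions, which is an assumption; the peak height ratio η is computed as the ratio of the maxima of the denoised and pure signals inside a window around the peak, where the window arguments and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def rmse(pure, denoised):
    """Root mean square error between the pure and the denoised signal."""
    pure = np.asarray(pure, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    return float(np.sqrt(np.mean((pure - denoised) ** 2)))

def snr_db(pure, denoised):
    """SNR in dB: signal energy over residual energy (assumed definition)."""
    pure = np.asarray(pure, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    return float(10.0 * np.log10(np.sum(pure ** 2) / np.sum((pure - denoised) ** 2)))

def peak_height_ratio(pure, denoised, start, stop):
    """Eta: denoised peak height over pure peak height inside [start, stop)."""
    pure = np.asarray(pure, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    return float(np.max(denoised[start:stop]) / np.max(pure[start:stop]))

# Toy example: a single peak that is slightly flattened by denoising.
pure = np.array([0.0, 1.0, 4.0, 1.0, 0.0])
denoised = np.array([0.0, 1.2, 3.6, 0.8, 0.0])
error = rmse(pure, denoised)
snr = snr_db(pure, denoised)
eta = peak_height_ratio(pure, denoised, 1, 4)   # 3.6 / 4.0, i.e. the peak lost 10% of its height
```

With these definitions a smaller RMSE always corresponds to a larger SNR for a given pure signal, and a value of η below 1 indicates that the peak has been attenuated.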
The relation between formulas (18) and (19) shows that the smaller the value of RMSE, the larger the value of SNR, and the better the denoising performance.
In addition to the above evaluation measures, we also used a parameter, the ratio of the reconstructed signal peak height to the pure signal peak height (η), to evaluate the performance of peak preservation. It is defined as [18]
$\eta = \hat{I}/I$, (20)
where $\hat{I}$ and $I$ are the peak heights of the denoised signal and the pure signal, respectively. η is a simple measure of how well an algorithm preserves peaks. For an ideal filter, the value of η should be as close to 1 as possible. Otherwise, the signal will be distorted after denoising. In our experiment, we chose two representative peaks marked with numbers to compute the values of η.
The values of RMSE, SNR and η obtained by applying all the methods to the noisy signal are listed in table 1. It can be seen from table 1 that the proposed method provides
[Figure 3: seven panels (a) to (g); y-axis: Amplitude, 2000 to 10000 for panels (a) to (d) and −400 to 400 for panels (e) to (g).]
Figure 3. Denoising of the real CE signal at the temperature of 45 °C. (a) Real CE signal. (b) Denoised CE signal using SG. (c) Denoised
CE signal using SWT-Bayes. (d) Denoised CE signal using the proposed method. (e) The difference between the real signal and denoised
signal using SG. (f) The difference between the real signal and denoised signal using SWT-Bayes. (g) The difference between the real
signal and denoised signal using the proposed method.
[Figure 4: seven panels (a) to (g); x-axis: Time (0 to 2000); y-axis: Amplitude, 2000 to 5000 for panels (a) to (d) and −400 to 400 for panels (e) to (g).]
Figure 4. Denoising of the real CE signal at the temperature of 60 °C. (a) Real CE signal. (b) Denoised CE signal using SG. (c) Denoised
CE signal using SWT-Bayes. (d) Denoised CE signal using the proposed method. (e) The difference between the real signal and denoised
signal using SG. (f) The difference between the real signal and denoised signal using SWT-Bayes. (g) The difference between the real
signal and denoised signal using the proposed method.
Table 1. The values of RMSE, SNR, η1 and η2 obtained by different denoising methods applied to the pure CE signal with different noise levels.

Noise level   Method      RMSE   SNR     η1      η2
SNR = 5       SG          0.61   22.69   1.014   0.957
              SWT-Bayes   0.47   24.98   1.013   0.974
              Proposed    0.46   25.14   1.002   0.989
SNR = 10      SG          0.57   23.28   1.034   0.975
              SWT-Bayes   0.41   25.85   1.031   1.000
              Proposed    0.32   26.96   1.002   1.000
SNR = 15      SG          0.56   23.36   1.025   0.970
              SWT-Bayes   0.29   30.39   1.009   1.003
              Proposed    0.23   31.06   0.992   1.001
SNR = 20      SG          0.57   23.33   1.023   0.963
              SWT-Bayes   0.21   31.80   0.985   1.003
              Proposed    0.22   31.76   0.989   1.000
SNR = 30      SG          0.51   23.37   1.022   0.964
              SWT-Bayes   0.09   33.93   1.004   1.003
              Proposed    0.12   33.16   0.997   0.999

larger values of the SNR and smaller values of the RMSE in comparison to other methods in the case of strong noise corruption. However, for weak noise, the SWT-Bayes filter is slightly superior to our proposed method. This is not surprising; in the case of strong noise, peaks and noise contribute detail coefficients of comparable magnitude, and the threshold estimated by SWT-Bayes tends to be too small. Thus, a portion of noise has not been removed in the denoised signal after SWT-Bayes filtering. Conversely, for weak noise, the data-driven estimation accurately reflects the characteristics of the underlying signal, and thus achieves a slightly better performance than the proposed algorithm. The values of η reported in table 1 also show that those obtained by our proposed method are closer to 1 than those obtained by other methods at all noise levels. This means that the peaks are well preserved in the process of noise removal. This is a benefit of the multiscale products distinguishing the underlying signal from noise well. A similar denoising effect using wavelet thresholding and SG could not be achieved without dramatic damage to the peaks. However, it should be noted that the value of η1, belonging to the isolated peak (peak 1), is quite different from the value of η2, belonging to the overlapping peak (peak 2), in the case of the SG method. This is because of the different contributions of the filter coefficients to the estimated points. The values of η1 and η2 obtained by SWT-Bayes are larger than 1 in most cases. This can be attributed to two aspects: one is that the small threshold results in a portion of noise still remaining in the denoised signal; the other is that the spatial dependence of wavelet coefficients is not incorporated in the denoising operation, so some undesired artifacts are generated in the neighborhood of discontinuities.
For a visual comparison, a noisy signal with SNR = 15 and the corresponding denoised results are shown in figures 2(b) and (e), respectively. From figure 2(d), it can be seen that a portion of noise still remains in the denoised signal, and this corresponds to the values of η. Compared with SWT-Bayes and SG, the denoised signal obtained by the proposed method demonstrates that noise on the CE signal is removed without distorting the peak shapes, and that the characteristics of the pure CE signal are recovered perfectly. This is consistent with the values of the RMSE, SNR and η measures.

5.2. Real CE examples

Two real CE signals generated from separating DNA samples are shown in figures 3(a) and 4(a), respectively. To verify the validity of the proposed method for real CE signals, the wavelet, the moving window size and other settings are the same as in the preceding simulation experiment. Since noise-free signals are not available, the evaluation of the denoised results mainly depends on visual comparison.
The results obtained by the proposed method are illustrated in figures 3(d) and 4(d), respectively. From the figures, it can be seen that very good noise removal is achieved for the two signals and that the shapes of peaks and characteristics of the signal are preserved after denoising. For the sake of comparison, the denoised results obtained by SG and SWT-Bayes are also shown in figures 3(b) and (c) and 4(b) and (c). It can be observed clearly that the other two methods reduce noise less effectively, and a portion of noise still remains in the reconstructed signals after filtering by the SWT-Bayes and SG methods. The shapes of peaks are slightly distorted when the signals are processed by the SWT-Bayes method, especially for the overlapping peaks. To further evaluate the performance of the proposed scheme, the error signals, which are the differences between the reconstructed signal and the original signal, are illustrated in figures 3(e)–(g) and 4(e)–(g), respectively. It can be seen that the error signal obtained by the proposed method is more homogeneous than the others, which indicates that only the noise is removed. It should be noted that the error signal illustrated in figure 3(f) clearly shows insufficient noise removal.

6. Conclusion

In this paper, a new method using multiscale products based on the dyadic wavelet transform (WT) has been proposed to remove the noise of DNA CE signals. The performance of the algorithm has been evaluated by performing experiments on both synthetic data and real DNA CE signals. According to the obtained results, we found that our method is superior to some of the existing methods in terms of RMSE, SNR and η, especially in the case of strong noise corruption. The algorithm is more computationally demanding than the SG and SWT-Bayes methods. This is mainly because of the multiplication of subbands and the estimation of the threshold. Moreover, only the interscale dependence is considered in this work; in fact, dependence also exists within each subband (intrascale). A more sophisticated approach is to consider the intrascale dependence as well as the interscale dependence in the denoising filter, and the related work is being undertaken.