Wavelet Transforms
The history of the WT can be traced back to Fourier theory. Fourier decomposition
expands a signal as an integral of sinusoidal oscillations over a range of frequencies
(Ng 2003). A major limitation of Fourier theory is in the loss of temporal
information in the Fourier transform. Wavelets were first introduced in 1987, as the
foundation of a powerful new approach to signal processing, called multiresolution
theory (Mallat 1999). Multiresolution theory incorporates and unifies techniques
from a variety of disciplines, including subband coding from signal processing,
quadrature mirror filtering from speech recognition, and pyramidal image pro-
cessing (Gonzalez and Woods 2002). Formally, a multiresolution analysis (MRA)
allows the representation of signals with their WT coefficients. The theory under-
lying MRA allows a systematic method for constructing (bi)orthogonal wavelets
(Daubechies 1988) and leads to the fast DWT, also known as Mallat’s pyramid
algorithm (Qian 2002; Mallat 1989). In practice, the DWT has been applied to many
different problems (Meyer 1990; Strang and Nguyen 1996; Daubechies 1992).
This chapter aims to review WTs with a link to Mallat’s pyramid algorithm and
from the multiresolution point of view. Wavelet denoising will be considered later
in this chapter. In Chaps. 7 and 10–12, wavelets will be used in the THz context for
feature extraction, image segmentation, and reconstruction.
Wavelet transforms are popular techniques suited to the analysis of very brief
signals, especially signals with sudden and unpredictable changes that often carry
the most interesting information (Qian 2002; Hubbard 1998). The WT technique is
particularly well suited to problems such as signal compression, feature extraction,
image enhancement, and noise removal, which arise frequently in biomedical
applications. It complements the traditional Fourier-based techniques in THz
signal analysis.
(2) Completeness: a function in the whole space comprises the parts in each
subspace.
(a) Upward completeness of the subspaces:

closure(∪_{j∈ℤ} V_j) = L²(ℝ), (6.2)
(5) Basis-frame property: multiresolution schemes require a basis for each space
V_j. That is, there exists a scaling function φ(t) ∈ V₀ whose integer translates
form a basis of V₀.
This means that for the same scale j, the scaling functions are orthonormal in
time shifts. It is also an important feature of every scaling function φ(t) that there
exists a set of coefficients {h(k)} which satisfies the two-scale equation
φ(t) = √2 ∑_k h(k) φ(2t − k). (6.9)
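As a quick numerical sanity check, the two-scale equation can be verified for the Haar case, where φ(t) is the indicator of [0, 1) and h = {1/√2, 1/√2}. This is only an illustrative sketch with the simplest orthonormal filter, not a filter prescribed by this chapter:

```python
import math

def phi(t):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

h = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # Haar scaling filter

def phi_two_scale(t):
    """Right-hand side of the two-scale equation (6.9)."""
    return math.sqrt(2) * sum(h[k] * phi(2 * t - k) for k in range(len(h)))

# The relation phi(t) = sqrt(2) * sum_k h(k) phi(2t - k) holds pointwise:
grid = [i / 100 for i in range(-50, 200)]
max_err = max(abs(phi(t) - phi_two_scale(t)) for t in grid)
```

The relation holds exactly here because the two half-width boxes φ(2t) and φ(2t − 1) tile the interval [0, 1).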
Taking the Fourier transform of (6.9) gives

Φ(ω) = (1/√2) H(ω/2) Φ(ω/2); (6.10)

equivalently,

H(ω) = √2 Φ(2ω)/Φ(ω). (6.11)
According to orthonormal MRA, the scaling function φ(t) in the frequency domain
can also be computed by iterating equation (6.10):

Φ(ω) = ∏_{j=1}^{∞} H(ω/2^j)/√2, (6.12)
where ⟨f, φ⟩ indicates the inner product between two integrable functions,
⟨f, φ⟩ = ∫_{−∞}^{+∞} f(t) φ*(t) dt, and * indicates the complex conjugate.
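For the Haar filter, the truncated infinite product (6.12) can be checked against the known closed-form spectrum of the box function, |Φ(ω)| = |sin(ω/2)/(ω/2)|. The sketch below uses the Haar filter purely as an example:

```python
import cmath
import math

def H(w):
    """Frequency response of the Haar scaling filter h = {1/sqrt(2), 1/sqrt(2)}."""
    return (1 + cmath.exp(-1j * w)) / math.sqrt(2)

def Phi(w, terms=40):
    """Truncated version of the infinite product (6.12)."""
    out = complex(1.0)
    for j in range(1, terms + 1):
        out *= H(w / 2 ** j) / math.sqrt(2)
    return out

w = 3.0
expected = abs(math.sin(w / 2) / (w / 2))  # box-function spectrum magnitude
err = abs(abs(Phi(w)) - expected)
```

The factors converge to 1 geometrically, so 40 terms already reproduce the limit to machine precision.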
Similarly, the detail operator Q_j on a function f(t) ∈ L²(ℝ) is defined
accordingly, and

lim_{j→∞} ||P_j f(t)||₂ = 0, (6.16)

where ||f||₂ = ⟨f, f⟩^{1/2} = [∫_{−∞}^{+∞} |f(t)|² dt]^{1/2}.
Given a scaling function that meets the MRA requirements, a wavelet function
ψ can be built from translations of φ(2t):

ψ(t) = √2 ∑_k g(k) φ(2t − k). (6.17)
Equations (6.9) and (6.17) show that both the wavelet function and the scaling
function are weighted sums of scaling functions at the next finer scale. Let {V_j} be
a multiresolution analysis with scaling function φ(t) and scaling filter h(k); the
wavelet filter g(k) can then be defined as

g(k) = (−1)^k h*(1 − k), (6.19)

where h*(k) indicates the complex conjugate of h(k), required only in the
orthonormal case.
The Fourier transform of (6.19) relates G(ω) to H(ω). The multiresolution
subspaces satisfy V_{j−1} = V_j ⊕ W_j, where ⊕ denotes the direct sum of vector
spaces; the vector spaces relate the scaling and wavelet function subspaces. Since
the orthogonal complement of V_j in V_{j−1} is W_j, all members of V_j are
orthogonal to the members of W_j. Thus,
where ⟨φ, ψ⟩ indicates the inner product between two integrable functions. The set
{ψ_{j,k}(t)} of wavelets spans W_j, and each wavelet subspace is shift invariant:

f(t) ∈ W_j ⇒ f(t − k) ∈ W_j ∀k ∈ ℤ. (6.26)
· · · ⊂ Vn+1 ⊂ Vn ⊂ · · · ⊂ V1 ⊂ V0 ⊂ · · · ⊂ L2 (R)
· · · ⊂ Ṽn+1 ⊂ Ṽn ⊂ · · · ⊂ Ṽ1 ⊂ Ṽ0 ⊂ · · · ⊂ L2 (R). (6.27)
and

ψ(t) = √2 ∑_k g(k) φ(2t − k)

ψ̃(t) = √2 ∑_k g̃(k) φ̃(2t − k). (6.30)
The scaling and wavelet functions are interrelated in the biorthogonal case:

⟨φ̃(t − k), φ(t − l)⟩ = δ_{k−l}

⟨ψ̃(t − k), ψ(t − l)⟩ = δ_{k−l} (6.32)

⟨φ̃(t − k), ψ(t − l)⟩ = 0

⟨ψ̃(t − k), φ(t − l)⟩ = 0. (6.33)
A signal f(t) can then be expanded as

f(t) = ∑_j ∑_k ⟨f, ψ̃_{j,k}⟩ ψ_{j,k}(t) = ∑_j ∑_k ⟨f, ψ_{j,k}⟩ ψ̃_{j,k}(t). (6.34)
The equations in the frequency domain regarding biorthogonal wavelets are omitted
for brevity.
To sum up, linear-phase filters are one of the most important properties made
possible by biorthogonality, which is vital in applications such as image
compression, because phase distortions result in highly undesirable visual artifacts.
Perfect reconstruction is another important property of orthogonal and biorthogonal
wavelets, which will be discussed in Sect. 6.2.3. The filter relations for the
orthonormal and biorthogonal wavelets derived in this section form the basis for
the development of the DWT.
The WTs are based on wavelets, which are localized waves of varying frequency
and limited duration. In subband coding, a transformation is computed by filtering
and subsampling. The signal is separated approximately into frequency bands for
efficient decoding.
The DWT decomposes a signal by projecting the coarse information into
scaling subspaces and projecting the detailed information into wavelet subspaces,
as in

f(t) = ∑_k c_{J0}(k) φ_{J0,k}(t) + ∑_k ∑_{j=0}^{J0} d_j(k) ψ_{j,k}(t), (6.35)
where J0 is an arbitrary ending scale. The cJ0 (k)’s are called the approximation
or scaling coefficients, and the d j (k)’s are referred to as the detail or wavelet
coefficients. This is because (6.35) is composed of two parts: the first uses scaling
functions to provide an approximation of f (t) at scale J0 ; for each higher scale
j ≤ J0 , the second part is a finer resolution function, a sum of wavelets, which is
added to the approximation to provide increased detail. If the expansion functions
form an orthogonal basis, the expansion coefficients of the time series function can
be calculated as
c_{J0}(k) = ⟨f(t), φ_{J0,k}(t)⟩ = ∫ f(t) φ*_{J0,k}(t) dt (6.36)

and

d_j(k) = ⟨f(t), ψ_{j,k}(t)⟩ = ∫ f(t) ψ*_{j,k}(t) dt, (6.37)
where the φ(t)'s are scaling functions and the ψ(t)'s are wavelet functions; f(t),
φ_{J0,k}(t), and ψ_{j,k}(t) are all functions of the discrete variable
t = 0, 1, 2, . . . , M − 1; j is the scale index in the range {0, J0}; k is the time-shift
factor; and M is usually selected as a power of 2 (i.e., M = 2^{J0}). The transform
is composed of M coefficients, the starting scale is 0, and the ending scale is J0.
For the biorthogonal case, care should be taken that the expansion functions φ and ψ
be replaced in these equations by their dual functions φ̃ and ψ̃.
Given an orthogonal wavelet basis, the signal f(t) can be represented by scaling
series coefficients {c_j(k)} and wavelet series coefficients {d_j(k)}, where J0 is
the desired WT ending scale. Mallat's algorithm (the FWT) states that the
coefficients {c_j(k)}_{j=0}^{J0} and {d_j(k)}_{j=0}^{J0} can be calculated from
coefficients at the next finer scale, by cascaded filtering and subsampling with a
pair of filters. The pair of filters consists of a low-pass filter H(ω) and a high-pass
filter G(ω), which form a pair of mirror filters.
The low-pass and high-pass filters H(ω ) and G(ω ) are the frequency responses
of predefined filter coefficients {h(k)} and {g(k)}, which characterize the scaling
function and wavelet functions, respectively (Guo et al. 2001). The coarse and
detailed coefficients at scale j − 1 are calculated by applying the data at scale j
and wavelet filtered banks as follows:
c_{j−1}(k) = ∑_m ⟨φ_{j,m}, φ_{j−1,k}⟩ c_j(m) = ∑_m h(m − 2k) c_j(m)

d_{j−1}(k) = ∑_m ⟨φ_{j,m}, ψ_{j−1,k}⟩ c_j(m) = ∑_m g(m − 2k) c_j(m).

The corresponding one-step reconstruction is

c_j(k) = ∑_m h̃(k − 2m) c_{j−1}(m) + ∑_m g̃(k − 2m) d_{j−1}(m), (6.40)

where h̃ and g̃ indicate the reconstruction low- and high-pass filters, respectively. For
orthogonal wavelets, the reconstruction filters are simply time-reversed versions of
the decomposition filters, that is, h(k) = h̃(−k) and g(k) = g̃(−k). In other words, h̃
and g̃ are often delayed versions of h(−k) and g(−k) to keep them causal. The net
effect is a finite latency through the system (analysis plus synthesis).
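The one-step analysis/synthesis cascade can be sketched directly from these filter relations. The sketch below assumes the Haar filters, g(k) = (−1)^k h(1 − k), and circular (periodic) convolution, so reconstruction is exact with zero latency; none of these choices are mandated by the text:

```python
import math

h = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # Haar low-pass (an illustrative choice)
g = [h[1], -h[0]]                          # high-pass via g(k) = (-1)^k h(1 - k)

def analyze(c):
    """One decomposition step: subsampled circular correlations with h and g."""
    N = len(c)
    approx = [sum(h[m] * c[(m + 2 * k) % N] for m in range(len(h)))
              for k in range(N // 2)]
    detail = [sum(g[m] * c[(m + 2 * k) % N] for m in range(len(g)))
              for k in range(N // 2)]
    return approx, detail

def synthesize(approx, detail):
    """One reconstruction step: upsample by 2, filter, and sum the two branches."""
    N = 2 * len(approx)
    c = [0.0] * N
    for k in range(len(approx)):
        for m in range(len(h)):
            c[(m + 2 * k) % N] += h[m] * approx[k] + g[m] * detail[k]
    return c

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a, d = analyze(x)
xr = synthesize(a, d)
err = max(abs(u - v) for u, v in zip(x, xr))  # perfect reconstruction check
```

Iterating `analyze` on the approximation branch gives the full cascade of Fig. 6.2.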
The one-step reconstruction represented in (6.40) can be iteratively applied to
multiple levels of wavelet reconstruction, until the original signal is recovered.
Each reconstruction stage adds the detail subbands d j (k) back to the coarse ap-
proximation subband c j (k) to form the finer approximation c j+1 (k). Each synthesis
(reconstruction) stage needs interpolation of each subband by 2, followed by
filtering with the synthesis filters, h̃(k) and g̃(k), then summing up the two sets
of resultant coefficients. This synthesis processing is illustrated in Fig. 6.1b. These
operations are repeated over all levels to reconstruct the original signal c0 (k). The
entire analysis process in a J-level FWT is illustrated in Fig. 6.2. The number of
computations is halved with an increase in the level of the transform, as a direct
consequence of the reduced data from downsampling. This reduction in complexity
with levels leads to the high efficiency of the algorithm: the FWT of a length
M = 2^{J0} sequence has an overall computational complexity of O(M). Since
the analysis and synthesis stages employ two filters (low- and high-pass), the FWT
system is commonly called a two-channel filter bank in the engineering literature.
As for the practical application, the sampled T-ray transients are a function of
discrete time, to which the DWT can be applied. To realize the DWT, simple digital
filter banks can be employed.
A filter bank is a set of filters that mainly link to sampling operations. In a two-
channel filter bank, the analysis filters are low-pass H(z) and high-pass G(z), where
H(z) and G(z) in the frequency domain are z-transforms of h(k) and g(k) in the time
domain. The synthesis bank for signal reconstruction performs the inverse procedure
to analysis, which means low-pass H̃(z) and high-pass G̃(z) filtering (z-transforms
of h̃(k) and g̃(k)), and interpolating. In order to recover the information discarded
by downsampling in the analysis bank, the synthesis filter bank must be specifically
adapted to the analysis filters for perfect reconstruction; this means that the filter
bank is biorthogonal. An orthonormal filter bank, in which the synthesis bank is the
transpose of the analysis bank, is a special case of the more general biorthogonal
filter bank.
There are two conditions that need to be imposed to achieve perfect reconstruction.
The first condition removes aliasing in the reconstructed signal:

H̃(z)H(−z) + G̃(z)G(−z) = 0, (6.41)

where G̃(z) is chosen as a function of H(z), and G(z) as a function of H̃(z). The
second condition removes amplitude distortion:

H̃(z)H(z) + G̃(z)G(z) = 2. (6.42)

Note that for (6.42), if we define S(z) = H(z)H̃(z), then this equation becomes
S(z) + S(−z) = 2. This implies that s(k) is a half-band filter, with zeros for all its
even coefficients excluding s(0) = 1. The odd coefficients are then free design
variables. In practice, the wavelet filters are usually derived by first designing a
particular half-band filter s(k). The analysis and synthesis low-pass filters are then
obtained by applying spectral factorization to s(k).
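The half-band property of the product filter s(k) is easy to verify numerically. The sketch below uses the standard 4-tap Daubechies orthonormal filter (chosen only as a familiar example) and checks that the autocorrelation s(k) = ∑_n h(n)h(n + k) vanishes at nonzero even lags while the odd lags remain free:

```python
import math

s3 = math.sqrt(3)
# The standard 4-tap Daubechies orthonormal low-pass filter (an example choice)
h = [(1 + s3) / (4 * math.sqrt(2)), (3 + s3) / (4 * math.sqrt(2)),
     (3 - s3) / (4 * math.sqrt(2)), (1 - s3) / (4 * math.sqrt(2))]

def autocorr(h, k):
    """s(k) = sum_n h(n) h(n + k): the half-band product filter in the time domain."""
    return sum(h[n] * h[n + k] for n in range(len(h)) if 0 <= n + k < len(h))

s0 = autocorr(h, 0)  # centre tap: should equal 1
s2 = autocorr(h, 2)  # nonzero even lag: should vanish
s1 = autocorr(h, 1)  # odd lag: a free design parameter (nonzero here)
```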
The orthonormality condition for perfect reconstruction filter banks is defined in
the time domain as

⟨g_i(n), g_j(n + 2m)⟩ = δ(i − j) δ(m), i, j ∈ {0, 1}. (6.43)
It is useful to note that the biorthogonality condition holds for all two-band, real-
coefficient perfect reconstruction filter banks.
It can be observed from the previous analysis that, despite their central roles in the
development of wavelet theory, the scaling and wavelet functions themselves are
not explicitly applied by the FWT algorithm for time series analysis. Instead, it is
only necessary to construct the analysis and reconstruction filters
{h(k), g(k), h̃(k), g̃(k)} in order to compute the WT coefficients.
A drawback of the FWT is its shift variance, which is caused by the downsampling
operations: an odd-sample shift of the input signal changes the resulting
coefficients, so the signal spectrum can be unevenly distributed over the low-band
and high-band segments. In many practical situations, this shift variance is an
undesirable aspect of the FWT. Researchers have spent significant effort in
overcoming this problem, leading to a variety of different approaches (Ng 2003).
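The shift variance is easy to demonstrate: delaying the input by one (odd) sample can radically change the subband content. A minimal sketch with circular Haar filtering (an illustrative choice):

```python
import math

h = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # Haar low-pass (illustrative)
g = [h[1], -h[0]]                          # matching high-pass

def detail_energy(x):
    """Energy in the one-level detail subband of a circular Haar DWT."""
    N = len(x)
    d = [sum(g[m] * x[(m + 2 * k) % N] for m in range(2)) for k in range(N // 2)]
    return sum(v * v for v in d)

x = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
x_shift = x[-1:] + x[:-1]    # the same signal delayed by one (odd) sample
e0 = detail_energy(x)        # step edges align with the downsampling grid
e1 = detail_energy(x_shift)  # after an odd shift they no longer do
```

For this signal the detail energy jumps from zero to the full edge energy purely because of the one-sample delay.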
The 2D DWT of an image f(x, y) can be calculated by taking the 1D WT along the
rows of f(x, y) and then along the resulting columns (Gonzalez and Woods 2002).
The 2D scaling function and 2D wavelet functions are separable products of the
one-dimensional scaling and wavelet functions φ(t) and ψ(t). There exist sequences
c_{j−1}(k, l), d^H_{j−1}(k, l), d^V_{j−1}(k, l), and d^D_{j−1}(k, l) given by the
following equations:
c_{j−1}(k, l) = ⟨f(x, y), φ_j(k, l)⟩

d^H_{j−1}(k, l) = ⟨f(x, y), ψ^H_j(k, l)⟩ (6.49)

d^V_{j−1}(k, l) = ⟨f(x, y), ψ^V_j(k, l)⟩

d^D_{j−1}(k, l) = ⟨f(x, y), ψ^D_j(k, l)⟩.
The coefficients d^H_{j−1}, d^V_{j−1}, and d^D_{j−1} correspond to horizontal,
vertical, and diagonal high-frequency information, respectively, while c_{j−1}
corresponds to coefficients representing low-frequency information (Mallat 1999).
The 2D WT algorithm can be derived from (6.49). Simply, the 2D DWT can be
calculated by cascaded 1D DWTs along the rows and columns of the image (Mallat
1999). This process can be repeated a number of times to yield successively lower
resolution images. Mathematically, the process of taking one level of the 2D WT is:
c_{j−1}(k, l) = ∑_{m,n} h(2k − m) h(2l − n) c_j(m, n)

d^H_{j−1}(k, l) = ∑_{m,n} h(2k − m) g(2l − n) c_j(m, n) (6.50)

d^V_{j−1}(k, l) = ∑_{m,n} g(2k − m) h(2l − n) c_j(m, n)

d^D_{j−1}(k, l) = ∑_{m,n} g(2k − m) g(2l − n) c_j(m, n).
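Equation (6.50) can be sketched as two passes of 1D filtering-and-downsampling, rows first and then columns. The sketch below assumes Haar filters and circular indexing (choices made only for brevity; subband naming conventions for H/V also vary in the literature). On a constant image, every detail subband vanishes:

```python
import math

h = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # Haar low-pass (illustrative)
g = [h[1], -h[0]]                          # Haar high-pass

def filt_down(row, f):
    """1D circular filtering with filter f, followed by dyadic downsampling."""
    N = len(row)
    return [sum(f[m] * row[(m + 2 * k) % N] for m in range(len(f)))
            for k in range(N // 2)]

def dwt2(img):
    """One level of the separable 2D DWT (6.50): rows first, then columns."""
    cols = lambda M: list(map(list, zip(*M)))   # transpose helper
    lo = [filt_down(r, h) for r in img]          # row low-pass
    hi = [filt_down(r, g) for r in img]          # row high-pass
    LL = cols([filt_down(c, h) for c in cols(lo)])
    LH = cols([filt_down(c, g) for c in cols(lo)])
    HL = cols([filt_down(c, h) for c in cols(hi)])
    HH = cols([filt_down(c, g) for c in cols(hi)])
    return LL, LH, HL, HH

img = [[5.0] * 4 for _ in range(4)]              # constant 4x4 test image
LL, LH, HL, HH = dwt2(img)                       # LL is uniformly 2*5 = 10
```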
Let h and g denote a pair of linear-phase low- and high-pass wavelet filters, and let
h̃ and g̃ denote the corresponding reconstruction filters. The discrete approximation
at resolution 2^j can be obtained by combining the details and approximation at
resolution 2^{j−1} using the reconstruction wavelet filters:
Fig. 6.4 Illustration of the 2D discrete wavelet transform procedure. The 2D DWT can be realized
via digital filters and downsampling of the T-ray image. A 2D scaling function, φ(x, y), and three 2D
wavelets, ψ^H(x, y), ψ^V(x, y), and ψ^D(x, y), are calculated by taking the 1D fast wavelet transform
of the rows of f(x, y) and then of the resulting columns. The discrete approximation at resolution 2^j
can be obtained by combining the details and approximation at resolution 2^{j−1} using the
reconstruction wavelet filters. Here, h and g and h̃ and g̃ indicate the low-pass and high-pass wavelet
filters and reconstruction filters, respectively. The down and up arrows indicate the downsampling and
upsampling procedures
Figure 6.4 shows the block diagram of the 2D WT procedure. The block diagram
represents one step of the 2D discrete wavelet transform (2D DWT) and can be
iterated to recover the approximation at each resolution step.
Discrete wavelet packet transforms (DWPTs) extend the link between multiresolution
approximations and discrete wavelets (Mallat 1999). The detail subspace W_j is
spanned by the discrete wavelet basis ψ_j(2^j t − k). The time and frequency
spreads (windows) of this basis are proportional to 2^{−j} and 2^j, respectively.
As the scale j increases, the width of the time–frequency window of the wavelet
basis decreases while its height increases; that is, the wavelet basis contracts in its
time localization but expands in its frequency spread. In order to overcome this
problem, we need
to decompose W_j further. DWPTs apply the DWT step to both the lower resolution
and detail subspaces. This procedure is fulfilled by dividing the orthogonal basis
{φ_{j−1}(2^j t − k)}_{k∈ℤ} into two new orthogonal bases, {φ_j(2^j t − k)}_{k∈ℤ}
of V_j and {ψ_j(2^j t − k)}_{k∈ℤ} of W_j, where

φ(t) = √2 ∑_{k∈ℤ} h(k) φ(2t − k) (6.52)

ψ(t) = √2 ∑_{k∈ℤ} g(k) φ(2t − k). (6.53)
For the analysis of the wavelet packet function, new notations μ₀(t) and μ₁(t) are
used instead of φ(t) and ψ(t):

μ₀(t) = √2 ∑_{k∈ℤ} h(k) μ₀(2t − k) (6.54)

μ₁(t) = √2 ∑_{k∈ℤ} g(k) μ₀(2t − k). (6.55)

For a fixed decomposition scale, the wavelet packet functions can be defined by
μ₀(t), μ₁(t), h, and g:

μ_{2n}(t) = √2 ∑_k h(k) μ_n(2t − k) (6.56)

μ_{2n+1}(t) = √2 ∑_k g(k) μ_n(2t − k), (6.57)
where

m₀(ω) = (1/√2) ∑_k h(k) e^{−ikω}

m₁(ω) = (1/√2) ∑_k g(k) e^{−ikω}, (6.60)

and

⟨μ_n(t − j), μ_n(t − k)⟩ = δ_{j,k}, j, k ∈ ℤ. (6.61)
Define

U_j^0 = V_j, U_j^1 = W_j, j ∈ ℤ, (6.63)

so that the splitting V_{j−1} = V_j ⊕ W_j corresponds to

U_{j−1}^0 = U_j^0 ⊕ U_j^1, j ∈ ℤ. (6.65)

For all n ∈ ℤ⁺, we obtain

U_{j−1}^n = U_j^{2n} ⊕ U_j^{2n+1}, j ∈ ℤ. (6.66)
The root node at j = 0 can be viewed as the approximation of the sampled signal
f(t), which can be represented by the discrete wavelet packet coefficients, denoted
d_0^0(k). Similarly, the discrete wavelet packet coefficients for projections onto
U_j^{2n} and U_j^{2n+1} can be denoted d_j^{2n} and d_j^{2n+1}, respectively.
Based on the discrete wavelet packet definition (6.66), we can obtain the discrete
wavelet packet decomposition and reconstruction algorithms. At decomposition:

d_{j−1}^{2n}(k) = ∑_{l∈ℤ} h(l − 2k) d_j^n(l)

d_{j−1}^{2n+1}(k) = ∑_{l∈ℤ} g(l − 2k) d_j^n(l).
Here, h̃ and g̃ are the reconstruction low- and high-pass filters, respectively.
The discrete wavelet packet coefficients d_{j−1}^{2n} and d_{j−1}^{2n+1} are the
subsampled convolutions of d_j^n with h and g, and at reconstruction d_j^n(k)
satisfies

d_j^n(k) = ∑_{l∈ℤ} h̃(k − 2l) d_{j−1}^{2n}(l) + ∑_{l∈ℤ} g̃(k − 2l) d_{j−1}^{2n+1}(l).
Here, j is the node depth in the wavelet packet binary tree, and n denotes the index
of the node at that depth. Two scales of the wavelet packet tree in Fig. 6.6 yield
almost twice the number of decompositions and reconstructions available from the
two-scale DWT.
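The full wavelet packet tree simply applies the same split to every node. A minimal sketch with circular Haar filters (an illustrative assumption) shows that a depth-2 tree produces 2² = 4 subbands and, for orthonormal filters, preserves the signal energy:

```python
import math

h = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # Haar low-pass (illustrative)
g = [h[1], -h[0]]                          # matching high-pass

def split(d):
    """One wavelet packet split: subsampled circular convolutions with h and g."""
    N = len(d)
    lo = [sum(h[m] * d[(m + 2 * k) % N] for m in range(2)) for k in range(N // 2)]
    hi = [sum(g[m] * d[(m + 2 * k) % N] for m in range(2)) for k in range(N // 2)]
    return lo, hi

def wpt(d, depth):
    """Full wavelet packet tree: split every node down to the given depth."""
    nodes = [d]
    for _ in range(depth):
        nodes = [part for node in nodes for part in split(node)]
    return nodes

x = [1.0, 3.0, -2.0, 4.0, 0.0, 1.0, 2.0, -1.0]
leaves = wpt(x, 2)                          # 2^2 = 4 subbands of length 2 each
energy_in = sum(v * v for v in x)
energy_out = sum(v * v for node in leaves for v in node)
```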
Different wavelet packet bases show different capabilities to achieve
time–frequency localization and represent different properties of signals. It is
important to select a best wavelet packet basis from the dictionary of bases. A few
questions arise: how to select the best wavelet packet basis? What are the criteria for
evaluating the adaptability of a wavelet packet basis? How to make a quick search
within the dictionary for the best basis?
Wavelet packet bases are large families of orthogonal bases that include different
types of time–frequency atoms. The best wavelet packet basis decomposes signals
over atoms that are adapted to the signal's time–frequency features. To evaluate the
best basis, we first expand the signal f(t) in an orthonormal wavelet packet basis,
which yields the wavelet packet coefficient series E = {E_k} corresponding to f(t).
We define an information cost function W on the series, which should satisfy two
conditions:
(1) Additivity:

W({E_k}) = ∑_{k∈ℤ} W(E_k), W(0) = 0. (6.71)
Fig. 6.6 The procedure of the discrete wavelet packet transform. Illustration of the two-level
discrete wavelet packet decomposition. Here, h̃ and g̃ are the reconstruction low- and high-pass
filters, respectively. The discrete wavelet packet coefficients d_{j−1}^{2n} and d_{j−1}^{2n+1} are
the subsampled convolutions of d_j^n with h and g, the low-pass and high-pass wavelet filters,
respectively
(2) The value of the information cost function W should reflect the concentration
degree of wavelet packet coefficients. It can be explained as follows:
a. If the energy of the coefficient series {E_k} is concentrated in a few
coefficients, and the absolute values of the large number of remaining
coefficients are small enough to be ignored, the corresponding basis is
considered better, and the value of the information cost function should be
smaller.
b. If the energy of the coefficient series {E_k} is distributed uniformly, the
corresponding basis is not desirable, and the value of the information cost
function W should be larger. Generally, the wavelet packet basis B of L²(ℝ)
that effectively approximates a signal f(t) should be found by minimizing an
information cost function.
(2) The concentration degree of the ℓ^p norm of the wavelet packet coefficients:
for 0 < p < 2, we define W(E_k) = |E_k|^p; therefore,
W(E) = ∑_{k∈ℤ} |E_k|^p = ||E||_p^p. Usually, the ℓ^p norm is defined as
||x||_p = (∑_i |x_i|^p)^{1/p}.
(3) Logarithmic entropy: for E = {E_k}, we define W(E) = ∑_{k∈ℤ} log |E_k|²,
with log 0 := 0.
(4) Information entropy: for E = {E_k}, we define
W(E) = −∑_{k∈ℤ} |E_k|² log |E_k|², with log 0 := 0.
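The cost functions above can be sketched directly. The example below implements the ℓ^p concentration and the information entropy (with the convention log 0 := 0) and confirms that a concentrated coefficient series scores lower than a uniform one of equal energy:

```python
import math

def cost_lp(E, p=1.0):
    """l^p concentration cost, 0 < p < 2: W(E) = sum_k |E_k|^p."""
    return sum(abs(e) ** p for e in E)

def cost_entropy(E):
    """Information entropy: W(E) = -sum_k |E_k|^2 log|E_k|^2, with log 0 := 0."""
    return -sum(e * e * math.log(e * e) for e in E if e != 0.0)

concentrated = [2.0, 0.0, 0.0, 0.0]  # all energy in one coefficient
uniform = [1.0, 1.0, 1.0, 1.0]       # same total energy, spread out
c_conc = cost_entropy(concentrated)
c_unif = cost_entropy(uniform)       # concentrated basis should cost less
```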
A basis pursuit is computationally expensive because it minimizes a global cost
function over all dictionary vectors. Here, we introduce a dynamic programming
approach. Further discussion of basis pursuit and the related matching pursuit can
be found in Mallat (1999).
The dynamic programming calculation considers a discrete wavelet packet
decomposition in the space V₀ = U_0^0. This decomposition can be presented using
a binary tree, as shown in Fig. 6.6. The wavelet packet expansion of f(t) produces
wavelet packet coefficients, which are used to calculate the value of the information
cost function at each node of the tree. Finally, we trace the best basis node by node
from the bottom to the top of the tree. The operation is as follows:
(1) We fix the maximum depth d allowed for the tree and initialize the search with
a complete tree of that depth.
(2) We initialize the costs at level d, the deepest level of the tree. If a node at this
level is kept in the tree, it will necessarily be a leaf node, so each bottom node
is first labeled with the notation "∗."
(3) All other nodes are candidates to be leaf nodes. For each node (i, j), leaf or
non-leaf, we compare the value of the cost function Wf at the parent node with
the sum Wb of the cost functions at its two child nodes.
(4) If Wf ≤ Wb, we label the parent node with "∗," to emphasize that the cost
value at the parent node is unchanged, and view it as a leaf node. If Wf > Wb,
we bracket the value Wf and put the value Wb beside the bracket; in this case,
the corresponding nodes are viewed as non-leaf nodes.
(5) We consider only the value outside the bracket and repeat the comparison and
labeling steps until the top layer of the tree is reached.
(6) The final step is to select the labeled nodes that are closest to the root.
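The bottom-up comparison of parent and child costs can be sketched as a small dynamic program over a complete binary tree of node costs. The tree values below are hypothetical, chosen only to exercise both branches of the comparison:

```python
def best_basis(tree):
    """
    Bottom-up best-basis search on a complete binary cost tree.
    tree: list of levels, tree[0] = [root cost], tree[-1] = leaf costs.
    Returns (best, keep): best[level][i] is the cheapest cost achievable at the
    node, and keep[level][i] is True when the node itself beats its children
    (the analogue of labeling it with "*").
    """
    depth = len(tree) - 1
    best = [list(level) for level in tree]
    keep = [[True] * len(level) for level in tree]
    for level in range(depth - 1, -1, -1):
        for i in range(len(tree[level])):
            child_sum = best[level + 1][2 * i] + best[level + 1][2 * i + 1]
            if tree[level][i] <= child_sum:
                keep[level][i] = True        # parent wins: prune its subtree
            else:
                keep[level][i] = False       # children win: pass their cost up
                best[level][i] = child_sum
    return best, keep

# Hypothetical costs: root 40; mid-level 10 and 22; leaves 4, 16, 14, 9
tree = [[40.0], [10.0, 22.0], [4.0, 16.0, 14.0, 9.0]]
best, keep = best_basis(tree)
```

The best basis then consists of the topmost kept nodes: here the two children of the root, whose combined cost 32 beats the root cost 40.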
The procedure is performed from top to bottom of the whole tree (any subnodes,
including the nodes with and without an "∗," are not considered again). These
selected nodes cover the space V₀ = U_0^0 without overlapping, and all of them are
viewed as the nodes corresponding to the best basis. Figure 6.7 illustrates the
searching procedure for the best wavelet packet bases.

Fig. 6.7 A quick search for the best basis. The nodes with pink shading indicate the best wavelet
packet bases. The numbers at the bottom row (the maximum depth) are assumed to be given,
indicated by the digits 1–8. We compare the value of the cost function Wf at the parent node with
the sum Wb of the cost functions at its two child nodes. If Wf ≤ Wb, we label the parent node with
"∗" and view it as a leaf node. If Wf > Wb, we bracket the value Wf and put the value Wb beside
the bracket; in this case, the corresponding nodes are viewed as non-leaf. We consider only the
value outside the bracket and repeat these steps until the top layer of the tree is reached
The work of Strang and Nguyen (1996) further complements the above,
elaborating more on subband coding; Sherlock and Monro (1998) discuss how to
apply FIR filters of arbitrary length to describe the space of orthonormal wavelets,
further parameterizing the wavelet coefficients at each decomposition level; and
Tuqan and Vaidyanathan (2000) propose a state-space approach to the design of
globally optimal FIR energy compaction filters.
Since our ultimate goal is feature extraction and classification, there is no
requirement to adopt an algorithm with a perfect reconstruction property; our
constraints are therefore more relaxed than those used in filtering or signal
compression applications.
Divine and Godtliebsen (2007) suggest that for feature exploration purposes, it
is possible to assume stationarity over some time interval and smooth the wavelet
spectrum along the time axis using an AR model. Paiva and Galvão (2006) also
discuss a wavelet packet decomposition tree algorithm that establishes frequency
bands where subband models are created. Both approaches propose the modeling of
the approximation and detail wavelet coefficients in order to further extract statisti-
cally significant features, and a similar approach is adopted in our work. A typical
denoising procedure consists of decomposing the original signal using DWPT or
DWT (Mallat 1999; Daubechies 1992; Jensen and la Cour-Harbo 2001; Percival
and Walden 2000), thresholding the detail coefficients, and reconstructing the signal
by applying the appropriate inverse transform (IDWT or IDWPT, respectively). For
the denoising of femtosecond THz transients, a three-level decomposition is usually
sufficient (Hadjiloucas et al. 2004), and unnecessary computational load associated
with more decomposition levels can be avoided. Stein's unbiased risk estimate
(SURE) and the "heuristic" SURE methods (Donoho 1995) are used separately to
estimate the soft threshold parameter λ_s for classification experiments.
In our T-ray project, each time-domain measurement corresponding to data from
a single pixel is represented by a data vector x of length Lx , where the nth element
of x, denoted by xn , represents the measured signal at the nth sampling instant. The
filter bank transform can be regarded as a change of variables from ℝ^{Lx} to
ℝ^{Lx} performed according to the following operation:

w_m = ∑_{n=0}^{Lx−1} x_n v_m(n), m = 0, 1, . . . , Lx − 1, (6.73)
with coefficients at larger scales (e.g., d_J, d_{J−1}, d_{J−2}, . . .) associated with
broad features in the data vector, and coefficients at smaller scales (e.g., d₁, d₂, d₃, . . .)
associated with narrower features such as sharp peaks. Let h[0], h[1], . . ., h[2N − 1]
and g[0], g[1], . . ., g[2N − 1] be the impulse responses of the low-pass and high-
pass filters H and G, respectively. Assuming that filtering is carried out by circular
convolution, the procedure for generating the approximation coefficients from the
data vector x consists of flipping the filtering sequence and moving it alongside the
data vector. For each position of the filtering sequence regarding the data vector, the
scalar product of the two is calculated. Dyadic downsampling is then performed to
generate coefficients c[i]. The detail coefficients d[i] are obtained in a similar manner
by using the high-pass filtering sequence. Filtering in the wavelet domain consists
of changing some of the above elements of w by applying soft thresholding, so that
a new vector w_f is produced, and then applying the inverse transform. A soft
threshold operation with threshold λ_s is employed, following the standard
soft-threshold rule

η_{λ_s}(x) = sign(x) · max(|x| − λ_s, 0).
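The soft-threshold rule can be sketched in a few lines: small coefficients are zeroed and large ones are shrunk toward zero by λ_s (the sample values below are arbitrary):

```python
def soft_threshold(w, lam):
    """Soft thresholding: zero |v| <= lam, shrink the rest toward zero by lam."""
    return [max(abs(v) - lam, 0.0) * (1.0 if v > 0 else -1.0) for v in w]

d = [0.3, -2.5, 0.1, 4.0, -0.2]
d_f = soft_threshold(d, 0.5)   # small coefficients vanish, large ones shrink
```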
By applying Stein’s results (Donoho 1995; Johnstone and Donoho 1995) at the
jth decomposition level, we have
SURE_j(λ_j^s; d_j) = n − 2 · #{i : |d_j[i]| < λ_j^s} + ∑_{i=1}^{n} min(|d_j[i]|, λ_j^s)², (6.78)

where μ̂_j denotes the soft threshold estimator μ̂_j[i] = η_{λ_j^s}(d_j[i]). Thus the