
Introduction to Wavelets
Preface
About the Book
This book represents an attempt to present the most basic concepts of wavelet analysis, in addition to the most relevant applications. Compared to the half dozen or so introductory books on the subject, this book is designed, with its very detailed treatment, to serve undergraduates and newcomers to the subject, assuming a general calculus preparation and the matrix computations that come with it. The essential subjects needed for wavelet analysis, namely vector spaces and Fourier series and integrals, are presented in a simple way to suit such a background, or the background of those who have been away from such subjects for a while. It is a challenging task, and this book is an attempt at meeting that challenge.
The author has consulted, primarily, the basic introductory books of Nievergelt [1], Boggess and Narcowich [2], Walker [3], Mix and Olejniczak [4], Walter [5], Aboufadel and Schlicker [6], Hubbard [7], Burrus et al. [8], and, of course, the standard reference of Daubechies [9]. This is besides other books, papers, and tutorials on the subject, with due thanks and much appreciation for their demonstrating what is important for an introductory book. The present book goes beyond their treatments by adding all the details necessary for an undergraduate in science and engineering, or others new to the subject, to make the text applicable to using wavelets for their varied purposes. It is hoped that this attempt meets the needs of students, as well as their instructors, for a very readable book on wavelets, even with basically only a calculus background.
Special Features of this Wavelet Book
Compared to all books on wavelets, this book comes with detailed or partial solutions to almost all the exercises, which are found in the Students Solution Manual (ISBN: 0967330-0-8) that accompanies the book.
Audience and Suggestions for Course Adaption
The book may be used for a one-semester course, using a selection of sections from Chapters 1-6 and covering Chapters 9-12. The whole book makes a good one-year course.
Abdul J. Jerri
Potsdam, New York
April 2011
1
Introduction
This text, written on the relatively new subject of wavelets, is an attempt to introduce wavelets to undergraduate students, newcomers to the field, and other interested readers. The goal is to do this with basic calculus tools. This is not an easy task, since the topic of wavelets requires a higher level of mathematical background, which is not covered in basic calculus courses. It is clear from almost all books written on this subject to date, even those claiming to be aimed at an undergraduate level, that a minimum of such additional requirements would be the basic elements of linear algebra and Fourier analysis. In addition, for the practical implementation of wavelet decomposition and signal reconstruction, a rudimentary exposure to signal analysis, namely filtering, is vital for completing the subject of wavelets as a practical, fast, and efficient algorithm. The latter subject of filtering signals, fortunately, can be covered within the applications of the Fourier transform, the subject of Chapter 6.
Doing justice to the necessary topics would move this book to the level of the "introductory" texts that preceded it, which is not within our aim of the simplest book for the undergraduate student in engineering and science, as well as the interested practitioner. So, remaining true to our purpose, we decided to "formally" introduce the essentials of the above necessary requirements in the simplest and most illustrative way. This approach, we hope, will make the book accessible to the first-time reader of the new topic of wavelets.
In doing so, we will begin with the relevant basic ideas found in calculus courses and build on them to introduce the above two basic requirements of Fourier analysis and linear algebra. This may mean that our presentation will be necessarily elementary and formal. However, we will attempt to support it by quoting the most basic mathematical results and theorems in a very clear manner. As such, we will not be able to present the proofs of all such results.
Since wavelet analysis may be considered "superior" for representing signals or functions, we shall review the familiar ways of expressing functions in terms of much simpler ones in Chapter 2, e.g., f(x) = e^x in terms of the basic simple monomials {1, x, x^2, ...}. Another example is expressing the two-dimensional force F in terms of the simpler unit vectors i and j, F = i F_x + j F_y. Speaking of vectors, we will recall the basic elements of matrices as needed, since one way of representing a vector is by a column (or row) matrix.
Another most basic representation of signals in engineering and applied mathematics is that of periodic functions in terms of the very familiar harmonic functions sin nx and cos nx on (−π, π). This constitutes the Fourier series branch of Fourier analysis, which we will visit briefly in Chapter 2 and in more detail in Chapter 5. Readers familiar with these topics may just as well skip them or scan them quickly.
Wavelets may be seen as small waves ψ(t), which oscillate at least a few times, but unlike the harmonic waves must die out to zero as t → ±∞. The most applicable wavelets are those that die out to identically zero after a few oscillations on a finite interval [a, b), i.e., ψ(t) = 0 outside the interval [a, b).

Such a special interval [a, b) is called the "support" or "compact support" of the given (basic) wavelet ψ(t). We say a basic wavelet since it will be equipped with two parameters, namely "scale" a and "translation" b, to result in a "family" of wavelets ψ((t − b)/a). An example of a basic wavelet is the Daubechies 2 wavelet ψ(t) shown in Figure 1.1.
Fig. 1.1: The Daubechies 2 wavelet ψ(t) and its associated scaling function φ(t), −∞ < t < ∞. (Note that here ψ(t) is shifted to the right by 1, as ψ(t − 1), to have its compact support coincide with that of its associated scaling function.) This advanced-looking result will be reached in simple, detached steps in Example 3.8 of Chapter 3.
3
The construction of basic wavelets is established in terms of their associated "building blocks" or "scaling functions" φ(t). The latter is governed by an equation called the "recurrence relation" or "scaling equation." We should mention here that there is something relatively new about this fundamental equation: it comes with unknown coefficients, whose determination constitutes a significant part of this book. Some very basic theorems will lead us in this direction with the aid of the Fourier transform.

The Daubechies 2 scaling function φ(t) is shown in Figure 1.1, where we note that both φ(t) and ψ(t) in this case vanish identically to zero beyond their respective finite intervals (or supports). In Figure 1.2 we show the oldest known basic wavelet, the Haar wavelet, which dates to 1910:
    ψ(t) =   1,   0 ≤ t < 1/2
            −1,   1/2 ≤ t < 1        (1.1)
             0,   otherwise

and its associated scaling function,

    φ(t) =   1,   0 ≤ t < 1          (1.2)
             0,   otherwise

with their (common) compact support [0, 1).
Fig. 1.2: The Haar basic wavelet ψ(t) and its associated scaling function φ(t), −∞ < t < ∞.
In Section 3.3 of Chapter 3 we will present a few other wavelets and their associated scaling functions. The emphasis will be on the successive approximation iterative method of solving the recurrence or scaling equation to find the scaling function. Once that has been accomplished, constructing the associated basic wavelet amounts to a rather simple computation.
This brings us to the most relevant question in wavelet analysis: what is the source of the scaling equation? Here we have a relatively new subject compared to our usual source for constructing the different familiar functions, namely differential equations. The new subject is that of Multiresolution Analysis (MRA), which depends mainly on the new parameter of the scale a that is introduced to the basic wavelet as ψ(t/a); this will be the subject of Chapter 8. Its definition depends on the topic of vector spaces and sub (vector) spaces. This is a relatively new subject to readers with a calculus background. However, we will attempt in Chapter 4 to introduce it gradually, building on the familiarity of the usual vectors in two and three dimensions. In fact, such vectors, with their binary operations of addition and multiplication by a scalar, constitute one example of a vector space. The same two operations on vectors in n dimensions are used for the definition of the n-dimensional vector space. We will then move to real-valued functions, with their very familiar binary operations of addition and multiplication by a scalar, to show that they also constitute a vector space. In the same chapter we will discuss the basis for such vector spaces and introduce the inner product among their members, to reach the new subject of projecting a member of (the general) vector space onto another. Finally, we will introduce the very important subject of "orthogonality" among members of the (desired) vector space, or members of different vector spaces.
After preparing for the vector spaces, the subject of Multiresolution Analysis will be covered in Chapter 8, following the coverage of the Fourier series and transforms in Chapters 5 and 6, respectively. In Chapter 7 we will briefly present the windowed (or short-term) Fourier transform (WSFT) and the continuous wavelet transform (CWT). Wavelet analysis is the beneficiary of Fourier analysis, in the sense that it answers the latter's serious shortcoming, namely the fact that Fourier analysis represents a signal as either a function of time or a function of frequency, but not both. There are also other new advantages of wavelet analysis to be discussed throughout the book. Thus, discussing or showing some wavelets is incomplete without comparing them with the harmonic sine and cosine waves of Fourier series. This is the reason behind having some preliminaries of the latter subject in Section 2.2 of Chapter 2, as we intend at this very early stage to compare Fourier and wavelet analysis and highlight the advantages of the latter.
As we shall see in Section 2.2, the Fourier series decomposes a given signal, e.g., f(t), −∞ < t < ∞, in terms of the harmonic functions {1, cos nt, sin nt}, n = 1, 2, ..., on (−π, π), with their (time-independent) frequency n in radians/sec (or period 2π/n seconds). In contrast to the wavelets, which die out after a few oscillations on their compact supports, the above harmonic functions continue to repeat their shape (periodic) on the whole real line with (constant) period 2π/n. In Figure 1.3 we illustrate such periodicity for f(t) = sin t and g(t) = cos t with frequency n = 1 (period 2π), and in Figure 1.4 we illustrate f(t) = sin 2t and g(t) = cos 2t with their higher frequency n = 2 (period 2π/n = 2π/2 = π).
2.1 TAYLOR SERIES REPRESENTATION OF FUNCTIONS
signal if we take f(t) = i(t) as the current flowing in a wire of unit resistance. The power in the Δt time interval is |f(t)|² Δt, with the total power (for the whole time) as ∫_{−∞}^{∞} |f(t)|² dt. Such a class of signals with finite power are termed square integrable functions. The mathematical symbol for such a set of functions is L²(−∞, ∞), where the 2 in L² refers to the square of the function in the above integral. The L refers to a more general aspect of integration, i.e., integration in the sense of Lebesgue, as we will elaborate briefly at the end of Section 2.2.3. In our treatment we shall confine ourselves to the integration used in calculus courses, which is the Riemann type of integration. Hence, we take the limit of V_n as n → ∞ to reach f(t) ∈ L²(−∞, ∞).
Combining the result of the nested subsets of the dyadic 1/2^n scale in (2.14) with

    lim_{n→−∞} V_n = {0},    lim_{n→∞} V_n = L²(−∞, ∞),

we have the nested subsets concept that we will need for Multiresolution Analysis,

    {0} ⊂ ··· ⊂ V_{−n} ⊂ ··· ⊂ V_{−2} ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ V_2 ⊂ ··· ⊂ V_n ⊂ ··· ⊂ L²(−∞, ∞).    (2.15)

Here, we try to introduce the concept of nested subsets that are characterized by their decreasing scale 1/2^n. For Multiresolution Analysis we will need nested "sub (vector) spaces," of which the above nested subsets are an example. We will present and define vector spaces, subspaces, and their bases in Chapter 4.
Example 2.1 Discretizing with a Dyadic Scale
Consider the function f(t) = sin (π/8)t, t ∈ (0, 8), and its discrete values f(t_k) = f(k/2^j) with their spacing Δt = (k+1)/2^j − k/2^j = 1/2^j. Let us take the case of j = 0, where Δt = 1, as shown in Figure 2.3.
We note how crude the approximation of sin (π/8)t is with the eight sample points and their spacing of Δt = 1/2^0 = 1. The scaling analysis of wavelets spaces the samples at t_{j,k} = k/2^j, so the incremental scale used is Δt_{j,k} = (k+1)/2^j − k/2^j = 1/2^j. This, we note, will decrease very fast as we increase j, as shown in Figures 2.4 and 2.5 for Δt = 1/2^1 = 1/2 and Δt = 1/2^2 = 1/4, respectively. It is in contrast to the manner in which we usually sample in basic calculus with n points on the interval (a, b), where the constant increment (scale) is Δt = (b − a)/n, which decreases as 1/n instead of the (fast) dyadic 1/2^n of wavelets.

Let us denote the set of signals discretized with scale Δ_n = 1/2^n as V_n. Thus, what we have in Figure 2.3 is an example of a signal discretized with a scale of 1/2^0 = 1; such signals are elements of V_0. As seen, it is a crude or coarse approximation of the sin (π/8)t signal. If we decrease the scale to 1/2^1 = 1/2, we have a more refined approximation of the signal, and all signals done that way with scale 1/2^1 constitute the set V_1. This better approximation (with half the scale) is shown for f(t) = sin (π/8)t in Figure 2.4. The next, more refined scale is 1/2^2 = 1/4, shown in Figure 2.5, a better approximation than those of Figures 2.3 and 2.4 (with their respective scales of 1 and 1/2). The result is the set V_2 of the discretized signals. It is clear that the details in V_0 are found in V_1, and in turn the details of the latter are found in V_2. This enables us to write the three sets with their nested subsets relation as V_0 ⊂ V_1 ⊂ V_2. As we did above, we can still decrease the scale to 1/2^n for n > 2, or increase it as 1/2^{−n} = 2^n, n > 0, with their corresponding sets V_n and V_{−n}, where, following the above observation for V_0, V_1, and V_2, we can write

    ··· ⊂ V_{−n} ⊂ ··· ⊂ V_{−2} ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ V_2 ⊂ ··· ⊂ V_n ⊂ ···.
Fig. 2.3: f(t) = sin (π/8)t and its dyadic discretization with a scale of Δt = 1/2^0 = 1.
Fig. 2.4: f(t) = sin (π/8)t and its dyadic discretization with a scale of Δt = 1/2^1 = 1/2.
Fig. 2.5: f(t) = sin (π/8)t and its discretization with a scale of Δt = 1/2^2 = 1/4.
Exercises 2.1
Note: These exercises are also meant as a review of basic function approximation by Taylor series, plus the important concept of nested subspaces.
1. Solve the differential equation

    d²y/dx² + λ²y = 0,   0 ≤ x ≤ π,    (E.1)

with its two boundary conditions,

    y(0) = 0,    (E.2)
    y(π) = 0.

Hint: y_1(x) = cos λx and y_2(x) = sin λx are the two linearly independent solutions of (E.1).
2. (a) Write the Taylor series expansion of f(t) = sin t, 0 ≤ t ≤ 2π, about t_0 = 0.
(b) Consider the partial sum S_2(t) of the Taylor series in part (a) for approximating f(t) = sin t. Follow what was done in equation (2.6) to show the added refinement to the S_2(t) approximation due to involving the next seven terms in S_9(t). Graph your results for the comparison. (Note that both S_2(t) and S_9(t) are still not good approximations of f(t) = sin t, while S_9(t) − S_2(t) is only an added refinement to S_2(t) that does not improve the approximation by much, which means that we need a very large N in S_N(t), such as N = 40.)

3. Consider the function f(t) = cos t, 0 < t < 2π. Follow Example 2.1 (and Figure 2.3) to show the approximation of this function with the dyadic discretization
3
The Scaling Recurrence Relation
3.1 INTRODUCTION
In the second chapter (Section 2.3.1) we mentioned that in wavelet analysis, usually, a single scaling function series is used to yield an "approximated" or "blurred" version of the given signal. Another (double) series of the associated wavelets is added to the former to bring about a refinement. The result is a satisfactory approximation (or representation) of the signal.
The question now is how these scaling functions are found. Once they are, it is a simple computation to construct their associated basic wavelets. The scaling functions or "building blocks" are of paramount importance in our study of wavelet analysis in this book. They are the solutions of the following scaling recurrence relation (or simply the scaling equation) in the scaling function φ(x):

    φ(x) = Σ_k p_k φ(2x − k)    (3.1)

or

    φ(x) = Σ_k h_k √2 φ(2x − k).    (3.2)

The second version of the scaling equation in (3.2), with coefficients h_k √2 replacing p_k in (3.1), is used to emphasize the normalization of √2 φ(2x − k) such that ∫ |√2 φ(2x − k)|² dx = 1, in order to have the normalization at the different energies of the signal, which will be explained later. When the computations become long, we may choose the equivalent form in (3.1) with {p_k} as the scaling coefficients, to avoid carrying the coefficient √2 of (3.2). As the name implies, we note in (3.2) that the scaling function φ(x) is scaled as φ(2x) and translated by k/2, for integer k, as the final φ(2x − k) = φ(2(x − k/2)) inside the series (3.2).
We have mentioned that the above scaling equation comes with coefficients h_k (or p_k) to be determined at the outset. This turns out not to be a simple matter! Indeed, we will depend on Multiresolution Analysis (MRA) in Chapter 8 to derive a number of properties for these coefficients. Then, with guidance from some basic theorems and the Fourier transform, we will be able to determine these coefficients for a number of very applicable scaling functions. Fortunately, the coefficients needed for constructing the associated basic wavelets are related in a very simple way to the above scaling coefficients h_k of (3.2).

In our discussion and illustration of a number of well-known scaling functions and their associated basic wavelets, we will assume, for now, that we know the scaling coefficients.

With the involvement of scaling and translation in the scaling equation (3.2), we begin this chapter with a simple review of scaling and translating functions. This is followed by the iterative method of successive approximations for solving the scaling equation, assuming knowledge of the coefficients. We end the chapter by presenting a number of well-known basic wavelets.
3.2 SCALING AND TRANSLATION: HALLMARKS OF WAVELET ANALYSIS
Here we will review and illustrate the concept of translating a function, followed by scaling its domain, and finally the two operations together.

Translation
To translate a function f(x) to the right of the origin by x_0 = a, we choose a new coordinate x′ measured from a new y-axis erected at x_0 = a, as shown in Figure 3.1(a), with x′ = x − a. So, in the new coordinates (x′, y), the y = f(x) translated to the right is y = f(x′) = f(x − a), a > 0, as shown in Figure 3.1(b). The same is done for the translation f(x + a) of f(x) to the left of the origin by x_0 = −a, as y = f(x − (−a)) = f(x + a). This can also be seen by erecting a new y-axis at x_0 = −a, with new coordinate x′ = x − (−a) = x + a, where we have y = f(x′) = f(x + a), as shown in Figure 3.1(c).
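The two operations just described can be sketched as small Python helpers acting on the Haar scaling function of (1.2) (an illustrative sketch; the helper names are ours):

```python
def phi(x):
    """The Haar scaling function (1.2): 1 on [0, 1), 0 otherwise."""
    return 1.0 if 0 <= x < 1 else 0.0

def translate(f, a):
    """Translate f to the right by a: x -> f(x - a)."""
    return lambda x: f(x - a)

def dilate(f):
    """Compress the domain of f by 2: x -> f(2x)."""
    return lambda x: f(2 * x)

shifted = translate(phi, 1)                 # phi(x - 1): support moves to [1, 2)
scaled_shifted = dilate(translate(phi, 1))  # phi(2x - 1) = phi(2(x - 1/2)): support [1/2, 1)

assert shifted(1.5) == 1.0 and shifted(0.5) == 0.0
assert scaled_shifted(0.75) == 1.0 and scaled_shifted(0.25) == 0.0
```

Note the order of composition: dilating the already-translated function produces phi(2x − 1), i.e., a translation by k/2 = 1/2 after scaling, exactly as in the terms φ(2x − k) = φ(2(x − k/2)) of the scaling equation.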
3.5 ITERATIVE METHODS FOR SOLVING THE SCALING EQUATION
We will present here two different iterative methods for solving the scaling equation (3.2). The first, in the next section, is the successive approximations iterative method, which is often seen in solving integral equations, for example. The second (in Section 3.5.2) starts with initial values of the scaling function at the integers on the right side of (3.2) to generate values of φ(x) on the left side at the half integers. So, this method starts with initial values, and it parallels what we do in solving differential equations when given the initial values of the solution.
3.5.1 The Iterative Method of Successive Approximations
It may be time to give some idea about the usual (approximate) method of solving the above scaling equation (3.2),

    φ(x) = Σ_k h_k √2 φ(2x − k).

It is an iterative method of successive approximations. This approach starts by assigning a zeroth approximation φ_0(x) for φ(x) inside the sum of (3.2) as an input, and looks at the result of the sum as a first approximation φ_1(x) as an output,

    φ_1(x) = Σ_k h_k √2 φ_0(2x − k).    (3.9)

The φ_1(x) is then used again as an input inside the sum of (3.9) to result in φ_2(x) as an output,

    φ_2(x) = Σ_k h_k √2 φ_1(2x − k).    (3.10)

This iterative process continues with φ_n(x) as input and φ_{n+1}(x) as output,¹

    φ_{n+1}(x) = Σ_k h_k √2 φ_n(2x − k),    (3.11)

until in practical situations the difference between successive iterations becomes negligible, that is, |φ_{n+1}(x) − φ_n(x)| ≈ 0. In Chapter 10 we will present Theorem 10.4 in Section 10.3, with conditions that guarantee the convergence of the presented iterative method of successive approximations. In Example 3.8(a) we will use this iterative method to generate a Daubechies scaling function (as shown in Figure 3.14).

In the following example we will illustrate this iterative method for the Haar scaling function with its scaling coefficients h_0 = h_1 = 1/√2.

¹ This notation for the approximation φ_n(x) of the scaling function is not to be confused with the well-established notation φ_n(x) of the B-splines, of which we will present φ_3(x) as a special case in Equation (3.22).
Example 3.1 The Iterative Method of Successive Approximations: Computing the Haar Scaling Function
In this case, the scaling equation (3.2) (with k = 0, 1) becomes

    φ(x) = φ(2x) + φ(2x − 1),    (E.1)

where h_0 = h_1 = 1/√2. Another constraint is given by the normalization, ∫_{−∞}^{∞} φ(x) dx = 1.
Fig. 3.5: The iterative method for the Haar scaling function, φ_0(x) to φ_7(x).
We begin the iterative method with the zeroth approximation, shown in Figure 3.5(a),

    φ_0(x) = 1/3 for 0 ≤ x ≤ 3, and 0 otherwise,    (E.2)

which satisfies ∫_0^3 φ_0(x) dx = 1. From (3.11), using again h_0 = h_1 = 1/√2, the first approximation is

    φ_1(x) = φ_0(2x) + φ_0(2(x − 1/2)),    (E.3)

which is shown in Figure 3.5(b). Then we use this φ_1(x) as the input in (3.11) to obtain the second approximation φ_2(x),

    φ_2(x) = φ_1(2x) + φ_1(2(x − 1/2)),    (E.4)

as shown in Figure 3.5(c). This process is repeated, where the graph of φ_{n+1}(x) in (3.11) with n = 4 in Figure 3.5(f) comes close, then φ_7(x) in Figure 3.5(h) comes closer, to the exact Haar scaling function in Figure 1.2 of Chapter 1. We leave it as an exercise to use the successive approximations iterative method to solve for the roof scaling function,

    φ(x) = x for 0 ≤ x < 1, 2 − x for 1 ≤ x < 2, and 0 otherwise,    (3.12)

as shown in Figure 3.8, with its three non-zero scaling coefficients h_0 = 1/(2√2), h_1 = 1/√2, and h_2 = 1/(2√2). As we did for the Haar scaling function, it is advisable to start with a zeroth approximation

    φ_0(x) = 1/2 for 0 ≤ x < 2, and 0 otherwise,

with its integral over (−∞, ∞) being one (see Exercise 4).

With the very simple forms of the Haar and the roof scaling functions, we will attempt in Examples 3.3 and 3.4 a graphical method to verify that they are solutions to the scaling equation, given their respective scaling coefficients. For such simple forms of scaling functions, we will also illustrate the method graphically by using these scaling functions for constructing their corresponding wavelets (as seen in Examples 3.6 and 3.7).

We will leave the use of such an approach for solving the scaling equation to Example 3.8(a), in the case of the Daubechies 2 scaling function φ_{D2}(x), as shown in Figure 3.14. Then we will use such results in Example 3.8(b) to construct the corresponding Daubechies 2 basic wavelet ψ_{D2}(x), as shown in Figure 3.15.

Before we do this, it is instructive to give a simple example of how such a successive approximations iterative method is used for solving other types of equations, in this case a simple integral equation.
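The iteration (3.11) for the Haar case of Example 3.1 can also be sketched directly in Python (a minimal illustration with our own function names; we use a half-open support [0, 3) for the zeroth approximation, a slight deviation from (E.2), so that the iterates settle cleanly at the dyadic sample points):

```python
import math

SQRT2 = math.sqrt(2)
h = [1 / SQRT2, 1 / SQRT2]        # Haar scaling coefficients h_0 = h_1 = 1/sqrt(2)

def phi0(x):
    """Zeroth approximation, as in (E.2) but with half-open support [0, 3)."""
    return 1 / 3 if 0 <= x < 3 else 0.0

def iterate(phi_n):
    """One step of (3.11): phi_{n+1}(x) = sum_k h_k sqrt(2) phi_n(2x - k)."""
    return lambda x: sum(hk * SQRT2 * phi_n(2 * x - k) for k, hk in enumerate(h))

phi = phi0
for _ in range(10):               # ten successive approximations
    phi = iterate(phi)

# phi_10 is already essentially the Haar box of Figure 1.2: 1 on [0, 1), 0 beyond
assert abs(phi(0.25) - 1.0) < 1e-9 and abs(phi(0.5) - 1.0) < 1e-9
assert abs(phi(1.5)) < 1e-12 and abs(phi(2.5)) < 1e-12
```

Since h_k √2 = 1 here, each step simply adds two compressed copies of the previous iterate, and the initial support (0, 3) shrinks toward the exact support [0, 1), just as Figure 3.5 shows.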
It should be easy to show that this basic roof wavelet satisfies the necessary condition of all admissible basic wavelets, i.e., their average over the real line must vanish. This is done by showing that the integral ∫_{−∞}^{∞} ψ(x) dx = 0, which is the case when we use the ψ(x) in (3.26). It also shows from the graph of ψ(x) in Figure 3.13, where the (signed) area under ψ(x) is zero.
Example 3.8(a) The Iterative Method for Constructing the Daubechies 2 Scaling Function φ_{D2}
This example is for the Daubechies 2 scaling function φ_{D2}(x) with its four coefficients

    h_0 = (1 + √3)/(4√2),  h_1 = (3 + √3)/(4√2),  h_2 = (3 − √3)/(4√2),  h_3 = (1 − √3)/(4√2).

The scaling equation (3.2) gives

    φ(x) = Σ_{k=0}^{3} h_k √2 φ(2x − k)
         = h_0 √2 φ(2x) + h_1 √2 φ(2x − 1) + h_2 √2 φ(2x − 2) + h_3 √2 φ(2x − 3)
         = [(1 + √3)/4] φ(2x) + [(3 + √3)/4] φ(2x − 1) + [(3 − √3)/4] φ(2x − 2) + [(1 − √3)/4] φ(2x − 3).    (3.27)
We have already seen the very simple forms of the Haar and roof scaling functions, which we were able to use in even a graphical way to verify their respective scaling equations. The present Daubechies 2 scaling function, unfortunately, does not have an analytical expression by any measure for us to attempt to verify its above scaling equation (3.27). The only method remaining at our disposal is the iterative one of (3.11), which we must carry out numerically. Here, we may start with the zeroth approximation as a small constant, φ_0(x) = c, but we need some idea about its compact support. So, we assume (depending on prior knowledge) that the φ_0(x) has a compact support of (0, 3), in order to start the iterative process with, for example,

    φ_0(x) = 1/3 for 0 < x < 3, and 0 otherwise.

Note that our first guess satisfies ∫_{−∞}^{∞} φ_0(x) dx = ∫_0^3 (1/3) dx = 1. We use this φ_0(x) inside the sum of the scaling equation,

    φ_{n+1}(x) = Σ_{k=0}^{3} h_k √2 φ_n(2x − k),    (3.11)

to find φ_1(x). Then we continue the process to find φ_2(x), φ_3(x), ..., φ_{10}(x), as shown in Figure 3.14.
Fig. 3.14: The result of seven iterations, φ_0(x), φ_1(x), ..., φ_7(x), of the scaling equation associated with the φ_{D2} scaling function.
Example 3.8(b) Constructing the Daubechies 2 Wavelet ψ_{D2}
As for any basic wavelet, once its associated scaling function is found (in the present case, numerically), it is easy (again, numerically in the present case) to find the basic wavelet via equation (3.23),

    ψ(x) = Σ_k (−1)^k h_{1−k} √2 φ(2(x − k/2)).    (3.23)
We determine here the k index values of the sum, given that we have h_0, h_1, h_2, and h_3. So, for the above equation, h_{1−k} is non-zero for 1 − k = 0, 1, 2, 3, which makes k of the sum run from −2 to 1:

    ψ(x) = Σ_{k=−2}^{1} (−1)^k h_{1−k} √2 φ(2(x − k/2))
         = (−1)^{−2} h_3 √2 φ(2x + 2) + (−1)^{−1} h_2 √2 φ(2x + 1) + (−1)^0 h_1 √2 φ(2x) + (−1)^1 h_0 √2 φ(2x − 1).
In Figure 3.15 we show the Daubechies 2 scaling function φ(x) and its associated wavelet ψ(x − 1).

Fig. 3.15: The Daubechies 2 scaling function φ_{D2}(x) and its associated wavelet ψ_{D2}(x), translated by 1 to the right.
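The numerical construction of Examples 3.8(a) and 3.8(b) can be sketched as follows (an illustrative Python sketch with our own names; unlike the book's constant 1/3 on (0, 3), it starts the cascade from the unit box on [0, 1), another common choice of zeroth approximation):

```python
import math

SQRT2, S3 = math.sqrt(2), math.sqrt(3)
# Daubechies 2 scaling coefficients h_0..h_3 of Example 3.8(a)
h = [(1 + S3) / (4 * SQRT2), (3 + S3) / (4 * SQRT2),
     (3 - S3) / (4 * SQRT2), (1 - S3) / (4 * SQRT2)]

# phi_0 sampled at the integers 0..3: the unit box on [0, 1)
phi = {0.0: 1.0, 1.0: 0.0, 2.0: 0.0, 3.0: 0.0}

level = 0
for _ in range(12):                   # each pass of (3.11) halves the grid spacing
    level += 1
    step = 1.0 / 2 ** level
    phi = {i * step: sum(hk * SQRT2 * phi.get(2 * i * step - k, 0.0)
                         for k, hk in enumerate(h))
           for i in range(3 * 2 ** level + 1)}

def psi(x):
    """The basic wavelet via (3.23): psi(x) = sum_k (-1)^k h_{1-k} sqrt(2) phi(2x - k)."""
    return sum((-1) ** k * h[1 - k] * SQRT2 * phi.get(2 * x - k, 0.0)
               for k in range(-2, 2))

# phi at the integers tends to (1 + sqrt(3))/2 and (1 - sqrt(3))/2
assert abs(phi[1.0] - (1 + S3) / 2) < 0.05
assert abs(phi[2.0] - (1 - S3) / 2) < 0.05

# Admissibility: the (Riemann-sum) average of psi over its support vanishes
step2 = step / 2
total = sum(psi(-1 + i * step2) for i in range(3 * 2 ** (level + 1) + 1)) * step2
assert abs(total) < 1e-8
```

The dictionary keys are exact dyadic points, so the lookups φ(2x − k) always land on the previous grid; the wavelet ψ is then obtained from the converged φ with no further iteration, mirroring the "simple computation" of Example 3.8(b).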
In Section 3.6 we will present a scaling function and its associated basic wavelet that die out to zero as x → ±∞. Thus, they do not have a compact support, because they do not die out to identically zero beyond a finite interval. These are the Shannon (sinc) scaling function and one of its associated basic wavelets.

3.5.2 An Initial Value Problem for Computing the Scaling Functions
Here we present another numerical (iterative) method for computing the scaling function. It, of course, assumes knowledge of the scaling coefficients, to have a definite scaling equation. In this method we also assume that we have initial values of the sought scaling function, for example, φ(0), φ(1), φ(2), and φ(3) in the case of the
Chapter 4 VECTOR SPACES, SUBSPACES AND BASES
orthonormal scaling functions {2^{j/2} φ(2^j x − k)}, is

    f_j(x) = Σ_k α_{j,k} 2^{j/2} φ(2^j x − k).    (4.20)

If we use the orthonormality of the present Haar basis, after multiplying both sides of the above equation by 2^{j/2} φ(2^j x − l) and integrating, we can obtain the following expression for the coefficients {α_{j,k}}:

    α_{j,k} = ∫_{−∞}^{∞} f_j(x) 2^{j/2} φ(2^j x − k) dx.    (4.21)

The derivation goes as follows:

    ∫_{−∞}^{∞} f_j(x) 2^{j/2} φ(2^j x − l) dx = ∫_{−∞}^{∞} Σ_k α_{j,k} 2^{j/2} φ(2^j x − k) · 2^{j/2} φ(2^j x − l) dx
        = Σ_k α_{j,k} ∫_{−∞}^{∞} 2^{j/2} φ(2^j x − k) · 2^{j/2} φ(2^j x − l) dx
        = Σ_k α_{j,k} δ_{k,l} = α_{j,l},

    α_{j,l} = ∫_{−∞}^{∞} f_j(x) 2^{j/2} φ(2^j x − l) dx,

where the orthonormality of the above Haar scaling functions on (−∞, ∞) has been used in the last integral inside the sum.

The same can be done for g_j(x) ∈ W_j in terms of the orthonormal discrete wavelet set {2^{j/2} ψ(2^j x − k)}:

    g_j(x) = Σ_k β_{j,k} 2^{j/2} ψ(2^j x − k),    (4.22)

    β_{j,k} = ∫_{−∞}^{∞} g_j(x) 2^{j/2} ψ(2^j x − k) dx.    (4.23)
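For the Haar case, the orthonormality used in the derivation above can be checked exactly: each basis function 2^{j/2} φ(2^j x − k) is a box of height 2^{j/2} on [k/2^j, (k+1)/2^j), so the inner product reduces to an interval-overlap computation (an illustrative Python sketch, names ours; exact rational arithmetic avoids any floating-point doubt):

```python
from fractions import Fraction

def support(j, k):
    """Support of 2^(j/2) phi(2^j x - k): the interval [k/2^j, (k+1)/2^j)."""
    return Fraction(k, 2 ** j), Fraction(k + 1, 2 ** j)

def inner(j, k, l):
    """<phi_{j,k}, phi_{j,l}>: (2^(j/2))^2 = 2^j times the overlap length."""
    a1, b1 = support(j, k)
    a2, b2 = support(j, l)
    overlap = max(Fraction(0), min(b1, b2) - max(a1, a2))
    return 2 ** j * overlap

# The delta_{k,l} of the derivation: 1 when k = l, 0 otherwise
for k in range(-3, 4):
    for l in range(-3, 4):
        assert inner(2, k, l) == (1 if k == l else 0)
```

Distinct translates at the same level j never overlap, and each box's squared height times its width is exactly 1, which is all the derivation of (4.21) needs.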
We now illustrate these two expansions for the Haar scaling functions and their
associated wavelets with a couple of examples.
Example 4.12 Approximation of Functions by the Haar Scaling Function Series
For simplicity, we will start the computations by considering the expansion in the space V_0 with its Haar scaling function basis {φ(x − k)}, to give a (rough) approximation f_0(x) of the function in (E.1) (as shown in Figure 4.2). We will follow this by finding a more refined decomposition f_1(x) ∈ V_1 of the function, using the smaller scale Δ_1 = 1/2, as shown in Figure 4.4.

Consider the function

    f(x) = x,   0 < x < 3.    (E.1)
Note that for the Fourier series expansion we had f(x) = x, 0 < x < π (or −π < x < π). Here we are limited to 0 < x < 3, where the end points of the interval (0, 3) are integers, because of the translations of {φ(x − k)} by integers. So, to cover (0, 3) we will need k = 0, 1, and 2, i.e., φ(x), φ(x − 1), and φ(x − 2). If, for example, it is necessary to cover (0, π), we can always go to 2^{j/2} φ(2^j x − k) with a low scale such that, for example, Δ_6 = 1/2^6 = 1/64 and k = 0 to 201, where we can get close to π. Thus, the scaling function expansion of the function in (E.1) at the scale Δ_0 = 1 is its approximation (trend),

    f_0(x) = Σ_{k=0}^{2} α_{0,k} φ(x − k) = α_{0,0} φ(x) + α_{0,1} φ(x − 1) + α_{0,2} φ(x − 2).    (E.2)
Now we compute the coefficients:

    α_{0,0} = ∫_{−∞}^{∞} f(x) φ(x) dx = ∫_0^1 x · 1 dx = [x²/2]_0^1 = 1/2,    (E.3)

where we used the fact that φ(x) has the compact support [0, 1), which means that φ(x) = 1 for 0 ≤ x < 1 and zero otherwise;

    α_{0,1} = ∫_{−∞}^{∞} x φ(x − 1) dx = ∫_1^2 x dx = [x²/2]_1^2 = 4/2 − 1/2 = 3/2,    (E.4)

where we know that the translated φ(x − 1) = 1 for 1 ≤ x < 2 and zero otherwise; and

    α_{0,2} = ∫_{−∞}^{∞} x φ(x − 2) dx = ∫_2^3 x dx = [x²/2]_2^3 = 9/2 − 4/2 = 5/2.    (E.5)
With these coefficients, we have the approximate expansion of f(x) = x, 0 < x < 3, as

    x ≈ f_0(x) = Σ_{k=0}^{2} α_{0,k} φ(x − k),   0 < x < 3,
               = (1/2) φ(x) + (3/2) φ(x − 1) + (5/2) φ(x − 2),    (E.6)

that is, f_0(x) takes the value 1/2 on [0, 1), 3/2 on [1, 2), and 5/2 on [2, 3), and is zero otherwise, as shown in Figure 4.2, which shows that f(x) = x, 0 ≤ x < 3, is approximated by the ladder-like function f_0(x). Of course, as we decrease the scale from 1 to 1/2, then 1/4, and so on, we would see a more refined (zigzagging) approximation of the continuous function f(x) = x, 0 ≤ x < 3. The discontinuities in the approximation f_0(x) of the given function are due to the nature of the Haar scaling function, with its clear discontinuities at x = 0, 1 with a jump size of 1. Moving to higher-order Daubechies scaling functions, with their continuity, as we will later illustrate numerically, we will not have this discontinuous approximation any more, especially if we also go to a much lower scale than the Δ_0 = 1 used above.
Fig. 4.2: The Haar scaling function series approximation (average or trend) f_0(x) of f(x) = x,
0 ≤ x < 3 at the scale 1.
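The coefficients of (E.3)-(E.5) (written c_{0,k} here) are easy to check numerically. The following sketch (Python is our choice of language, not the book's) approximates each coefficient by a midpoint rule on the unit support of φ(x − k), and evaluates the resulting staircase approximation of (E.6):

```python
import numpy as np

def haar_phi(x):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return np.where((x >= 0) & (x < 1), 1.0, 0.0)

def v0_coefficient(f, k, n=100000):
    """c_{0,k} = integral of f(x) phi(x - k) dx, reduced to the support
    [k, k+1) of phi(x - k) and computed by the midpoint rule."""
    x = np.linspace(k, k + 1, n, endpoint=False) + 0.5 / n  # midpoints
    return np.mean(f(x))  # the interval has width 1, so the mean is the integral

f = lambda x: x
coeffs = [v0_coefficient(f, k) for k in range(3)]
print(coeffs)  # close to [0.5, 1.5, 2.5], matching (E.3)-(E.5)

def f0(x):
    """The staircase (trend) approximation f_0(x) of (E.6)."""
    return sum(c * haar_phi(x - k) for k, c in enumerate(coeffs))

print(f0(np.array([0.5, 1.5, 2.5])))  # the three ladder steps
```

The midpoint rule is exact for the linear f(x) = x, so the computed values agree with the hand calculation up to rounding.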
To make a comparison with the Fourier series expansion, we show in Figure 4.3 the
partial sums S_1(x), S_2(x), and S_3(x), where S_n(x) is the nth partial sum approximation
of f(x) = x, 0 < x < π. They are very smooth, as they inherit the smoothness of the
harmonic functions used as the basis. However, they are a bit far from the given function,
if, for the moment, we visualize the (zigzag) approximation of the Haar scaling function
series at the scale 1/2^6 = 1/64, instead of the shown scale of 1.
Fig. 4.3: The truncated Fourier series approximations S_1(x), S_2(x), S_3(x) and the scaling function
series approximation f_0(x) of f(x), 0 ≤ x < 3.
166 Chapter 6 THE FOURIER TRANSFORM
We will show that

F{p_a(x)} = 2 sin aω / ω.  (E.2)

In fact,

F{p_a(x)} = F(ω) = ∫_{−∞}^{∞} e^{−iωx} p_a(x) dx = ∫_{−a}^{a} e^{−iωx} · 1 dx
          = −(1/iω) e^{−iωx} |_{x=−a}^{a} = (1/iω) [e^{iaω} − e^{−iaω}]
          = 2 sin aω / ω.  (E.3)

The functions f(x) = p_a(x) and F(ω) = 2 sin aω / ω are shown in Figure 6.1 with a = 3.
Fig. 6.1: (a) f(x) = p_3(x), (b) F(ω) = 2 sin 3ω / ω.
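The closed form in (E.2)-(E.3) can be verified numerically. A short sketch (Python, with a hand-rolled trapezoidal rule) evaluates the integral over (−a, a) directly for a = 3 and compares it with 2 sin aω / ω:

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule on a uniform grid."""
    dx = x[1] - x[0]
    return dx * (np.sum(y) - 0.5 * (y[0] + y[-1]))

def gate_ft(omega, a=3.0, n=20001):
    """F(omega) = integral_{-a}^{a} e^{-i omega x} dx, computed numerically."""
    x = np.linspace(-a, a, n)
    return trapezoid(np.exp(-1j * omega * x), x)

for w in (0.5, 1.0, 2.0):
    # the numerical real part and the closed form agree closely
    print(w, gate_ft(w).real, 2 * np.sin(3.0 * w) / w)
```

The imaginary part comes out at the level of rounding error, as it must for an even, real integrand.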
The gate function p_a(x) is taken in electrical engineering as the system function
of the ideal low pass filter. In other words, it only allows frequencies x between −a
and a to pass through. The Fourier transform F(ω) = 2 sin aω / ω of p_a(x), which is usually
written as a function of time h(t) = F(t) = 2 sin at / t, is called the "impulse response" of
the ideal low pass filter.
6.1.4 The Uncertainty Principle
In physics the variable x may be taken as the position of a particle in one dimension,
while in the ω-Fourier transform space, ω (or p) represents the momentum of that par-
ticle. In Figure 6.1 let us concentrate on the interval −3 < x < 3: p_3(x) may be
interpreted as telling that the probability of finding the quantum particle in the interval
(−3, 3) is 100%. The important point is that we do not know exactly (and we
never will, due to the uncertainty principle!) what the position of the particle is in that
interval of (−3, 3). In the momentum ω-space we see from Figure 6.1(b) that there is a
high probability that the momentum of this particle is in the neighborhood of the origin,
and it is decreasing as we move away from there. Indeed, the figure shows another
important aspect, that is, the momentum can take any value (between −1.30 and 6),
where 2 sin 3ω / ω is present everywhere on the real line of the ω-momentum space. This is
in contrast to p_a(x), which is zero beyond (−3, 3). This also says that our assurance
of having the position of the particle in (−3, 3) implies that we cannot determine its
momentum as being within a finite interval in the ω-momentum space. As shown in
Figure 6.1(b), the momentum can have any value (between −1.30 and 6); the only
thing we know is that it has a high probability of being close to the origin ω = 0.
So our certainty of knowing that the quantum particle is in a definite position interval
(−a, a) results in a greater uncertainty in the value of the momentum of the same
particle. As we change a, we may see a trade-off in the momentum uncertainty. By
increasing a we are lowering the certainty of the x-position of the particle. However,
looking at F(ω) = 2 sin aω / ω, we see that it is narrowing toward the origin, which increases
the certainty for the momentum value in that neighborhood. This may be stated as a high
uncertainty Δx about the position of the quantum, which implies less uncertainty Δp
in its momentum. Putting this simply, in the language of the Heisenberg uncertainty
principle of quantum physics, we have Δx Δp = ℏ, where ℏ is a physical constant
termed the Planck constant.
Fig. 6.2: The time and frequency resolutions according to the uncertainty principle Δt Δω = 1/4.
In signal analysis, analogously, this is stated as Δω Δt = c for the frequency ω and
the time t. So, whether we do Fourier analysis or wavelet analysis of the signal, we
remain bound to this uncertainty condition. In other words, if, for example, we want
to be more definite about a low frequency in p_a(x) of Figure 6.1, i.e., we desire high
resolution there, then we make a small. But this in turn will make the pulse of 2 sin ta / t
in the time space much wider, which means that we are less definite in the time space.
This also implies that a high resolution at low frequencies (for −a < ω < a with small
a) results in low resolution in the time space. This is illustrated schematically in Figure
6.2 based on, for example, Δω Δt = 1/4. There we see the high resolution of 1/4 for the
lowest part of the frequency with a relatively bad resolution of 1 in time. For the higher
frequencies we see a low resolution of 1 there but a high resolution of 1/4 in time.
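The trade-off Δω Δt = c can be made concrete with Gaussian windows, for which the product of the root-mean-square widths in time and frequency stays at its lower bound of 1/2 no matter how the window is stretched. A small numerical sketch (our own illustration, using NumPy's FFT as a stand-in for the Fourier transform):

```python
import numpy as np

def rms_width(x, density):
    """Root-mean-square width of a non-negative density sampled on the grid x."""
    density = density / np.sum(density)
    mean = np.sum(x * density)
    return np.sqrt(np.sum((x - mean) ** 2 * density))

t = np.linspace(-50, 50, 2 ** 14, endpoint=False)
dt = t[1] - t[0]
omega = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(t.size, d=dt))
products = []
for sigma in (0.5, 1.0, 2.0):
    g = np.exp(-t ** 2 / (2 * sigma ** 2))   # window of width sigma
    G = np.fft.fftshift(np.fft.fft(g))       # its transform (magnitude is what matters)
    products.append(rms_width(t, g ** 2) * rms_width(omega, np.abs(G) ** 2))
print(np.round(products, 3))  # stays near 0.5: narrower in t forces wider in omega
```

Making σ small sharpens the time localization while the frequency width grows in exact proportion, so the product never drops below 1/2.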
Exercises 6.1
1. Consider the gate function

   f(t) = {1, −a < t < a; 0, otherwise}.

   Use the Fourier transform in (6.4) to find F(ω), then substitute the latter in (6.3).
   Show that this is equivalent to beginning with the other Fourier transform in (6.8),
   then using the inverse in (6.7).
2. (a) Show whether or not the Fourier transform of

   f(t) = e^{−|t|},  −∞ < t < ∞

   exists, and whether it is continuous.
   Hint: Consult Theorem 6.1.
   (b) Assume that you have F(ω) in part (a); can you guarantee the existence of
   its inverse?
   Hint: Consult Theorem 6.2.
   (c) Repeat the analysis in parts (a) and (b) for the (causal) function

   f(t) = {e^{−t}, 0 < t < ∞; 0, t < 0}.

3. Find the Fourier transform of f(t) = e^{−t²}, −∞ < t < ∞, using the result

   ∫_{−∞}^{∞} e^{−y²} dy = √π.  (E.1)

   Hint: Write the Fourier integral, complete the square for −t² − iωt in the
   exponent, and then use the result in (E.1).
6.2 BASIC PROPERTIES OF THE FOURIER TRANSFORM AND THE
CONVOLUTION THEOREM
In this section we will establish the most basic properties of the complex exponential
Fourier transform.
6.6 THE DISCRETE AND FAST FOURIER TRANSFORMS 201
Fig. 6.9: The frequency content of the stationary wave in (E.1), f(t) = sin t + (1/12) sin 2t +
(1/4) sin 3t. (N = 512, Δt = 0.0667 s.)
In wavelet analysis we also take advantage of the dyadic scale. This is in the sense
that we also have a partition when it comes to the scaling function and wavelet series
implementation via parallel low and high pass filters. There, we partition in the first
parallel pair of filters, then we follow the output of the first low pass filter to partition again
for its output through another pair of low and high pass filters. The process continues
until we end with one element (coefficient) output for each of the last two parallel filters.
Non-stationary Waves
From our discussion in the last chapter of the Fourier series, and the present Fourier
transform of a signal f(t), we know that the Fourier coefficients in the Fourier series
(5.1)-(5.4) and the Fourier transform in (6.1) measure the frequency content of the sig-
nal. In the following example we illustrate what we mentioned before, that Fourier
analysis in general is incompatible with non-stationary waves.
We will perform FFT computations to show that the Fourier transform does not dis-
tinguish between stationary and non-stationary waves with the same frequency content.
Example 6.7 Drawback of the Fourier Transform for Non-stationary Waves
Here, we present an example of a stationary wave, which is a superposition of three
sine waves at three different frequencies of 1, 2, and 5 cycles per second (hertz),

f(t) = sin 2πt + sin 4πt − sin 10πt  (E.1)

as shown in Figure 6.10(a).
Fig. 6.10(a): The stationary wave of (E.1).
If we Fourier transform this combined signal, we expect three spikes in the frequency
space at ν_1 = 1, ν_2 = 2, and ν_3 = 5 Hz with a very high resolution. This is shown
in Figure 6.10(b), which is computed by using the FFT with 1024 points and a time
resolution of Δt = 1/100.
Fig. 6.10(b): The frequency content of the stationary wave of (E.1). (N = 1024,
Δt = 0.0098 s.)
The next signal, on the other hand, is a superposition of three signals with the same
frequencies as in the above stationary wave of (E.1). However, here each wave, with
its particular frequency, is defined on a different finite subinterval of time. We arrange
these three signals as follows:

f(t) = {sin 2πt, 0 ≤ t < 4.0039; sin 4πt, 4.0039 ≤ t < 6.5137; sin 10πt, t ≥ 6.5137}  (E.2)

as shown in Figure 6.11(a).
Fig. 6.11(a): The non-stationary wave of (E.2).
Since the Fourier transform gives the frequency content of a signal, we expect for the
superposition to have three spikes at the frequencies ν_1 = 1, ν_2 = 2, and ν_3 = 5 Hz
in the Fourier frequency space. This is shown in Figure 6.11(b) using the FFT, which
is computed with the same parameters (number of samples and time resolution) as for
(E.1).
Fig. 6.11(b): The frequency content of the non-stationary wave of (E.2). Note the not
so good resolution. (N = 1024, Δt = 0.0098 s.)
Now, consider the following non-stationary signal,

f(t) = sin 12πt².  (E.3)

Figures 6.12(a) and 6.12(b) show this wave and the amplitude of its Fourier transform,
respectively. In Figure 6.12(b) we observe a continuum of frequencies predominating
from around 0 to 24 hertz. However, as expected, we have no idea at what time intervals
these frequencies appear! There, the role of wavelet analysis is evident.
Fig. 6.12(a): The non-stationary wave of (E.3).
Fig. 6.12(b): The amplitude of the Fourier transform (spectrum) of f(t) in (E.3).
(N = 2048, Δt = 9.76 × 10⁻⁴ s.)
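These FFT experiments are easy to reproduce. The sketch below (Python/NumPy; the sampling rate of 100 samples per second mirrors Δt = 1/100, with N = 1024 points over about 10 s) computes the amplitude spectra of the stationary wave (E.1) and the non-stationary wave (E.2): both concentrate near 1, 2, and 5 Hz, and the transform alone cannot tell when each frequency occurred.

```python
import numpy as np

fs = 100.0                       # samples per second, so Delta t = 1/100
t = np.arange(1024) / fs         # N = 1024 points, as in Figures 6.10-6.11

# Stationary superposition of 1, 2 and 5 Hz, as in (E.1)
stationary = np.sin(2 * np.pi * t) + np.sin(4 * np.pi * t) - np.sin(10 * np.pi * t)

# The same three frequencies, each confined to its own interval, as in (E.2)
nonstationary = np.where(t < 4.0039, np.sin(2 * np.pi * t),
                np.where(t < 6.5137, np.sin(4 * np.pi * t), np.sin(10 * np.pi * t)))

freqs = np.fft.rfftfreq(t.size, d=1 / fs)
peak = {}
for name, sig in (("stationary", stationary), ("non-stationary", nonstationary)):
    spectrum = np.abs(np.fft.rfft(sig))
    for f0 in (1.0, 2.0, 5.0):
        band = (freqs > f0 - 0.3) & (freqs < f0 + 0.3)
        peak[name, f0] = spectrum[band].max()
        print(name, f0, round(peak[name, f0], 1))
```

Both spectra show pronounced peaks at the same three frequencies; only the resolution (peak sharpness) differs, as Figures 6.10(b) and 6.11(b) indicate.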
Exercises 6.6
1. For the Fourier series of a periodic function G(f), −∞ < f < ∞ with period 1/T,
   the following superposition of all of its translations by n/T, n integer,

   G̃(f) = Σ_{n=−∞}^{∞} G(f + n/T)  (E.1)

   coincides with its periodic extension. When using the discrete Fourier transform
   for computing a Fourier series, we may use the wrong period, for example,
   1/T′ < 1/T. In this case we have an error that is called the "aliasing error."
   (a) Consider the Fourier series of

   G(f) = e^{−|f|},  −1/(2T) < f < 1/(2T)  (E.2)
7
The Windowed Fourier and The Continuous Wavelet Transforms
7.1 INTRODUCTION
We have already mentioned that the Fourier transform gives us the signal as a function
of frequency only in the Fourier transform space. This means that we can have a
good time resolution by choosing a small time increment Δt, with corresponding poor
frequency information. The opposite happens in the Fourier frequency ω-space.
The windowed Fourier transform introduces a time dependence, passing from F(ω)
to F(ω, t̄). It does so by looking at the signal through a window centered around the
time instant t̄, to have the frequency information for that time period. Then the window
is moved to cover the signal. The width Δt of this window is, of course, governed by
the uncertainty principle, where Δω Δt = c, as seen in detail in Example 6.2. Hence,
this introduces some resolution in time at the expense of the resolution in the frequency.
This is illustrated in Figures 7.1(a) and 7.1(b) for a narrow and wide window, respec-
tively.
Fig. 7.1: (a) Narrow time window, (b) Wide time window.
The difficulty with the windowed Fourier transform is that the window is fixed, which
makes computations very lengthy. The wavelet transform was developed to overcome
this difficulty. Briefly, its discrete version works with windows whose size decreases in a
dyadic way with the scale or resolution 1/2^j, j = 0, 1, 2, 3, . . ..
The above represents the main drawback of the Fourier transform; it gives us either
the time variation f(t) of the signal or the ω-frequency variation of its Fourier transform
F(ω) = F{f(t)}, but not both. However, there are many situations where there is a
need to know at what time certain frequencies appear. The clear example is that of the
music played by an orchestra, where the notes (frequencies) are known at the time they
are played, or are to be played. Ideally, such a phenomenon requires a representation of
the sound signal as a function of two variables, time and frequency, i.e., as a surface

S: z = f(t, ω).  (7.1)

There is also an analogy in quantum physics, where we use the Fourier transform
Φ(p) = F{Ψ(x)} of the wave function Ψ(x), where p is the momentum and x the
position:

Φ(p) = ∫_{−∞}^{∞} e^{−ipx} Ψ(x) dx.  (7.2)
We use here Ψ(x) because of its common utilization in quantum physics for the wave
function, but it is not to be confused with the ψ(x) of the basic wavelet in this and all
other books on wavelets. From quantum physics we learn that we cannot locate the
quantum at the position x and with precise momentum p, which is the statement of the
"uncertainty" principle. What this principle says is that there is always an uncertainty
Δx in the position x as well as Δp in the momentum p, such that Δx Δp = ℏ, where
ℏ is the Planck constant. This means that if we want high accuracy or resolution in the
position x of the quantum, i.e., very small Δx, then the uncertainty Δp in the momentum
must be large according to Δx Δp = ℏ. The derivation of the uncertainty principle
uses tools of the Fourier transform. For our signal f(t) and its Fourier transform F(ω),
we have a parallel uncertainty principle as Δω Δt = c. Hence, with our use of
Fourier analysis, we cannot expect exact information about the frequency at an exact
time. This means that we must accept the compromise of very high resolution in time
at the expense of low resolution in frequency, and vice versa.
In contrast to Fourier analysis, the discrete wavelet analysis, with its dyadic scale
in both time and frequency and its Multiresolution Analysis, provides an answer to the
above drawback of Fourier analysis. To a good extent, it allows us to look at a signal as
a function of time as well as frequency, with due observance of the above uncertainty
principle for the compromise between the time and the frequency resolutions.
For discrete wavelets such a compromise may be depicted as in Figure 7.2, where we
use Δt Δω = 1/4 for the sake of illustration. We see here that for the small time scale of 1/8
(on the horizontal t-axis) we must use the large frequency scale of 2 (on the vertical
ω-axis) to make Δt Δω = (1/8)(2) = 1/4. On the other hand, for a larger time scale of 1/4,
we can lower the frequency scale to 1, and so on. Thus, as we move in the horizontal
direction (on the t-axis) with a larger time scale, we obtain a better resolution in
the frequency (at its lower range), and when we move in the vertical direction (on the
ω-axis) with the larger frequency scale, we obtain a better resolution in time. To
summarize, for the small scale in time we must use the large scale in frequency, as
seen in the first top vertical rectangle of Figure 7.2 for small Δt. As we increase Δt,
its resolution becomes worse, while the frequency resolution improves, as seen in the
lowest horizontal rectangle in Figure 7.2 for small Δω.
Fig. 7.2: The time and frequency resolution according to Δt Δω = 1/4. (The area of each rectangle is
1/4.)
This is the situation with the discovery of wavelets. However, we should go back
to some time before such a discovery to discuss the earlier attempts at obtaining
information about the frequency content of the signal as time changes. This is the
development of the Windowed (or Short Time) Fourier Transform (STFT) that we will
discuss next very briefly.
7.2 THE WINDOWED (SHORT TIME) FOURIER TRANSFORM (STFT)
This attempt started with a "modification" of the Fourier transform. In brief, the kernel
e^{−iωt} of the Fourier transform

F(ω) = ∫_{−∞}^{∞} e^{−iωt} f(t) dt  (7.3)

is multiplied by a function W(t − t̄), called a "window," which is centered at the time
instant t̄ where we are seeking the frequency information in the signal.
This function (window) W(t − t̄) is chosen as a smooth function very close to having
a compact support centered at t̄. The result is what is called the Windowed (or Short
Time) Fourier Transform (STFT).
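A minimal STFT sketch (Python; a Gaussian window is our own choice, since the text leaves W unspecified) shows how the windowed transform recovers time-localized frequency information that F(ω) alone cannot give:

```python
import numpy as np

def stft(f, t, t_bars, omegas, width=0.5):
    """Windowed Fourier transform F(omega, t_bar): the kernel e^{-i omega t}
    is multiplied by a Gaussian window W(t - t_bar) before integrating."""
    dt = t[1] - t[0]
    out = np.empty((len(t_bars), len(omegas)), dtype=complex)
    for i, tb in enumerate(t_bars):
        W = np.exp(-((t - tb) ** 2) / (2 * width ** 2))
        for j, w in enumerate(omegas):
            out[i, j] = np.sum(np.exp(-1j * w * t) * W * f) * dt
    return out

# A test signal whose frequency jumps from 1 Hz to 5 Hz at t = 5
t = np.linspace(0, 10, 4000, endpoint=False)
f = np.where(t < 5, np.sin(2 * np.pi * t), np.sin(10 * np.pi * t))
F = np.abs(stft(f, t, t_bars=(2.5, 7.5), omegas=2 * np.pi * np.array([1.0, 5.0])))
print(F.round(3))  # the window at 2.5 s sees only 1 Hz; the one at 7.5 s only 5 Hz
```

Unlike the plain transform of Example 6.7, the two rows of |F(ω, t̄)| differ: each window reports only the frequency active around its own time instant.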
The function g_j(t) is the projection of f(t) onto the subspace W_j, having the discrete
wavelets {2^{j/2} ψ(2^j t − k)} as its basis. Of course, there remains the complement to
such a refinement, which is the scaling function series of the "blurred" approximation
f_j(t) of f(t),

f_j(t) = Σ_k α_{j,k} 2^{j/2} φ(2^j t − k)  (7.34)

α_{j,k} = ∫_{−∞}^{∞} f(t) 2^{j/2} φ(2^j t − k) dt.  (7.35)

This f_j(t) is the projection of f(t) onto the subspace V_j of the above scaling functions,
which is the orthogonal complement of the subspace W_j of their associated wavelets in
V_{j+1}, that is, V_{j+1} = V_j ⊕ W_j.
We emphasize that we are doing the comparison of the double discrete wavelet series
in (7.30) with its continuous wavelet transform. If we are left with this, then the
discrete scaling function series is lacking. Thus, we may say that such a series is the
creation of Multiresolution Analysis as we move from the continuous wavelet transform
representation of the signal to the (finite limits) discrete wavelet-scaling function series.
7.3.2 The Discrete Wavelet (Double) Series - For Curbing the Redundancy of
the Continuous Wavelet Transform (CWT)
In the last section we presented the continuous wavelet transform (W_ψ f)(a, b) in (7.9)
with its inverse f(t) in (7.10) as a double integral over the two (continuous) variables
for the scale a and translation b, for representing a signal f(t) that is square integrable
on (−∞, ∞). There, we used the continuous wavelet,

ψ_{a,b}(t) ≡ (1/|a|^{1/2}) ψ((t − b)/a)  (7.36)

with its two continuous variables a and b, where 1/a represents a dilation parameter (for
a > 1) and a compression (for a < 1), and b represents the translation parameter.
Here, as it was presented in this and previous chapters, we again write the discrete
wavelet as

ψ_{j,k}(t) ≡ 2^{j/2} ψ(2^j t − k) = 2^{j/2} ψ(2^j (t − 2^{−j} k)),  (7.37)

which "formally" for now is the discrete version of the continuous wavelet ψ_{a,b}(t) =
|a|^{−1/2} ψ((t − b)/a), evaluated (for its variables a and b) at the (discrete) dyadic position
b_{j,k} = 2^{−j} k with binary scale a_j = 2^{−j}. The scaling factor 1/√(a_j) = 2^{j/2} in (7.33)
represents a contraction ("approximately" high frequency for j > 0) and a dilation ("ap-
proximately" low frequency for j < 0). We mention that a_j = 2^j is also used, as done in
some references. This makes no difference, since in the discrete wavelet representation
(7.30) we sum over j = −∞ to ∞.
As we had it in (7.30)-(7.32), the associated ψ_{j,k}-wavelet transform is the following
discrete version of the continuous wavelet transform in (7.9) for the above discrete
values of its two parameters a and b, namely, a_j = 2^{−j} and b_{j,k} = 2^{−j} k,

(W_ψ f)(2^{−j}, k 2^{−j}) ≡ W_ψ f(2^{−j}, k 2^{−j}) ≡ c_{j,k} ≡ 2^{j/2} ∫_{−∞}^{∞} f(t) ψ(2^j t − k) dt.  (7.32)
The inverse of this W_ψ f(2^{−j}, k 2^{−j}), if it exists, would be the following double
series expansion, of the square integrable signal f(t) on (−∞, ∞), in terms of the
above discrete wavelets,

f(t) = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} (W_ψ f)(2^{−j}, k 2^{−j}) {2^{j/2} ψ(2^j t − k)},  −∞ < t < ∞.  (7.30)
A close look at the possible derivation of (7.37), in comparison to what we do for the
Fourier series expansion, would suggest that the derivation becomes easier if we limit
ourselves to the set of discrete wavelets that are orthonormal on (−∞, ∞), i.e.,

∫_{−∞}^{∞} ψ_{j,k}(t) ψ_{m,n}(t) dt = ∫_{−∞}^{∞} 2^{j/2} ψ(2^j t − k) 2^{m/2} ψ(2^m t − n) dt  (7.38)
                              = δ_{j,m} δ_{k,n},

where k, m, j, and n are integers and δ_{k,m} is the Kronecker delta symbol

δ_{k,m} ≡ {1, k = m; 0, k ≠ m}.  (7.39)
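For the Haar wavelet, the orthonormality relations of (7.38) can be verified numerically. A sketch (Python; a fine Riemann sum stands in for the integral):

```python
import numpy as np

def haar_psi(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
           np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def psi_jk(t, j, k):
    """Discrete wavelet 2^{j/2} psi(2^j t - k) of (7.37)."""
    return 2.0 ** (j / 2) * haar_psi(2.0 ** j * t - k)

t = np.linspace(-4, 4, 800001)
dt = t[1] - t[0]
results = {}
for (j, k), (m, n) in [((0, 0), (0, 0)), ((0, 0), (0, 1)),
                       ((0, 0), (1, 0)), ((1, 2), (1, 2))]:
    results[(j, k), (m, n)] = np.sum(psi_jk(t, j, k) * psi_jk(t, m, n)) * dt
    print((j, k), (m, n), round(results[(j, k), (m, n)], 3))
# prints 1.0 when (j, k) = (m, n) and 0.0 otherwise, as (7.38) demands
```

Note that the (0, 0) versus (1, 0) pair has overlapping supports, so the vanishing inner product comes from cancellation of signs, not from disjointness.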
The question that presents itself now, in light of our discussion in this chapter, is how
does the double wavelet series in (7.30) make a representation of the function f(t)?
This question is valid since in Chapter 4 we always spoke about the role of the wavelet
series in offering only a "refinement" to be added to the approximate (blurred) version
of the signal via the scaling function series of (7.34)-(7.35).
The catch here, as compared to the infinite series (j = −∞ to ∞) in the wavelet series
(7.30), is that, as we have already indicated earlier, we stop at an acceptable scale level
j = J, that is, we use a finite sum. There, for example, we had to appeal to the nested
decomposition V_{J+1} = V_J ⊕ W_J, V_{J+1} = V_{−J} ⊕ W_{−J} ⊕ · · · ⊕ W_0 ⊕ · · · ⊕ W_{J−1} ⊕ W_J,
where for every high J with scale 2^J, the approximation f_{−J} ∈ V_{−J} of (7.34)
is a very blurred version of the signal f(t) for such a large scale. Hence, f_{−J} ∈ V_{−J}
can be considered a mere constant, especially if we let J → ∞. Thus, we can see the
double wavelet series (7.30), formally (since, in practice, we do not need to go to such
a large scale 2^J), as a representation of the signal f(t), −∞ < t < ∞, plus the
(added) constant represented by f_{−J}(t) of the scaling function series at the very high
scale 2^J.
As we briefly mentioned in Chapter 4 and will discuss in detail in Chapter 8, this can
be put in the language of implementing the discrete wavelet-scaling function decompo-
sition of the signal in terms of successive high pass filters corresponding to {g_j(t)}_{j=−J}^{J}
of the wavelet series in W_{−J} ⊕ W_{−J+1} ⊕ · · · ⊕ W_J, and the (one) scaling function series
of f_{−J}(t) ∈ V_{−J}.
In terms of the frequency spectrum of the signal, such high pass filters of the wavelets
use almost all of the spectrum corresponding to (2^{−J}, 2^J), except for the very small
interval (0, 2^{−J}) that is covered by the low pass filter corresponding to f_{−J} ∈ V_{−J}.
The latter fills the small gap (0, 2^{−J}), which is called the "plug." So, theoretically, the
spectra of the 2J + 1 high pass filters and the (only) one low pass filter add up to the
full spectrum of the signal. However, there is of course an overlap due to the
individual spectra not being ideal, i.e., they do not vanish abruptly to become identically
equal to zero.
7.3.3 Frames - Before the Orthonormal Wavelets: Towards Discretizing the
Continuous Wavelet Transform
Attention must be paid to the resolution of the identity in (7.20) for representing a
continuous signal as the inverse of the continuous wavelet (double integral) transform
in (7.9). Also, we recognize here the power of the Fourier transform analysis in reach-
ing such an important result. This is in the sense that the wavelet transform in (7.9)
gives us a complete characterization of the signal f(t) as in (7.20). However, there is
a serious problem regarding the numerical computations of the single integral in (7.9)
and, especially, the double integral in (7.10).
There will be a question when we come to discretize the wavelet (1/√|a|) ψ((t − b)/a)
as ψ_{j,k}(t) = (1/√(a_j)) ψ((1/a_j)(t − b_{j,k})) with a_j = a_0^j and b_{j,k} = k a_0^j b_0,
a_0 > 1, b_0 > 0, then try to use the set {ψ_{j,k}(t)} of these discrete wavelets with
appropriate coefficients to characterize or represent a signal. Here we are away from
the Fourier series analysis. So, such a wavelet series must stand on its own to answer
the important question of whether it completely characterizes the signal. In addition,
this series may be required to satisfy a condition similar to the resolution of the
identity of the continuous wavelet transform.
So, the test for such a series must be that it allows a good (stable) numerical scheme,
and a simple algorithm for computing the coefficients of the sought series. A very
simple way of defining a stable system is that a bounded (or small) input to the system
should produce a correspondingly bounded (small) output. In other words, the output
depends continuously on the input. In our brief reference in Sections 3.3, 4.3, and 4.4
to the scaling function and wavelet series, these questions were not raised. The reason,
as we will see shortly, is that we considered only the special case of the orthonormal
wavelets {2^{j/2} ψ(2^j (t − k/2^j))} and their associated scaling functions {2^{j/2} φ(2^j (t − k/2^j))},
with their very special dyadic scaling a_j = 1/2^j and translation b_{j,k} = k/2^j.
8.2 MULTIRESOLUTION ANALYSIS (MRA) 241
W{f} = (W_ψ f)(a, b) = (1/√|a|) ∫_{−∞}^{∞} f(t) ψ((t − b)/a) dt.  (8.3)

Here we note the dependence of the transform (W_ψ f)(a, b) on the two continuous
parameters a and b.
As expected, the inverse (continuous) wavelet transform involves integration over
the two parameters a and b. Thus, we expect a double integral; as a result, we must
worry about the redundancy and long computations involving such an integral. This
is especially apparent when we allow (a, b) ∈ R², i.e., when they cover the entire plane.
The inverse (continuous) ψ((t − b)/a)-wavelet transform is
f(t) = W^{−1}{(W_ψ f)(a, b)}
     = (1/C_ψ) ∫∫ (da db / a²) {(W_ψ f)(a, b) (1/√|a|) ψ((t − b)/a)},  (7.10, 8.4)
where C_ψ is a finite number depending on the wavelet used, and is defined as

C_ψ = ∫_{−∞}^{∞} (|Ψ(ω)|² / |ω|) dω < ∞,  (7.11, 8.5)

where Ψ(ω) is the Fourier transform of the wavelet ψ(t).
In equation (8.5) we see the importance of the required "admissibility condition"
Ψ(0) = 0, since it is needed in the integrand to ensure the convergence of the
integral, as was shown in Section 7.3 following equation (7.21).
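As a concrete illustration of (8.5), we can compute the admissibility constant numerically. The Mexican-hat wavelet ψ(t) = (1 − t²)e^{−t²/2} is our own choice of example here, picked because its constant has the closed-form value C_ψ = 2π (with the Fourier transform convention F(ω) = ∫ e^{−iωt} f(t) dt used in this book):

```python
import numpy as np

def psi(t):
    """Mexican-hat wavelet (1 - t^2) e^{-t^2/2}; a standard example chosen
    here because its admissibility constant is known: C_psi = 2 pi."""
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

t = np.linspace(-10, 10, 10001)
dt = t[1] - t[0]
omega = np.linspace(-10, 10, 2001)
domega = omega[1] - omega[0]

# Psi(omega) = integral e^{-i omega t} psi(t) dt, one frequency at a time
Psi = np.array([np.sum(np.exp(-1j * w * t) * psi(t)) * dt for w in omega])

# C_psi of (8.5); the integrand |Psi|^2/|omega| vanishes like |omega|^3 at 0
# because Psi(0) = 0 (the admissibility condition), so the integral converges
mask = omega != 0
C_psi = np.sum(np.abs(Psi[mask]) ** 2 / np.abs(omega[mask])) * domega
print(C_psi, 2 * np.pi)  # the numerical value comes out close to 2*pi
```

Trying the same computation with a function that violates Ψ(0) = 0 (say, a Gaussian) makes the integrand blow up like 1/|ω| near the origin, which is exactly why the admissibility condition is required.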
As we will see in the next sections, the shift from the continuous wavelet transform
to the discrete wavelet series was done to efficiently remedy the redundancy of the
(double integral) inverse continuous wavelet transform. This departure to a great extent
is manifested in the following Multiresolution Analysis.
8.2 MULTIRESOLUTION ANALYSIS (MRA)
Now we present a rather formal definition of the MRA, which is the most usual (typical)
one, involving an orthonormal basis.
Definition 8.2 Multiresolution Analysis
A Multiresolution Analysis first requires the existence of a nested sequence

· · · ⊂ V_{−2} ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ V_2 ⊂ · · ·  (8.6)

of subspaces {V_j}_{j=−∞}^{∞} such that the following conditions are met:
1. Density: Their union ∪_{j∈Z} V_j is dense in the real square integrable functions
f(x) ∈ L²(−∞, ∞),
∪_{j∈Z} V_j = L²(−∞, ∞),  (8.7)

which again means that any square integrable function can be approximated as
closely as desired by a sequence of members of the union of these vector spaces.
For example, this is the case for a double series of scaling functions summed over
the scale levels and the translations by integers.
2. Separation: Their intersection is the zero set,

∩_{j∈Z} V_j = {0}.  (8.8)
3. Scaling: f(x) ∈ V_j if and only if f(x/2^j) ∈ V_0 (or f(x) ∈ V_0 if and only if
f(2^j x) ∈ V_j). This means that by scaling up (or down) by the dyadic scale 1/2^j
we can move from one of the nested spaces to the other.
4. Orthonormal basis: The set of the scaling functions {φ(x − k)}_{k∈Z} is an or-
thonormal basis, i.e.,

∫_{−∞}^{∞} φ(x − k) φ(x − k′) dx = {0, k ≠ k′; 1, k = k′},  k, k′ ∈ Z.  (8.9)
We note that in the space V_j we have the orthonormal set of scaling functions
{2^{j/2} φ(2^j x − k)}. We must note, as we had illustrated in Chapter 3, that this fourth
requirement (of the usual MRA) is not satisfied by all applicable scaling functions. For
example, it is satisfied by the Daubechies ones, whose (simplest) special case is the
Haar scaling function. However, it is not satisfied by the hat (roof) scaling function,

φ(x) = {x, 0 ≤ x ≤ 1; 2 − x, 1 ≤ x ≤ 2; 0, otherwise}  (8.10)

where, for example, φ(x) and φ(x − 1) are not orthogonal on (−∞, ∞). This is seen
in Figure 8.1, where the product φ(x) φ(x − 1) on (1, 2) is positive. Hence,

∫_{−∞}^{∞} φ(x) φ(x − 1) dx = ∫_1^2 φ(x) φ(x − 1) dx ≠ 0.  (8.11)
Fig. 8.1: The roof scaling function φ(x) and its translation φ(x − 1) as an example of a scaling
function that is non-orthogonal with respect to translation by integers.
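The failed orthogonality in (8.11) is a one-line numerical check; the exact value of the hat-function overlap is ∫_1^2 (2 − x)(x − 1) dx = 1/6. A sketch (Python), with the Haar scaling function for contrast:

```python
import numpy as np

def hat(x):
    """The hat (roof) scaling function of (8.10)."""
    return np.where((x >= 0) & (x <= 1), x,
           np.where((x > 1) & (x <= 2), 2 - x, 0.0))

def haar(x):
    """The Haar scaling function, which IS orthogonal to its translates."""
    return np.where((x >= 0) & (x < 1), 1.0, 0.0)

x = np.linspace(-2, 5, 700001)
dx = x[1] - x[0]
hat_overlap = np.sum(hat(x) * hat(x - 1)) * dx    # the integral in (8.11)
haar_overlap = np.sum(haar(x) * haar(x - 1)) * dx
print(round(hat_overlap, 4), round(haar_overlap, 4))  # about 0.1667 versus 0.0
```

The nonzero overlap 1/6 is exactly the positive area under φ(x)φ(x − 1) on (1, 2) shown in Figure 8.1.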
As we mentioned in Chapter 3, this hat scaling function is a special case of a more
general class of functions, namely, the B-splines. For such scaling functions that do not
satisfy the above fourth condition of the MRA definition, there is a theory with weaker
conditions. For more on this, we refer to the authoritative reference of Daubechies.
There is also a method of orthogonalizing the B-splines that yields the Battle-Lemarie
scaling functions, as we will discuss in Section 11.3.2.
With the Daubechies scaling functions we can show that the above four requirements
of the MRA are met. In particular, starting from their simplest one, the Haar scaling
function, we find that conditions 1 and 2 are clear when we examine the limits of
j ∈ Z. Condition 3 is intuitive when we look at the scaling by the dyadic scale 1/2^j.
Condition 4 is easily satisfied, since in this special case of the Haar scaling functions
there is no overlap between the compact supports of translates by distinct integers
k and k′. Thus, their product is zero, and the integral in (8.9) vanishes.
8.2.1 Establishing The Scaling Equation
The first result we can easily obtain from the MRA definition is the establishment of
the scaling equation,

φ(x) = Σ_k h_k √2 φ(2x − k).  (8.12)

We are after only the form in the above equation; the coefficients {h_k} are to be
found later from their properties that are derived using the MRA, a number of basic
theorems, and the use of the Fourier transform. This will be the subject of Chapter 10.
Since {φ(x − k)} are an orthonormal basis of V_0, we can express φ(x) ∈ V_0 in a
series of them,

φ(x) = Σ_k a_k φ(x − k).  (8.13)
However, this is not the scaling equation, since the latter involves φ(2x − k) inside the
sum and not φ(x − k). Now, the existence of the nested subspaces, demanded by the first
condition of the MRA, in particular V_0 ⊂ V_1 with {2^{1/2} φ(2x − k)} as the orthonormal
basis of V_1, becomes important. Thus, since φ(x) ∈ V_0 ⊂ V_1, then φ(x) ∈ V_1, and we
can write its series expansion in terms of the basis {2^{1/2} φ(2x − k)} of V_1, which yields

φ(x) = Σ_k h_k √2 φ(2x − k)  (8.12)

as the sought scaling equation.
We will return to the MRA in Sections 9.2 and 9.3 for a helping hand in developing
the properties of the scaling coefficients {h_k}. (We have spelled out four of these in
equations (3.18)-(3.21); they will be used in conjunction with the Fourier transform,
which will prove to be very instrumental in determining the coefficients.)
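For the Haar scaling function the coefficients in (8.12) turn out to be h_0 = h_1 = 1/√2 (a fact established later; we use it here only to illustrate the form of the scaling equation), so that φ(x) = φ(2x) + φ(2x − 1). A quick pointwise check in Python:

```python
import numpy as np

def phi(x):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return np.where((x >= 0) & (x < 1), 1.0, 0.0)

# Haar case of (8.12): h_0 = h_1 = 1/sqrt(2), i.e. phi(x) = phi(2x) + phi(2x - 1)
h = np.array([1.0, 1.0]) / np.sqrt(2)
x = np.linspace(-1, 2, 30001)
lhs = phi(x)
rhs = sum(hk * np.sqrt(2) * phi(2 * x - k) for k, hk in enumerate(h))
print(np.max(np.abs(lhs - rhs)))  # at the level of rounding error: the sides agree
```

Geometrically, the unit box on [0, 1) is the sum of the two half-width boxes on [0, 1/2) and [1/2, 1), which is exactly what the scaling equation expresses.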
Fig. 8.10: Decomposition and reconstruction of f_1(x) ∈ V_1 - a special case of a quadrature mirror
filter pair.
In Figure 8.10 we show the decomposition as well as the reconstruction of f_1(x) in V_1.
Such a figure depicts the first stage of the two parallel low and high pass filters at the
scale 1/2, expressed as V_1 = V_0 ⊕ W_0, as can be seen clearly in Figure 8.13 (for 16
input samples). This means that the details in V_1 are formed via the projections of the
signal onto the wavelet subspace W_0, with scales of 1. The role of the scaling functions,
on the other hand, is only to supply a coarse (blurred) projection of the signal onto the
subspace V_0, with its large scale of 1 in the above case, resulting in an approximation
of the signal itself.
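For the Haar filters, the single stage V_1 = V_0 ⊕ W_0 of Figure 8.10 amounts to taking normalized sums and differences of pairs of V_1 coefficients, and reversing the same operations recovers them exactly (the coefficient names and sample values below are our own illustration):

```python
import numpy as np

c1 = np.array([3.0, 5.0])               # two V_1 coefficients (made-up values)

c0 = (c1[0] + c1[1]) / np.sqrt(2)       # low pass output:  the V_0 average
d0 = (c1[0] - c1[1]) / np.sqrt(2)       # high pass output: the W_0 detail

# Reconstruction: the same sum/difference operations, undone
reconstructed = np.array([c0 + d0, c0 - d0]) / np.sqrt(2)
print(reconstructed)  # recovers the original pair 3, 5
```

This pair of operations is the simplest instance of the quadrature mirror filter pair named in the caption of Figure 8.10: the analysis and synthesis filters mirror each other, so no information is lost.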
Thus, for the general case when a signal is suspected to need a very fine scale, for
example, 1/2^5 = 1/32, we begin with f_5(x) ∈ V_5, and the above decomposition goes as
V_5 = V_0 ⊕ W_0 ⊕ W_1 ⊕ W_2 ⊕ W_3 ⊕ W_4, which requires five low and high pass filter pairs.
In the next section we will elaborate more on the above implementation of the signal
decomposition and reconstruction, by dealing more with filter banks, with illustrations
in terms of the filters' frequency bands.
8.5.2 Schematic Formulation of the Filter Banks Process
The decomposition in Figure 8.8 of the signal $f(t) \in V_1 = V_0 \oplus W_0$ may be further illustrated with the actual frequencies that the low and high pass filters allow to pass through them. In Figure 8.11 we designate the action of the low and high pass filters by the absolute value of their respective system functions $|H(\omega)|$ and $|G(\omega)|$. Let $H(\omega)$ and $G(\omega)$ be the Fourier transforms of the impulse responses $h(t)$ and $g(t)$ of the low and high pass filters, respectively. Here $\omega$ is the frequency measured in radians per second. For an ideal low pass filter, $|H(\omega)|$ is represented by a constant value for $\omega \in (-a, a)$, and zero otherwise. Here we take $a = \frac{\pi}{2}$ for Figure 8.11(a). Thus, the low pass filter allows only low frequencies limited to $(-\frac{\pi}{2}, \frac{\pi}{2})$. On the other hand (for the total interval of $(-\pi, \pi)$), the high pass filter of Figure 8.11(b) allows the high frequencies $(-\pi, -\frac{\pi}{2}) \cup (\frac{\pi}{2}, \pi)$. In our illustration we will consider a half low pass filter with $\omega \in (0, \frac{\pi}{2})$, and a half high pass filter with $\omega \in (\frac{\pi}{2}, \pi)$. Hence, the combination of the above low and high pass filters in parallel will allow all frequencies $\omega$, $0 < \omega < \pi$, to pass.
Fig. 8.11: (a) A low pass filter $|H(\omega)|$, constant on $(-\frac{\pi}{2}, \frac{\pi}{2})$; (b) a high pass filter $|G(\omega)|$, supported on $(-\pi, -\frac{\pi}{2}) \cup (\frac{\pi}{2}, \pi)$.
The use of such filters is an implementation of the scaling function and wavelet computations for the signal samples (or coefficients). For example, considering $V_1 = V_0 \oplus W_0$, in $V_1$ we have a refined version of the signal at the scale $a_1 = \frac{1}{2}$, which means that there we have all frequencies $0 < \omega < \pi$. The decomposition $V_0 \oplus W_0$, on the other hand, corresponds to the combination of the low frequencies $0 < \omega < \frac{\pi}{2}$, allowed by the low pass filter for an approximation of the signal, and the remaining high frequencies $\frac{\pi}{2} < \omega < \pi$, allowed by the high pass filter for the added refinement. This decomposition is illustrated in Figure 8.12 for 16 samples input to the initial (single) low pass filter at the scale $a_1 = \frac{1}{2}$.
[Figure: 16 samples enter a low pass filter ($0 < \omega < \pi$), producing 16 coefficients; these feed a parallel low pass ($0 < \omega < \frac{\pi}{2}$) and high pass ($\frac{\pi}{2} < \omega < \pi$) pair, and after downsampling each branch keeps 8 coefficients, for $V_0$ and $W_0$ at scale $a_0 = 1$.]

Fig. 8.12: A simple low pass filter outputting 16 coefficients to a parallel low and high pass filter pair, representing the decomposition $V_1 = V_0 \oplus W_0$.
In the usual computations the refined (high frequency) output coefficients of the high pass filter are downsampled, then stored, resulting in the eight coefficients of $W_0$ in this case. The sixteen coefficients output of the low pass filter are also downsampled to eight, representing the coefficients of $V_0$. These latter coefficients output of the low pass filter are input again to another parallel pair of low and high pass filters, with half the previous frequency bands, i.e., $0 < \omega < \frac{\pi}{4}$ and $\frac{\pi}{4} < \omega < \frac{\pi}{2}$, respectively. This corresponds to the next (larger) scale following $a_0 = 1$, that is $a_{-1} = 2$, as shown in Figure 8.13.
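The halving of the coefficient counts through such a cascade can be sketched in a few lines of Python. The averaging/differencing pair below is the Haar filter pair, used here only as a simple stand-in for a generic low/high pass pair; the point is the bookkeeping of the cascade, not the particular filter:

```python
import math

def analysis_step(a):
    # One filter-bank stage with downsampling (Haar averaging/differencing):
    # an input of length 2m yields m approximation and m detail coefficients.
    s = math.sqrt(2.0)
    approx = [(a[2 * i] + a[2 * i + 1]) / s for i in range(len(a) // 2)]
    detail = [(a[2 * i] - a[2 * i + 1]) / s for i in range(len(a) // 2)]
    return approx, detail

a = [float(i) for i in range(16)]   # 16 input samples, as in Figure 8.13
counts = []
while len(a) > 1:
    a, d = analysis_step(a)
    counts.append((len(a), len(d)))

# Four stages: 16 -> 8+8 -> 4+4 -> 2+2 -> 1+1 coefficients
assert counts == [(8, 8), (4, 4), (2, 2), (1, 1)]
```

Only the detail coefficients of each stage and the final approximation coefficient need to be stored: $8+4+2+1+1 = 16$ numbers, the same as the input.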
[Figure: cascade of filter-bank stages. The 16 samples pass through the initial low pass filter ($0 < \omega < \pi$); each subsequent stage splits the low pass output into a low band and a high band of half the previous width ($0 < \omega < \frac{\pi}{2}$ and $\frac{\pi}{2} < \omega < \pi$, then $0 < \omega < \frac{\pi}{4}$ and $\frac{\pi}{4} < \omega < \frac{\pi}{2}$, down to $0 < \omega < \frac{\pi}{16}$ and $\frac{\pi}{16} < \omega < \frac{\pi}{8}$), with downsampling at each stage: 16 coefficients, then 8, 4, 2, and finally 1 in each branch, giving the subspaces $V_0, V_{-1}, V_{-2}, V_{-3}$ (scales $1, 2, 4, 8$) and $W_0, W_{-1}, W_{-2}, W_{-3}$.]

Fig. 8.13: Four parallel low and high pass filter pairs constituting a filter bank for the decomposition $V_1 = V_{-3} \oplus W_{-3} \oplus W_{-2} \oplus W_{-1} \oplus W_0$.
Fig. 9.1: The low and high pass filters' system functions $|L(\omega)|$ and $|H(\omega)|$ on the band $(-2\pi, 2\pi)$.
On the band $(-2\pi, 2\pi)$ the two filters complement each other, since together they pass all frequencies in that band. Let us concentrate on the right side of this band, i.e., $(0, 2\pi)$. The low pass filter is concerned with the low frequencies (large scale) on $(0, \pi)$, while the high pass filter passes the higher frequencies (smaller scale refinements) output on its band $(\pi, 2\pi)$. The output of the low pass filter that we will follow for further decompositions will be of lower frequency, i.e., larger scale. This means that the decomposition process moves from high frequencies (smaller scale) to low frequencies (larger scale) for the outputs of the low pass filters.
9.2 THE FAST DAUBECHIES WAVELET TRANSFORM (FDWT) AND ITS
INVERSE (IFDWT)
For our illustration of the Fast Wavelet Transform we concentrate in this chapter on using the Daubechies scaling function $\phi_{D2}$, with its four coefficients

$h_0 = \dfrac{1+\sqrt{3}}{4\sqrt{2}}, \quad h_1 = \dfrac{3+\sqrt{3}}{4\sqrt{2}}, \quad h_2 = \dfrac{3-\sqrt{3}}{4\sqrt{2}}, \quad h_3 = \dfrac{1-\sqrt{3}}{4\sqrt{2}},$

and its associated wavelet $\psi_{D2}$, with its coefficients

$\tilde h_0 = h_3 = \dfrac{1-\sqrt{3}}{4\sqrt{2}}, \quad \tilde h_1 = -h_2 = -\dfrac{3-\sqrt{3}}{4\sqrt{2}}, \quad \tilde h_2 = h_1 = \dfrac{3+\sqrt{3}}{4\sqrt{2}}, \quad \tilde h_3 = -h_0 = -\dfrac{1+\sqrt{3}}{4\sqrt{2}}.$
The latter coefficients are used to construct $\psi_{D2}$ in terms of its associated scaling functions $\phi_{D2}$. Note that, in order not to carry the $\sqrt{2}$ in the denominator of the expressions of the above eight coefficients, a number of authors use

$p_0 = \sqrt{2}\,h_0 = \dfrac{1+\sqrt{3}}{4}, \quad p_1 = \sqrt{2}\,h_1 = \dfrac{3+\sqrt{3}}{4}, \quad p_2 = \sqrt{2}\,h_2 = \dfrac{3-\sqrt{3}}{4}, \quad p_3 = \sqrt{2}\,h_3 = \dfrac{1-\sqrt{3}}{4}.$
What we have been using as $h_0, h_1, h_2, h_3$ corresponds to the normalized scaling functions $\{\sqrt{2}\,\phi(2t-k)\}$ in the scaling equation,

$\phi(t) = \sum_k h_k \sqrt{2}\,\phi(2t-k),$ (3.2)

while $p_0, p_1, p_2$, and $p_3$ correspond to the use of the non-normalized scaling functions $\{\phi(2t-k)\}$ in the same scaling equation,

$\phi(t) = \sum_k h_k \sqrt{2}\,\phi(2t-k) = \sum_k p_k\,\phi(2t-k).$ (3.2a)
So, in order not to carry the $\sqrt{2}$ in the computations for the fast Daubechies wavelet transform, we will adopt the latter notation of $\{p_k\}$, but from here on we call them $\{h_k\}$, remembering that the $\phi(2t-k)$ used are not normalized. We stay with the use of $\{h_k\}$ because it is customary that these symbols are associated with the structure of the filters, such as $\{h_0, h_1, h_2, h_3\}$ and $\{h_3, -h_2, h_1, -h_0\}$ for the Daubechies 2 low and high pass filters, respectively.
Our discussion in this section concentrates on the basic development of the fast Daubechies wavelet transform and its inverse. The illustrations with numerical computations are the subject of Section 9.2.2. Our aim is to transform the coefficient output $\{a_{k,0}\}$ of the first low pass filter, at the scale $a_0 = 1$, to those $\{a_{k,-1}, b_{k,-1}\}$ of the next low and high pass filters' outputs, at the larger scale $a_{-1} = 2$. This is done by showing the decomposition transformation, in this example, of the scaling functions $\{\phi(t-k)\}$ to $\{\phi(\frac{t}{2}-k), \psi(\frac{t}{2}-k)\}$ as shown in equation (9.24). Then we will associate the scaling function and wavelet coefficients $a_{k,0}$, $a_{k,-1}$, and $b_{k,-1}$ with $\phi(t-k)$, $\phi(\frac{t}{2}-k)$, and $\psi(\frac{t}{2}-k)$, respectively. We add that we have $2^{n+1}$ coefficients $\{a_{k,0}\}$, and the total of the $\{a_{k,-1}, b_{k,-1}\}$ coefficients, after downsampling, will also be $2^{n+1}$.
Let us remember that for the averaging process for $2^n$ samples in (8.51),

$a_{k,0} = \sum_{i=0}^{2^n-1} f(i)\,\phi(i-k),$ (8.51(a))

we needed to extend the samples to have them match the translated given values of the scaling functions, as demonstrated in Examples 8.6 and 8.7. We will assume here a periodic extension. So, if we start with four samples $\{f_0, f_1, f_2, f_3\}$, for example, we have a period of 4, so that $f_4 = f_0$, $f_5 = f_1$, $f_6 = f_2$, and $f_7 = f_3$. (This is different from the periodic extension variations of Section 8.6, which were done to reduce the edge effect.)
We will see that going from $\{a_{k,1}\}$ to $\{a_{k,0}, b_{k,0}\}$ amounts to a simple matrix equation, whose input is the column of $\{a_{k,1}\}$ and whose output is the column with alternate elements in the sequence $\{a_{k,0}, b_{k,0}\}$, as will be shown in (9.33). Furthermore, the square matrix $\Phi$ of the transformation is somewhat sparse. Also, because of the following properties of the above four Daubechies scaling coefficients,

$h_0^2 + h_1^2 + h_2^2 + h_3^2 = 2,$ (9.7)

$h_2 h_0 + h_3 h_1 = 0,$ (9.8)

and the sparsity of $\Phi$, the inverse matrix $\Phi^{-1}$ is related to $\Phi$ in a very simple way, namely, as a constant multiple of its transpose, $\Phi^{-1} = \frac{1}{2}\Phi^T$. This inverse matrix will, of course, be needed for the reconstruction process of the signal as we return from $\{a_{k,0}, b_{k,0}\}$ to $\{a_{k,1}\}$.

Note that had we used the coefficients with normalized scaling functions in their scaling equation, the sum in (9.7) would have become 1 because of the $\frac{1}{\sqrt{2}}$ in the denominator of our usual expressions for these coefficients.
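Both properties are easy to confirm numerically; the short sketch below plugs in the four Daubechies 2 coefficients in the non-normalized convention adopted above:

```python
import math

s3 = math.sqrt(3.0)
# Daubechies 2 coefficients in the non-normalized convention (the p_k renamed h_k)
h0, h1, h2, h3 = (1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4

# (9.7): the sum of squares equals 2 in this convention
# (it would equal 1 for the coefficients carrying the extra 1/sqrt(2))
assert abs(h0**2 + h1**2 + h2**2 + h3**2 - 2.0) < 1e-12

# (9.8): the shifted products cancel
assert abs(h2 * h0 + h3 * h1) < 1e-12
```

The same two identities are what make the rows of the decomposition matrix below mutually orthogonal, with squared length 2.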
To prepare for our illustration we will show next that $\Phi\,\Phi^T = 2I$. Our illustration here will start with an $8 \times 8$ decomposition matrix (of the eight scaling coefficients) at the scale $a_2 = \frac{1}{2^2} = \frac{1}{4}$, with period $2 \cdot 2^2 = 8$. What remains is to show that $\Phi\,\Phi^T = 2I$ in the following.

The Decomposition Matrix $\Phi$ and its Inverse $\Phi^{-1}$ for the Reconstruction, $\Phi^{-1} = \frac{1}{2}\Phi^T$
We will show soon in (9.34), using equations (9.7)-(9.8), that the transformation matrix $\Phi$ from the scaling functions basis, associated with the scale $a_2 = \frac{1}{4}$, to those of the scaling functions-wavelets, associated with the scale $a_1 = \frac{1}{2}$, is

$\Phi = \begin{pmatrix} h_0 & h_1 & h_2 & h_3 & 0 & 0 & 0 & 0 \\ h_3 & -h_2 & h_1 & -h_0 & 0 & 0 & 0 & 0 \\ 0 & 0 & h_0 & h_1 & h_2 & h_3 & 0 & 0 \\ 0 & 0 & h_3 & -h_2 & h_1 & -h_0 & 0 & 0 \\ 0 & 0 & 0 & 0 & h_0 & h_1 & h_2 & h_3 \\ 0 & 0 & 0 & 0 & h_3 & -h_2 & h_1 & -h_0 \\ h_2 & h_3 & 0 & 0 & 0 & 0 & h_0 & h_1 \\ h_1 & -h_0 & 0 & 0 & 0 & 0 & h_3 & -h_2 \end{pmatrix}.$ (9.9)
This will be used in equation (9.24) for transforming a scaling function sequence of period 8 at the scale $a_0 = 1$ to those of the scaling function-wavelet sequence at the scale $a_{-1} = 2$, with period 4 for each. The transpose $\Phi^T$ of this matrix is

$\Phi^T = \begin{pmatrix} h_0 & h_3 & 0 & 0 & 0 & 0 & h_2 & h_1 \\ h_1 & -h_2 & 0 & 0 & 0 & 0 & h_3 & -h_0 \\ h_2 & h_1 & h_0 & h_3 & 0 & 0 & 0 & 0 \\ h_3 & -h_0 & h_1 & -h_2 & 0 & 0 & 0 & 0 \\ 0 & 0 & h_2 & h_1 & h_0 & h_3 & 0 & 0 \\ 0 & 0 & h_3 & -h_0 & h_1 & -h_2 & 0 & 0 \\ 0 & 0 & 0 & 0 & h_2 & h_1 & h_0 & h_3 \\ 0 & 0 & 0 & 0 & h_3 & -h_0 & h_1 & -h_2 \end{pmatrix}.$ (9.10)
Now, we will show that $\Phi^{-1} = \frac{1}{2}\Phi^T$. Carrying out the product of the matrices in (9.9) and (9.10), each diagonal entry of $\Phi\,\Phi^T$ is the squared length of a row of $\Phi$,

$h_0^2 + h_1^2 + h_2^2 + h_3^2,$

while each off-diagonal entry is the inner product of two distinct rows. These off-diagonal entries either cancel identically, as in

$h_0 h_3 - h_1 h_2 + h_1 h_2 - h_0 h_3 = 0 \quad \text{and} \quad h_0 h_1 - h_0 h_1 = 0,$

or reduce to $\pm(h_0 h_2 + h_1 h_3)$, which vanishes by (9.8). Hence

$\Phi\,\Phi^T = \begin{pmatrix} h_0^2+h_1^2+h_2^2+h_3^2 & 0 & \cdots & 0 \\ 0 & h_0^2+h_1^2+h_2^2+h_3^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & h_0^2+h_1^2+h_2^2+h_3^2 \end{pmatrix} = 2I,$

by (9.7), and therefore $\Phi^{-1} = \frac{1}{2}\Phi^T$.
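The identity $\Phi\,\Phi^T = 2I$ can also be confirmed numerically. The sketch below builds the periodized $8 \times 8$ matrix of (9.9) row by row (the helper `decomposition_matrix` is our own naming, not the book's):

```python
import math
import numpy as np

s3 = math.sqrt(3.0)
h = [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]

def decomposition_matrix(n=8):
    """Periodized n x n matrix of (9.9): low/high pass row pairs shifted by 2,
    wrapping around modulo n in the last pair of rows."""
    low = [h[0], h[1], h[2], h[3]]
    high = [h[3], -h[2], h[1], -h[0]]
    Phi = np.zeros((n, n))
    for j in range(n // 2):
        for t in range(4):
            Phi[2 * j, (2 * j + t) % n] = low[t]
            Phi[2 * j + 1, (2 * j + t) % n] = high[t]
    return Phi

Phi = decomposition_matrix()
# Phi Phi^T = 2I, hence Phi^{-1} = (1/2) Phi^T
assert np.allclose(Phi @ Phi.T, 2.0 * np.eye(8))
```

The check works for any even $n \ge 4$, since the row orthogonality only uses (9.7) and (9.8).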
…at the scale $a_2$, and

$\bar d_1 = [a_{0,1}, b_{0,1}, a_{1,1}, b_{1,1}, \ldots, a_{3,1}, b_{3,1}]^T$

at the scale $a_1$, we use the matrix $\Phi^T$ to go from $\bar d_1$ to $a_2 = 2\Phi^{-1}\bar d_1 = \Phi^T\bar d_1$:
$a_2 = \begin{pmatrix} a_{0,2}\\ a_{1,2}\\ a_{2,2}\\ a_{3,2}\\ a_{4,2}\\ a_{5,2}\\ a_{6,2}\\ a_{7,2} \end{pmatrix} = \Phi^T \bar d_1 = \begin{pmatrix} h_0 & h_3 & 0 & 0 & 0 & 0 & h_2 & h_1 \\ h_1 & -h_2 & 0 & 0 & 0 & 0 & h_3 & -h_0 \\ h_2 & h_1 & h_0 & h_3 & 0 & 0 & 0 & 0 \\ h_3 & -h_0 & h_1 & -h_2 & 0 & 0 & 0 & 0 \\ 0 & 0 & h_2 & h_1 & h_0 & h_3 & 0 & 0 \\ 0 & 0 & h_3 & -h_0 & h_1 & -h_2 & 0 & 0 \\ 0 & 0 & 0 & 0 & h_2 & h_1 & h_0 & h_3 \\ 0 & 0 & 0 & 0 & h_3 & -h_0 & h_1 & -h_2 \end{pmatrix} \begin{pmatrix} a_{0,1}\\ b_{0,1}\\ a_{1,1}\\ b_{1,1}\\ a_{2,1}\\ b_{2,1}\\ a_{3,1}\\ b_{3,1} \end{pmatrix}.$ (9.42)
In the following example, we will illustrate the decomposition process in part (a), the approximated function for the different decompositions in part (b), and the reconstruction of the original function in part (c).
Example 9.1 Decomposing a Signal with Daubechies 2 Filters
(a) The Decomposition (Analysis)
In this example we start by illustrating $\bar d_1 = \frac{1}{2}\Phi a_2$ of Equation (9.34). We are given the sequence $s = \{0, 1, 2, 3\}$. For the necessary extension to period 8 (corresponding to $n = 2$ in $8 = 2(2^n)$ for the index $n$ in $a_n$), we choose to do it with a smooth periodic extension (with matching slopes at the two ends) to avoid the edge effect [see Nievergelt [1]]. There, for $s_5$ and $s_6$ we took the easy way of interpolating with a straight line to have the extended sequence as $\{0, 1, 2, 3, 4, \frac{7}{3}, \frac{2}{3}, -1\}$. Had we done the interpolation with splines, we would have obtained the extended sequence as $\{0, 1, 2, 3, 4, 2, 1, -1\}$. This is the one used for this example. It is instead of $\{0, 1, 2, 3, 2, 1, 0\}$ of the mirror image extension of Example 8.7, which showed the edge effect.
We first prepare the coefficients of

$a_2 = [a_{0,2}, a_{1,2}, a_{2,2}, a_{3,2}, a_{4,2}, a_{5,2}, a_{6,2}, a_{7,2}]^T$

for (9.34). Here we easily compute $a_{0,2} = \frac{3-\sqrt{3}}{2}$, $a_{1,2} = \frac{5-\sqrt{3}}{2}$, and $a_{7,2} = \frac{1-\sqrt{3}}{2}$. Also $a_{2,2} = \frac{7-\sqrt{3}}{2}$ and $a_{6,2} = -\frac{1+\sqrt{3}}{2}$. For example,

$a_{0,2} = s_0\phi(0) + s_1\phi(1) + s_2\phi(2) + s_3\phi(3) = 0 + 1\cdot\frac{1+\sqrt{3}}{2} + 2\cdot\frac{1-\sqrt{3}}{2} + 0 = \frac{3-\sqrt{3}}{2}.$
Our illustration is for computing $a_{3,2}$, $a_{4,2}$, and $a_{5,2}$:

$a_{3,2} = s_3\phi(0) + s_4\phi(1) + s_5\phi(2) + s_6\phi(3) = (3)(0) + (4)\frac{1+\sqrt{3}}{2} + (2)\frac{1-\sqrt{3}}{2} + (1)(0) = 3 + \sqrt{3},$

$a_{4,2} = s_4\phi(0) + s_5\phi(1) + s_6\phi(2) + s_7\phi(3) = (4)(0) + (2)\frac{1+\sqrt{3}}{2} + (1)\frac{1-\sqrt{3}}{2} + (-1)(0) = \frac{3+\sqrt{3}}{2},$

$a_{5,2} = s_5\phi(0) + s_6\phi(1) + s_7\phi(2) + s_8\phi(3) = (2)(0) + (1)\frac{1+\sqrt{3}}{2} + (-1)\frac{1-\sqrt{3}}{2} + 0 = \sqrt{3}.$
So,

$a_2 = \left[\frac{3-\sqrt{3}}{2},\ \frac{5-\sqrt{3}}{2},\ \frac{7-\sqrt{3}}{2},\ 3+\sqrt{3},\ \frac{3+\sqrt{3}}{2},\ \sqrt{3},\ -\frac{1+\sqrt{3}}{2},\ \frac{1-\sqrt{3}}{2}\right]^T.$ (E.1)
These are the coefficients for the signal decomposition as $f_2(t) \in V_2$. We are using the Daubechies 2 filters of $h_0 = \frac{1+\sqrt{3}}{4}$, $h_1 = \frac{3+\sqrt{3}}{4}$, $h_2 = \frac{3-\sqrt{3}}{4}$, and $h_3 = \frac{1-\sqrt{3}}{4}$. Now we will have the decomposition of $f_2(t)$ as $f_2(t) \in V_2 = V_1 \oplus W_1$ in $\bar d_1 = \frac{1}{2}\Phi a_2$, where $\Phi$ is as in (9.34). Therefore, we have
$a_{0,1} = \frac{1}{2}\left[h_0 a_{0,2} + h_1 a_{1,2} + h_2 a_{2,2} + h_3 a_{3,2}\right]$
$= \frac{1+\sqrt{3}}{8}\cdot\frac{3-\sqrt{3}}{2} + \frac{3+\sqrt{3}}{8}\cdot\frac{5-\sqrt{3}}{2} + \frac{3-\sqrt{3}}{8}\cdot\frac{7-\sqrt{3}}{2} + \frac{1-\sqrt{3}}{8}\cdot(3+\sqrt{3}) = \frac{18-5\sqrt{3}}{8},$

$b_{0,1} = \frac{1}{2}\left[h_3 a_{0,2} - h_2 a_{1,2} + h_1 a_{2,2} - h_0 a_{3,2}\right]$
$= \frac{1-\sqrt{3}}{8}\cdot\frac{3-\sqrt{3}}{2} - \frac{3-\sqrt{3}}{8}\cdot\frac{5-\sqrt{3}}{2} + \frac{3+\sqrt{3}}{8}\cdot\frac{7-\sqrt{3}}{2} - \frac{1+\sqrt{3}}{8}\cdot(3+\sqrt{3}) = -\frac{3}{8},$
$a_{1,1} = \frac{1}{2}\left[h_0 a_{2,2} + h_1 a_{3,2} + h_2 a_{4,2} + h_3 a_{5,2}\right]$
$= \frac{1+\sqrt{3}}{8}\cdot\frac{7-\sqrt{3}}{2} + \frac{3+\sqrt{3}}{8}\cdot(3+\sqrt{3}) + \frac{3-\sqrt{3}}{8}\cdot\frac{3+\sqrt{3}}{2} + \frac{1-\sqrt{3}}{8}\cdot\sqrt{3} = \frac{7+5\sqrt{3}}{4},$

$b_{1,1} = \frac{1}{2}\left[h_3 a_{2,2} - h_2 a_{3,2} + h_1 a_{4,2} - h_0 a_{5,2}\right]$
$= \frac{1-\sqrt{3}}{8}\cdot\frac{7-\sqrt{3}}{2} - \frac{3-\sqrt{3}}{8}\cdot(3+\sqrt{3}) + \frac{3+\sqrt{3}}{8}\cdot\frac{3+\sqrt{3}}{2} - \frac{1+\sqrt{3}}{8}\cdot\sqrt{3} = \frac{1-\sqrt{3}}{4}.$
Following the same computations, we find

$a_{2,1} = \frac{8+3\sqrt{3}}{8}, \quad b_{2,1} = \frac{1-6\sqrt{3}}{8}, \quad a_{3,1} = 1-\sqrt{3}, \quad b_{3,1} = 0.$
Hence, we have

$\bar d_1 = [a_{0,1}, b_{0,1}, a_{1,1}, b_{1,1}, a_{2,1}, b_{2,1}, a_{3,1}, b_{3,1}]^T = \left[\frac{18-5\sqrt{3}}{8},\ -\frac{3}{8},\ \frac{7+5\sqrt{3}}{4},\ \frac{1-\sqrt{3}}{4},\ \frac{8+3\sqrt{3}}{8},\ \frac{1-6\sqrt{3}}{8},\ 1-\sqrt{3},\ 0\right]^T.$ (E.2)
From this $\bar d_1$ we obtain

$a_1 = \left[\frac{18-5\sqrt{3}}{8},\ \frac{7+5\sqrt{3}}{4},\ \frac{8+3\sqrt{3}}{8},\ 1-\sqrt{3}\right]^T$ (E.3)

and

$\bar b_1 = \left[-\frac{3}{8},\ \frac{1-\sqrt{3}}{4},\ \frac{1-6\sqrt{3}}{8},\ 0\right]^T.$ (E.4)
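The whole of this decomposition step can be reproduced in a few lines. The sketch below rebuilds the periodized $8 \times 8$ matrix $\Phi$ (in the same layout as (9.9)) and checks the computed $\bar d_1$ against the values listed above:

```python
import math
import numpy as np

s3 = math.sqrt(3.0)
h = [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]

# a_2 from (E.1)
a2 = np.array([(3 - s3) / 2, (5 - s3) / 2, (7 - s3) / 2, 3 + s3,
               (3 + s3) / 2, s3, -(1 + s3) / 2, (1 - s3) / 2])

# Periodized 8x8 matrix Phi: low/high pass row pairs shifted by 2, wrapping mod 8
Phi = np.zeros((8, 8))
low, high = h, [h[3], -h[2], h[1], -h[0]]
for j in range(4):
    for t in range(4):
        Phi[2 * j, (2 * j + t) % 8] = low[t]
        Phi[2 * j + 1, (2 * j + t) % 8] = high[t]

d1 = 0.5 * Phi @ a2
expected = np.array([(18 - 5 * s3) / 8, -3 / 8, (7 + 5 * s3) / 4, (1 - s3) / 4,
                     (8 + 3 * s3) / 8, (1 - 6 * s3) / 8, 1 - s3, 0.0])
assert np.allclose(d1, expected)        # matches (E.2)

# Reconstruction: a_2 = Phi^T d_1, since Phi^T Phi = 2I
assert np.allclose(Phi.T @ d1, a2)
```

The final assertion is the reconstruction of part (c) in miniature: multiplying by $\Phi^T$ recovers the input coefficients exactly.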
The coefficients $a_1$ and $\bar b_1$ are considered to be the outputs of the first pair of the Daubechies 2 scaling function and wavelet filters, respectively. On the other hand, $a_2$ was the output of the (first) Daubechies scaling function filter.

Now, we will decompose $a_1$ using (9.33), moving toward the decomposition in $V_1 = V_0 \oplus W_0$:

$\bar d_0 = \begin{pmatrix} a_{0,0}\\ b_{0,0}\\ a_{1,0}\\ b_{1,0} \end{pmatrix} = \frac{1}{2}\Phi a_1 = \frac{1}{2}\begin{pmatrix} h_0 & h_1 & h_2 & h_3 \\ h_3 & -h_2 & h_1 & -h_0 \\ h_2 & h_3 & h_0 & h_1 \\ h_1 & -h_0 & h_3 & -h_2 \end{pmatrix} \begin{pmatrix} \frac{18-5\sqrt{3}}{8} \\[2pt] \frac{7+5\sqrt{3}}{4} \\[2pt] \frac{8+3\sqrt{3}}{8} \\[2pt] 1-\sqrt{3} \end{pmatrix},$ (E.5)
where $\Phi$ is as in (9.33). Hence,

$a_{0,0} = \frac{1+\sqrt{3}}{8}\cdot\frac{18-5\sqrt{3}}{8} + \frac{3+\sqrt{3}}{8}\cdot\frac{7+5\sqrt{3}}{4} + \cdots$
10.1.1 The Scaling Coefficients Representation as an Integral

Here, we will use the above orthonormality property of the scaling functions to derive the form of the scaling coefficients as an integral. This parallels the same method we employed in deriving the Fourier coefficients $\{a_0, a_k, b_k\}_{k=1}^{\infty}$ of the Fourier series in Example 2.3, where we used the orthogonality property of the harmonic functions $\{1, \cos kx, \sin kx\}_{k=1}^{\infty}$ on $(-\pi, \pi)$. We also used the fact that the function $f(x)$, $-\pi < x < \pi$, being represented by the Fourier series, is square integrable on $(-\pi, \pi)$. This allowed us to integrate the (infinite) series term by term or, in other words, to exchange the integration operation with the infinite summation of the series. Such an exchange is based on the series converging in the mean to the square integrable function $f(x) \in L^2(-\pi, \pi)$.

In our analysis here, we assume that we are working with square integrable functions on $(-\infty, \infty)$, i.e., $f(x) \in L^2(-\infty, \infty)$.
Consider the scaling equation

$\phi(x) = \sum_k h_k \sqrt{2}\,\phi(2x-k).$ (10.1)
To find $h_l$, for example, we multiply both sides of this equation by $\sqrt{2}\,\phi(2x-l)$, then integrate from $x = -\infty$ to $\infty$:

$\int_{-\infty}^{\infty} \phi(x)\sqrt{2}\,\phi(2x-l)\,dx = \int_{-\infty}^{\infty} \sqrt{2}\,\phi(2x-l)\sum_k h_k \sqrt{2}\,\phi(2x-k)\,dx$
$= \sum_k h_k \int_{-\infty}^{\infty} \sqrt{2}\,\phi(2x-l)\,\sqrt{2}\,\phi(2x-k)\,dx = \sum_k h_k\,\delta_{kl} = h_l,$

after using the orthonormality of the set $\{\sqrt{2}\,\phi(2x-k)\}$ in the integral inside the sum. Hence,

$h_k = \int_{-\infty}^{\infty} \phi(x)\sqrt{2}\,\phi(2x-k)\,dx.$ (10.3)
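Formula (10.3) can be sanity-checked numerically in the one case where $\phi$ is known in closed form, namely the Haar scaling function; a simple Riemann sum (our own discretization, not the book's) recovers $h_0 = h_1 = 1/\sqrt{2}$ and the vanishing of the remaining coefficients:

```python
import math

def phi(x):
    # Haar scaling function on [0, 1)
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def h_coeff(k, n=20000):
    # Riemann-sum approximation of (10.3); phi(x) vanishes outside [0, 1),
    # so integrating over [0, 1) suffices
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = i * dx
        total += phi(x) * math.sqrt(2.0) * phi(2.0 * x - k) * dx
    return total

assert abs(h_coeff(0) - 1.0 / math.sqrt(2.0)) < 1e-3
assert abs(h_coeff(1) - 1.0 / math.sqrt(2.0)) < 1e-3
assert abs(h_coeff(2)) < 1e-12    # supports no longer overlap
```

The last assertion previews the support argument of the next subsection: once the translate slides past the support of $\phi$, the coefficient is exactly zero.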
Note that in the case of the Fourier coefficients we integrate the product of the given function and one of the known harmonic functions. Here, $h_k$ is expressed as the integral of the product of the unknown (sought) scaling function $\phi(x)$ and its scaled and translated version $\sqrt{2}\,\phi(2x-k)$. However, we will find a good use for this formula of $h_k$.
The Length of the Scaling Coefficients Sequence

Indeed, the above form of $h_k$ is very helpful in telling us how the number of non-vanishing scaling coefficients depends on the compact support of $\phi(x)$. This is so since the integrand in (10.3) is $\phi(x)\sqrt{2}\,\phi(2(x-\frac{k}{2}))$, and the second factor, with its scale of $\frac{1}{2}$ and translation by $\frac{k}{2}$, will cease to have a non-zero overlap with $\phi(x)$ after $\frac{k}{2}$ exceeds the width of the compact support of $\phi(x)$. For the Haar scaling function, for example, with its compact support $[0, 1)$ of width one, only $\phi(2(x-\frac{k}{2}))$ with $k = 0$ and $k = 1$ will have a non-zero overlap with $\phi(x)$, resulting in the possible two non-zero coefficients $h_0, h_1$.

For the Daubechies 2 scaling function, the support is $[0, 3)$. So, there is no overlap to possibly constitute non-vanishing coefficients, except for $k = 0, 1, 2, 3, 4, 5$. Thus, we expect at most six non-vanishing coefficients, and we know that there are four. The reason must be that the integral of the overlaps corresponding to the other two of the translations is zero.
Such a simple explanation for the length of the sequence of the scaling coefficients $\{h_n\}$ can be, with more details and care, developed into the following Theorem 10.2. Let us call $h_n = h(n)$ a function of $n \in \mathbb{Z}$; the theorem will tell us about the strong relation between the length $N$ of the non-zero members of the sequence $\{h(n)\}_{n=0}^{N}$ and the compact support of the associated scaling function.

Theorem 10.2
"If the scaling function $\phi(x)$ has compact support on $0 \le x \le N-1$, and if $\{\phi(x-k)\}$ are linearly independent, then $h_n = h(n) = 0$ for $n < 0$ and $n > N-1$. Hence, $N$ is the length of the sequence."

Again, our example of the Daubechies 2 scaling function with compact support $[0, 3)$ (or $[0, 4-1)$) is definitely linearly independent (as these scaling functions are orthogonal there). Hence, $h(n) = 0$ for $n < 0$ and $n > 3$, which makes the four non-vanishing coefficients of $\phi_{D2}$.
This theorem covers more than the orthogonal scaling functions of the Daubechies type, for example. It is more relaxed, as it assumes only linear independence of the compactly supported scaling functions. Hence, it is valid for the roof scaling functions too, for example, which are linearly independent with compact support $[0, 2)$. So, $N = 3$ and we have three non-vanishing scaling coefficients, namely, $h_0 = \frac{1}{2\sqrt{2}}$, $h_1 = \frac{1}{\sqrt{2}}$, $h_2 = \frac{1}{2\sqrt{2}}$.
10.1.2 The coefficients for constructing the basic wavelet in terms of its associated scaling functions

Similar to what we did in finding the above formula (10.3) of the integral representation of the scaling coefficients $\{h_n\}$ of the (orthonormal) scaling functions $\{\sqrt{2}\,\phi(2x-k)\}$, we can do the same for the coefficients $\{c_k\}$ used in constructing the basic wavelet.

…its compact support to $(0, 2)$ denies it the orthogonality property on $(-\infty, \infty)$ with respect to its translations by integers. So, we repeat once more that our attempt in this direction to seek continuous orthogonal scaling functions has failed.
10.3.1 Determining the Daubechies 2 Scaling Coefficients
We should note at this stage that moving from the Haar scaling function polynomial

$P(e^{-i\omega/2}) = P_1(z) = \frac{1}{2}\left[1 + e^{-i\omega/2}\right]$ (10.39)

to the new one,

$p(\omega) = P(e^{-i\omega}) = \frac{1}{2}\left[1 + e^{-i\omega}\right] = e^{-i\omega/2}\cdot\frac{1}{2}\left[e^{i\omega/2} + e^{-i\omega/2}\right] = e^{-i\omega/2}\cos\frac{\omega}{2},$ (10.38)
results in scaling down by a factor of 2 in the $\omega$-frequency space, since we went from $P(e^{-i\omega/2})$ to $P(e^{(-i\omega/2)/\frac{1}{2}}) = P(e^{-i\omega})$ with a smaller scale of $\frac{1}{2}$. This, according to the Fourier transform pair, corresponds to scaling up by a factor of 2 in the time space. Thus, $p(\omega)$ corresponds to a Haar scaling function with the larger compact support $[0, 2)$. Note that this may be in the direction of our development, since, according to Theorem 10.2, a larger compact support increases the number of non-vanishing coefficients.
10.3.2 What to do next? The creative step

Now, the other possibility for aiming at a continuous scaling function, possibly differentiable, is to consider the polynomial

$q(z) = p^2(\omega) = \frac{1}{4}\left(1 + e^{-i\omega}\right)^2 = e^{-i\omega}\cos^2\frac{\omega}{2}, \quad z = e^{-i\omega}.$ (10.44)
Unfortunately, this polynomial does not satisfy Condition 2 of Theorem 10.4:

$|q(z)|^2 + |q(-z)|^2 = \left|e^{-i\omega}\cos^2\frac{\omega}{2}\right|^2 + \left|-e^{-i\omega}\sin^2\frac{\omega}{2}\right|^2 = \cos^4\frac{\omega}{2} + \sin^4\frac{\omega}{2} \ne 1,$

after writing

$q(-z) = \frac{1}{4}\left(1 - e^{-i\omega}\right)^2 = -e^{-i\omega}\sin^2\frac{\omega}{2}.$
This is only true for $p(\omega)$ itself in (10.38), as we showed in Example 10.2. So, there is no point in going further to consider $p^n(\omega)$, $n > 1$.

We see from this and the above discussion that the crux of the matter, in part of this attempt for satisfying Condition 2 of Theorem 10.4, is the simple identity

$\cos^2\frac{\omega}{2} + \sin^2\frac{\omega}{2} = 1,$

which will ensure Condition 2. At this stage comes the promised crucial step, which is to stay with this identity and try higher integer powers of it instead of higher integer powers of $p(\omega)$ in (10.38). For example, we start with cubing both sides of this identity, obtaining
$\left[\cos^2\frac{\omega}{2} + \sin^2\frac{\omega}{2}\right]^3 = 1^3 = \cos^6\frac{\omega}{2} + 3\cos^4\frac{\omega}{2}\sin^2\frac{\omega}{2} + 3\cos^2\frac{\omega}{2}\sin^4\frac{\omega}{2} + \sin^6\frac{\omega}{2}$
$= \left[\cos^6\frac{\omega}{2} + 3\cos^4\frac{\omega}{2}\sin^2\frac{\omega}{2}\right] + \left[3\sin^2\left(\frac{\omega+\pi}{2}\right)\cos^4\left(\frac{\omega+\pi}{2}\right) + \cos^6\left(\frac{\omega+\pi}{2}\right)\right],$ (10.45)

after using $\cos\theta = \sin(\theta + \frac{\pi}{2})$ and $\sin\theta = -\cos(\theta + \frac{\pi}{2})$.
The grouping of the two parts in (10.45) is done in preparation of the first two terms (in brackets) as a nominee for a polynomial $Q(\omega) = |p(\omega)|^2$, where $p(\omega)$ is the sought polynomial $P(\omega)$ to be tested with the conditions of Theorem 10.4,

$|p(\omega)|^2 = Q(\omega) = \cos^6\frac{\omega}{2} + 3\cos^4\frac{\omega}{2}\sin^2\frac{\omega}{2}.$ (10.46)

The second grouping in (10.45) makes the polynomial $Q(\omega+\pi) = |p(\omega+\pi)|^2$,

$|p(\omega+\pi)|^2 = Q(\omega+\pi) = 3\sin^2\left(\frac{\omega+\pi}{2}\right)\cos^4\left(\frac{\omega+\pi}{2}\right) + \cos^6\left(\frac{\omega+\pi}{2}\right).$ (10.47)

This ensures that Condition 2 of Theorem 10.4 is satisfied in advance, since

$|p(\omega)|^2 + |p(\omega+\pi)|^2 = Q(\omega) + Q(\omega+\pi) = 1.$ (10.48)
Condition 3 of Theorem 10.4 requires that $|p(\omega)| > 0$ for $|\omega| \le \frac{\pi}{2}$, which is satisfied here, as we shall show next. From (10.46) we have

$|p(\omega)|^2 = \cos^4\frac{\omega}{2}\left[\cos^2\frac{\omega}{2} + 3\sin^2\frac{\omega}{2}\right],$ (10.49)

where $\cos\frac{\omega}{2} \ge \frac{1}{\sqrt{2}}$ for $|\omega| \le \frac{\pi}{2}$, and the sum of the two terms in brackets is positive. Hence, $|p(\omega)| > 0$ for $|\omega| \le \frac{\pi}{2}$. We note that the above choice of $p(\omega) = P(e^{-i\omega})$ in (10.38) allows the latter to be $P(e^{i\omega})$, since if we change $i$ to $-i$ in $Q(\omega) = |p(\omega)|^2$ of (10.46) we obtain the same result. Hence, we can speak of $p(\omega)$ as a function of $e^{i\omega}$, where Condition 3 of Theorem 10.4 is satisfied for $P(e^{i\omega})$.
What remains is Condition 1 of Theorem 10.4, which requires $p(0) = 1$ for the above new polynomial $p(\omega)$. However, we do not yet have $p(\omega)$, since in (10.46) we only defined its absolute value $|p(\omega)|$. We need to find $p(\omega)$ from $|p(\omega)|$, and at the end of this computation we will show that $p(0) = 1$.

Note that we can write a complex number in its polar form as

$z = x + iy = re^{i\theta}, \quad r = \sqrt{x^2 + y^2} = |z|.$
So, in $|z| = |re^{i\theta}| = r$ we lose the phase factor $e^{i\theta}$, which is what we must recover for $p(\omega)$ from having only $|p(\omega)|$ in (10.46). We will for now allow such a phase factor $\nu(\omega)$ in $p(\omega) = |p(\omega)|\,\nu(\omega)$, to be determined in the sequel in such a way that serves our purpose of determining the scaling coefficients.
Now, by factorizing the sum in (10.49) and realizing that $|x - iy| = |x + iy| = \sqrt{x^2+y^2}$, we can write $\left|\cos\frac{\omega}{2} - i\sqrt{3}\sin\frac{\omega}{2}\right| = \left|\cos\frac{\omega}{2} + i\sqrt{3}\sin\frac{\omega}{2}\right|$, so we have

$|p(\omega)|^2 = \cos^4\frac{\omega}{2}\left[\cos^2\frac{\omega}{2} + 3\sin^2\frac{\omega}{2}\right]$
$= \cos^4\frac{\omega}{2}\left(\cos\frac{\omega}{2} + i\sqrt{3}\sin\frac{\omega}{2}\right)\left(\cos\frac{\omega}{2} - i\sqrt{3}\sin\frac{\omega}{2}\right)$
$= \cos^4\frac{\omega}{2}\left|\cos\frac{\omega}{2} + i\sqrt{3}\sin\frac{\omega}{2}\right|^2.$ (10.50)
By taking the square root of this equation we choose

$p(\omega) = \cos^2\frac{\omega}{2}\left|\cos\frac{\omega}{2} + i\sqrt{3}\sin\frac{\omega}{2}\right|\nu(\omega),$ (10.51)

where $\nu(\omega)$ is the phase factor, after writing $\left|\cos\frac{\omega}{2} + i\sqrt{3}\sin\frac{\omega}{2}\right|\nu(\omega) = \left[\cos\frac{\omega}{2} + i\sqrt{3}\sin\frac{\omega}{2}\right]$.
For the present case of cubing $\cos^2\frac{\omega}{2} + \sin^2\frac{\omega}{2} = 1$ in (10.45), in the search for the scaling coefficients of the Daubechies 2 wavelet, the phase factor $\nu(\omega)$ in (10.51) is chosen to make $p(\omega)$ a polynomial $P_3(z)$ of degree 3. This is to aim at the four coefficients of the polynomial $P_3(z)$ in (6.52).

We are after the four coefficients of $p(\omega) = P_3(z) = \frac{1}{2}\left[a_0 + a_1 z + a_2 z^2 + a_3 z^3\right] = \frac{1}{\sqrt{2}}\left[h_0 + h_1 z + h_2 z^2 + h_3 z^3\right]$ with $z = e^{-i\omega}$. So, we write the trigonometric functions in (10.51) in terms of complex exponentials. After this, the phase factor $\nu(\omega)$ is chosen such that the first term in our result is of degree zero in $z = e^{-i\omega}$:
$p(\omega) = \left[\frac{e^{i\omega/2} + e^{-i\omega/2}}{2}\right]^2\left[\left(\frac{e^{i\omega/2} + e^{-i\omega/2}}{2}\right) + i\sqrt{3}\left(\frac{e^{i\omega/2} - e^{-i\omega/2}}{2i}\right)\right]\nu(\omega)$ (10.52)

$p(\omega) = \frac{1}{8}\left[e^{i\omega} + e^{-i\omega} + 2\right]\left[e^{i\omega/2} + e^{-i\omega/2} + \sqrt{3}\left(e^{i\omega/2} - e^{-i\omega/2}\right)\right]\nu(\omega)$
$= \frac{1}{8}\left[\left\{e^{i3\omega/2} + e^{i\omega/2} + \sqrt{3}\left(e^{i3\omega/2} - e^{i\omega/2}\right)\right\} + \left\{e^{-i\omega/2} + e^{-i3\omega/2} + \sqrt{3}\left(e^{-i\omega/2} - e^{-i3\omega/2}\right)\right\} + 2\left\{e^{i\omega/2} + e^{-i\omega/2} + \sqrt{3}\left(e^{i\omega/2} - e^{-i\omega/2}\right)\right\}\right]\nu(\omega)$
$= \frac{1}{8}\left[\left(e^{i3\omega/2} + \sqrt{3}\,e^{i3\omega/2}\right) + \left(e^{i\omega/2} + 2e^{i\omega/2} - \sqrt{3}\,e^{i\omega/2} + 2\sqrt{3}\,e^{i\omega/2}\right) + \left(e^{-i\omega/2} + \sqrt{3}\,e^{-i\omega/2} + 2e^{-i\omega/2} - 2\sqrt{3}\,e^{-i\omega/2}\right) + \left(e^{-i3\omega/2} - \sqrt{3}\,e^{-i3\omega/2}\right)\right]\nu(\omega),$

$p(\omega) = \frac{1}{8}\left[(1+\sqrt{3})e^{i3\omega/2} + (3+\sqrt{3})e^{i\omega/2} + (3-\sqrt{3})e^{-i\omega/2} + (1-\sqrt{3})e^{-i3\omega/2}\right]\nu(\omega)$ (10.53)
after grouping the similar terms involving $e^{i3\omega/2}$, $e^{i\omega/2}$, $e^{-i\omega/2}$, and $e^{-i3\omega/2}$.

Now, to have the first term be of degree zero in $z = e^{-i\omega}$, we choose the phase factor $\nu(\omega) = e^{-i3\omega/2}$:
$p(\omega) = \frac{1}{8}\left[(1+\sqrt{3}) + (3+\sqrt{3})e^{-i\omega} + (3-\sqrt{3})e^{-2i\omega} + (1-\sqrt{3})e^{-3i\omega}\right]$
$= \frac{1}{2}\left[\frac{1+\sqrt{3}}{4} + \frac{3+\sqrt{3}}{4}e^{-i\omega} + \frac{3-\sqrt{3}}{4}e^{-2i\omega} + \frac{1-\sqrt{3}}{4}e^{-3i\omega}\right],$

$p(\omega) = P_3(z) = \frac{1}{2}\left[\frac{1+\sqrt{3}}{4} + \frac{3+\sqrt{3}}{4}z + \frac{3-\sqrt{3}}{4}z^2 + \frac{1-\sqrt{3}}{4}z^3\right].$ (10.54)
From this result we have Condition 1, $p(0) = P(1) = 1$, since $z = e^{-i\omega}\big|_{\omega=0} = 1$. The last step is to equate coefficients of the same powers of $z$ in the above polynomial and $P_3(z)$ in (6.52),

$P_3(z) = \frac{1}{\sqrt{2}}\left[h_0 + h_1 z + h_2 z^2 + h_3 z^3\right], \quad z = e^{-i\omega}$

(after noting the move to $P(e^{-i\omega}) = P(\omega)$ in (10.38) instead of $P(e^{-i\omega/2})$ in (10.39), as explained there around (10.38) and (10.39)), to have the Daubechies 2 scaling coefficients

$h_0 = \frac{1+\sqrt{3}}{4\sqrt{2}}, \quad h_1 = \frac{3+\sqrt{3}}{4\sqrt{2}}, \quad h_2 = \frac{3-\sqrt{3}}{4\sqrt{2}}, \quad h_3 = \frac{1-\sqrt{3}}{4\sqrt{2}}.$ (10.55)
10.3.3 Toward Determining the Daubechies $N$ (or $\phi_N$) Scaling Coefficients

For the above case of Daubechies $N = 2$ we raised the identity $\cos^2\frac{\omega}{2} + \sin^2\frac{\omega}{2} = 1$ to the power $2N-1 = 2(2)-1 = 3$ for the $2N = 2(2) = 4$ non-zero coefficients. It is tempting to generalize the above method for finding the scaling coefficients for Daubechies 3, 4, etc., which happened to be the course to follow, as was done by Daubechies. But, before we do that in Section 10.5, let us note that for the Haar (Daubechies 1) scaling function we had a polynomial $P_1(z)$ of degree 1, where the Haar scaling function is discontinuous. For the above Daubechies 2 we have a polynomial $P_3(z)$ of degree 3, and we know that the scaling function $\phi_2(t)$ is continuous.
We must sense from this observation that the higher degree polynomial has something to do with the quality of the scaling function. Indeed, there is another observation that such a polynomial $P_{2N-1}(z)$, resulting from raising $\cos^2\frac{\omega}{2} + \sin^2\frac{\omega}{2} = 1$ to the power $2N-1$ for Daubechies $N$ (with $2N$ non-zero coefficients), factorizes as

$P_{2N-1}(z) = (1+z)^N Q_{N-1}(z),$ (10.56)

where $Q_{N-1}(z)$ is another polynomial of degree $N-1$, and $Q_{N-1}(-1) \ne 0$.
It is easy to verify (10.56) for the Haar scaling function, with $N = 1$, where

$P_1(z) = \frac{1}{\sqrt{2}}h_0 + \frac{1}{\sqrt{2}}h_1 z = \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}z = (1+z)\cdot\frac{1}{2},$

and $Q_{N-1}(z) = Q_0(z) = \frac{1}{2}$ is of degree zero.
11
Other Wavelet Topics

In this chapter we will primarily discuss and illustrate the two-dimensional Haar wavelet transform and an elementary method for its inverse. Other topics presented include some other important wavelets and the biorthogonal wavelets.

11.1 THE TWO-DIMENSIONAL HAAR WAVELET TRANSFORM

In this book we do not have the space to present the general two-dimensional wavelet transforms, which are of utmost importance for analyzing and constructing images. However, with the simple interpretation of the fast Haar wavelet transform in Example 9.2 of averaging and differencing, with the scaling function and wavelet filters, respectively, we are encouraged to give some ideas about the first two basic steps of the fast two-dimensional Haar wavelet transform. This will enable us to retrace our steps for a method of finding the inverse of this transform.

We will illustrate the first major step of averaging and differencing in the x-direction, followed by the same two operations in the y-direction. The interested reader may also see [Nievergelt [1]].
As shown in Example 9.2, the Haar wavelet transform does simple averaging and differencing to each pair of a one-dimensional sequence, for example, $S = \{s_0, s_1, s_2, s_3\}$. So, now we have an idea about a two-dimensional Haar wavelet transform of a two-dimensional sequence, such as the four points in space: $z_0 = f(0,0) = 9$, $z_1 = f(0,\frac{1}{2}) = 7$, $z_2 = f(\frac{1}{2}, 0) = 5$ and $z_3 = f(\frac{1}{2},\frac{1}{2}) = 3$. Even though this is a $2 \times 2$ array, it will give some feeling about what we may expect in dealing with two-dimensional images. Here, we can take the values 9, 7, 5, and 3 as a measure of the intensity in the grey scale of an image.
These samples are easily interpreted by using a two-dimensional Haar scaling function as a parallelepiped, for example, which has a base area of $\frac{1}{2}\times\frac{1}{2} = \frac{1}{4}$ when using scale $a_1 = \frac{1}{2}$ in both the x and y directions. With scale $a_0 = 1$, the double Haar scaling function becomes

$\phi^{(0)}_{(0,0)}(x, y) = \begin{cases} 1, & 0 \le x < 1 \text{ and } 0 \le y < 1 \\ 0, & \text{otherwise} \end{cases}$ (11.1)

as shown in Figure 11.1.
Fig. 11.1: The Haar wavelet $\phi^{(0)}_{(0,0)}(x, y)$ (scaling function in two dimensions), of height 1 over the unit square in the x-y plane.
The expression in (11.1) is obtained from the tensor product of our usual scaling function,

$\phi(x) = \phi^{(0)}_{(0,0)}(x) = \begin{cases} 1, & 0 \le x < 1 \\ 0, & \text{otherwise} \end{cases}$ (11.2)

and a similar one in the y-direction,

$\phi(y) = \phi^{(0)}_{(0,0)}(y) = \begin{cases} 1, & 0 \le y < 1 \\ 0, & \text{otherwise}, \end{cases}$ (11.3)

$\phi^{(0)}_{(0,0)}(x, y) = \phi^{(0)}_{(0,0)}(x)\,\phi^{(0)}_{(0,0)}(y) = \begin{cases} 1, & 0 \le x < 1,\ 0 \le y < 1 \\ 0, & \text{otherwise}. \end{cases}$ (11.4)
Even though this is a Haar scaling function in two dimensions, it is called a wavelet as one of the four possible ones.

We will use the subscript $(a, b)$ in $\phi^{(j)}_{(a,b)}(x, y)$ to indicate the top left corner of its base support in the x-y plane (with the y-axis being the top in Figure 11.1). The superscript $(j)$ is used to indicate the level of the scale $a_j$ (in this case $j = 0$ for $a_0 = 1$) in both the x and y directions, where we see that the base of the cube is $1 \times 1$ in Figure 11.1.

We may note that $\phi^{(0)}_{(0,0)}(x, y)$ in (11.4) is associated with averaging in the x and y directions. However, we expect more operations, such as average of differencing, difference of averaging, and difference of differencing. As it may sound, this would involve other Haar wavelet actions in the x, y, and both directions (namely, diagonal), to which we will refer as $\psi^{h}_{(0,0)}(x, y)$, $\psi^{v}_{(0,0)}(x, y)$, and $\psi^{d}_{(0,0)}(x, y)$, respectively. The scale used will be spelled out, or we may write $\psi^{h(1)}_{(0,0)}(x, y)$, $\psi^{v(1)}_{(0,0)}(x, y)$, and $\psi^{d(1)}_{(0,0)}(x, y)$ at scale $a_1 = \frac{1}{2}$, for example. Here $h$ and $v$ refer to the horizontal and vertical edges resulting from the differencing caused by the wavelet action along the perpendicular direction to that particular edge. Such three mixed combinations that involve wavelets are the reason for using the symbol $\psi$ instead of $\phi$, as the latter is reserved for the pure averaging as in (11.4).
We note that we are moving to the scale 1/2 for the four 1/2 × 1/2 squares of the unit square, where we will also involve translations by 1/2. So, before writing the expressions for the above two-dimensional Haar wavelets, we may recall the two important operations of scaling and translation. An example of scaling with the scale 1/2 for the two-dimensional wavelet is

    φ^(1)_(0,0)(x, y) = φ^(0)_(0,0)(2x, 2y) = { 1,  0 ≤ x < 1/2, 0 ≤ y < 1/2,
                                              { 0,  otherwise,                   (11.5)

which is located at the top-left 1/2 × 1/2
square in Figure 11.1. Its translation by 1/2 in the y-direction is

    φ^(1)_(0,0)(2x, 2(y − 1/2)) = φ^(1)_(0,1)(x, y) = { 1,  0 ≤ x < 1/2, 1/2 ≤ y < 1,
                                                      { 0,  otherwise,                 (11.6)

which is located at the top-right 1/2 × 1/2 square. We are using (0, 1) in φ^(1)_(0,1) (instead of φ^(1)_(0,1/2)) to symbolically indicate the translation direction and to simplify the notation. The same thing is done for the other φ^(1)_(0,0) functions. Examples are:

    φ^(1)_(0,0)(2(x − 1/2), y) = φ^(1)_(1,0)(x, y),
    φ^(0)_(0,0)(2(x − 1/2), 2(y − 1/2)) = φ^(1)_(1,1)(x, y).                (11.7)
These are located in Figure 11.1 at the bottom-left and bottom-right 1/2 × 1/2 squares.
Now, we return to the two-dimensional Haar wavelet bases:

    ψ^h(0)_(0,0) = φ^(0)_(0,0)(x) ψ^(0)_(0,0)(y) = { 1,   0 ≤ x < 1, 0 ≤ y < 1/2,
                                                   { −1,  0 ≤ x < 1, 1/2 ≤ y < 1,
                                                   { 0,   otherwise,               (11.8)

as illustrated in Figure 11.2. This can be interpreted as the wavelet operation in the y-direction, which results in differences, followed by the scaling function operation in the x-direction, which averages these differences. Here, we are using the scale 1/2.
So, from Figures 11.1 and 11.2 we may see that ψ^h(0)_(0,0)(x, y) can be written in terms of the φ^(1) basis at this scale as a sum (in the x-direction) of differences (in the y-direction):

    ψ^h(0)_(0,0) = ( φ^(1)_(0,0) − φ^(1)_(0,1) ) + ( φ^(1)_(1,0) − φ^(1)_(1,1) ).      (11.9)
Fig. 11.2: ψ^h(0)_(0,0)(x, y) - The two-dimensional horizontal wavelet.
This shows a sum of differences in the y-direction (caused by the wavelet action), which will detect horizontal and other non-vertical edges; hence the use of h in ψ^h(0)_(0,0). (Sometimes it is called a vertical wavelet because of its differencing in the y-direction.)
Next, we have ψ^v(0)_(0,0):

    ψ^v(0)_(0,0)(x, y) = ψ^(0)_(0,0)(x) φ^(0)_(0,0)(y) = { 1,   0 ≤ x < 1/2, 0 ≤ y < 1,
                                                         { −1,  1/2 ≤ x < 1, 0 ≤ y < 1,
                                                         { 0,   otherwise.              (11.10)
Here, the wavelet differencing action in the x-direction causes the edges to be parallel to the y-axis, as shown in Figure 11.3. Again, we can see from Figures 11.1 and 11.3 that this ψ^v(0)_(0,0)(x, y) can be expressed in terms of the φ^(1) basis as a difference (in the x-direction) of sums (in the y-direction),

    ψ^v(0)_(0,0) = ( φ^(1)_(0,0) + φ^(1)_(0,1) ) − ( φ^(1)_(1,0) + φ^(1)_(1,1) ),      (11.11)
Fig. 11.3: ψ^v(0)_(0,0)(x, y) - The two-dimensional vertical wavelet.
or the sum (in the y-direction) of differences (in the x-direction),

    ψ^v(0)_(0,0) = ( φ^(1)_(0,0) − φ^(1)_(1,0) ) + ( φ^(1)_(0,1) − φ^(1)_(1,1) ).      (11.11a)

This can be seen clearly as we look at the back positive half versus the front negative half in Figure 11.3. The second form in (11.11a) shows the differencing in the x-direction, which will detect vertical and non-horizontal edges. (Sometimes it is called a horizontal wavelet because of its differencing in the x-direction.)
Next is the diagonal wavelet,

    ψ^d(0)_(0,0) = ψ^(0)_(0,0)(x) ψ^(0)_(0,0)(y) = { 1,   0 ≤ x < 1/2, 0 ≤ y < 1/2,
                                                   { −1,  1/2 ≤ x < 1, 0 ≤ y < 1/2,
                                                   { 1,   1/2 ≤ x < 1, 1/2 ≤ y < 1,
                                                   { −1,  0 ≤ x < 1/2, 1/2 ≤ y < 1,
                                                   { 0,   otherwise,                   (11.12)

shown in Figure 11.4.
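The four combinations above - the pure average and the h, v, d wavelet actions - can be checked numerically on a single 2 × 2 block. The following sketch (Python; the function name is ours, and the plain averaging convention with factor 1/4 is used) computes the four coefficients for the grey-scale samples 9, 7, 5, 3 of this section:

```python
def haar2x2(block):
    """One 2-D Haar step on a 2x2 block: returns the (average, horizontal,
    vertical, diagonal) coefficients, using plain averages (factor 1/4)."""
    (a, b), (c, d) = block              # [a b] is the top row, [c d] the bottom row
    avg        = (a + b + c + d) / 4    # scaling function: averaging in x and y
    horizontal = (a + b - c - d) / 4    # psi^h: difference in y, average in x
    vertical   = (a - b + c - d) / 4    # psi^v: difference in x, average in y
    diagonal   = (a - b - c + d) / 4    # psi^d: difference in both directions
    return avg, horizontal, vertical, diagonal

print(haar2x2([[9, 7], [5, 3]]))    # -> (6.0, 2.0, 1.0, 0.0)
```

Note that the four coefficients determine the block completely: for example, the top-left sample is recovered as avg + horizontal + vertical + diagonal = 6 + 2 + 1 + 0 = 9.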
Our choice of the simple image of a triangle is to emphasize the main purpose of the wavelet decomposition of images, since for actual images it is the edges that show their sought details.

Example 11.4 Decomposing and reconstructing an image of a triangle by using the MATLAB Wavelet Toolbox

1. The decomposition process:
In reference to Example 11.3, Figures 11.6(a)-(d) show (in a clockwise direction) the first level two-dimensional Haar wavelet decomposition of the boundaries of a right triangle:

Fig. 11.6: (a)-(d): The first level 2-D Haar wavelet decomposition of a right triangle.

(a) S_1: The average.
(b) T^h_1: The vertical wavelet differencing action on the image to show horizontal and diagonal edges (i.e., the non-vertical ones).
(c) T^v_1: The horizontal wavelet differencing action on the image to show vertical and diagonal edges (i.e., the non-horizontal edges).
(d) T^d_1: The diagonal wavelet action on the image to show the (only) diagonal edge present in the triangle.

Note that a Haar wavelet is used at the scale 1/256. The second level 2-D Haar wavelet decomposition of the right triangle is shown in Figure 11.7.

Fig. 11.7: The second level 2-D Haar wavelet decomposition of a right triangle.

We can still go to a third level decomposition to extract more details, and so on. Such higher level decompositions have the advantage of extracting more details for more realistic (desired) images such as those of landmark buildings, natural scenes, photos, fingerprints, etc.
2. The reconstruction process:
What we have illustrated in part 1 was only the decomposition process. It is there where many operations, such as those leading to compressing or de-noising the signal, are done. We showed there the role of the (direct) discrete double wavelet transform (dwt). What remains is the reconstruction of the image after such modifications of its decomposition. This is done by using the inverse discrete double wavelet transform (idwt). In Figure 11.8 we show the second level decomposition, as well as the reconstruction of the image of the triangle.

Fig. 11.8: The second level decomposition and the reconstruction of the triangle via the Haar dwt and the idwt, respectively.
11.3 THE SIGNAL ENERGY IN THE WAVELET TRANSFORM
Here we plan to illustrate the compression of a signal, then discuss and illustrate its denoising process. However, a subject which is very important to the success of these processes is that the energy left in the compressed or denoised signal should not be reduced by much. Thus we will discuss this subject first.

Consider the even number of signal samples {f(t_n)}, n = 1, 2, ..., N. We know that the first level of the wavelet decomposition of this signal divides it into two subsignals, each half its original length. The first half corresponds to the scaling function representation, and we will denote it by a1 = {a_1, a_2, ..., a_{N/2}}. The second part is related to the associated wavelet, d1 = {b_1, b_2, ..., b_{N/2}}. In Chapter 9 we interlaced the a_i's and b_i's, i = 1, 2, ..., N/2, in equation (9.24). This notation is not the only accepted one, since, for example, in the MATLAB Wavelet Toolbox they use a_9 and d_9 instead of our notation here of a1 and d1, and so on. As we had explained earlier, the level d1 carries the top half of the frequency band of the signal, while a1 carries the lower half. So, we expect a1, with its low frequencies and its elements as the coefficients of the scaling function series, to give more of an average or trend of the signal. On the other hand, d1, with its higher frequencies in the wavelet series, would show differences or fluctuations as details of the signal.
We want to remark here that in terms of the energy of the signal, the trend in a1 in general would carry most of the energy compared to that of d1, as we will explain next. For the signal f(t), t_1 ≤ t ≤ t_2, we define its energy as

    E_f = ∫_{t_1}^{t_2} |f(t)|² dt.                        (11.46)

For the discrete signal {f(t_n)}, n = 1, ..., N, we define its energy as a sum corresponding to that of the continuous signal f(t) in (11.46),

    E_N = Σ_{n=1}^{N} |f(t_n)|².                           (11.47)
Hence, the energy of the averages in a1 is

    E_{a1} = Σ_{n=1}^{N/2} |a_n|²,                         (11.48)

while that of the fluctuations (differences) in d1 is

    E_{d1} = Σ_{n=1}^{N/2} |b_n|².                         (11.49)
We are after illustrating two facts: 1) the conservation of energy in the scaling function-wavelet decomposition of the signal; 2) that for a relatively regular signal, the energy in the scaling function coefficients amounts to a high percentage of the total energy.

The second fact is intuitive, since the coefficients of a1 are in general averages, which are larger than the differences of d1 for a reasonably regular signal. This result is vital to the process of compressing a signal (or an image), where the high frequency details come with smaller amplitudes; they can be thresholded without the average (or trend) signal in a1 losing much of the energy of the signal.

The first fact of the energy conservation after the wavelet decomposition reminds us of the very well known result in Fourier series, i.e., the Parseval equality. For example, in the case of the complex exponential Fourier series of f(t), −π < t < π, with coefficients {c_n}, −∞ < n < ∞, we have

    (1/2π) ∫_{−π}^{π} |f(t)|² dt = Σ_{n=−∞}^{∞} |c_n|².    (11.50)
Example 11.9 The Energy in the Haar Decomposition of a Discrete Signal

As an example, we consider the four samples {3, 2, 5, 1}. Their first level decomposition in V_2 = V_1 ⊕ W_1 using the Haar transform is a transformation of the signal to the decomposition {a1 | d1} = {a_{1,1}, a_{2,1} | b_{1,1}, b_{2,1}}, where a_{1,1}, a_{2,1} are averages,

    a_{1,1} = ((3 + 2)/2)√2 = (5/2)√2,   a_{2,1} = ((5 + 1)/2)√2 = 3√2,   a1 = {(5/2)√2, 3√2}.

Here, compared to Example 9.2, we multiplied by 2^{1/2} = √2, the normalization factor in 2^{1/2} φ(2t − k) at the scale 1/2 (j = 1). This, as explained earlier, is to normalize the energy of the scaling functions at the different scales to the same value of 1.
For d1 = {b_{1,1}, b_{2,1}} we have the differences

    b_{1,1} = ((3 − 2)/2)√2 = √2/2,   b_{2,1} = ((5 − 1)/2)√2 = 2√2.

Thus, d1 = {√2/2, 2√2} and {a1, d1} = {(5/2)√2, 3√2 | √2/2, 2√2}. Now the energy of a1 is ((5/2)√2)² + (3√2)² = 25/2 + 18 = (25 + 36)/2 = 61/2. The energy in d1 is (√2/2)² + (2√2)² = 1/2 + 8 = 17/2. So, for this example, the energy in the fluctuation coefficients is 17/(61 + 17) = 17/78, about 22% of the total energy of the signal.
Had we had a smoother sequence of samples, such as {4, 6, 10, 12}, this would have been decomposed as {a1 | d1} = {5√2, 11√2 | −√2, −√2}, where the trend energy is (5√2)² + (11√2)² = 50 + 242 = 292, while that of the fluctuations is (−√2)² + (−√2)² = 2 + 2 = 4. So, the energy in the fluctuations makes 4/(292 + 4) = 4/296, about 1.4% of the total energy of the signal.
If we take an example with a less smooth sequence, such as {3, 2, 5, −3} of Example 9.2, we find {a1 | d1} = {(5/2)√2, √2 | (1/2)√2, 4√2}, with the a1 energy as ((5/2)√2)² + (√2)² = 25/2 + 2 = 29/2.
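The energy bookkeeping of Example 11.9 can be checked directly with a short script. A minimal sketch (the helper name is ours), verifying both the conservation of energy and the dominance of the trend energy for the samples {3, 2, 5, 1}:

```python
from math import sqrt, isclose

def haar_level1(samples):
    """First-level normalized Haar decomposition: trend a1 and fluctuation d1."""
    pairs = list(zip(samples[::2], samples[1::2]))
    a1 = [((x + y) / 2) * sqrt(2) for x, y in pairs]   # averages, normalized by sqrt(2)
    d1 = [((x - y) / 2) * sqrt(2) for x, y in pairs]   # differences, normalized by sqrt(2)
    return a1, d1

s = [3, 2, 5, 1]
a1, d1 = haar_level1(s)
E_s  = sum(x * x for x in s)       # 9 + 4 + 25 + 1 = 39
E_a1 = sum(x * x for x in a1)      # 61/2
E_d1 = sum(x * x for x in d1)      # 17/2
assert isclose(E_a1 + E_d1, E_s)   # conservation of energy in the decomposition
print(E_a1, E_d1, E_d1 / (E_a1 + E_d1))   # fluctuation share of the energy (about 22%)
```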
11.4 SIGNAL COMPRESSION AND DENOISING
11.4.1 Compressing a Signal

One of the most important applications of wavelet analysis is in signal compression. When we use the wavelet decomposition, we have a good number of coefficients to transmit or store. However, as we showed in the last section in Example 11.9, in general many of the coefficients in the fluctuation {d_i} (or the wavelet coefficients part of the transformed signal) are very small compared to those of the trend {a_i} (or the scaling function part). Provided that the latter carries a high percentage of the total energy of the signal, the smallest coefficients in {d_i} may be assigned zero values. This is done by assigning an amplitude threshold under which all the {d_i} coefficients are considered zero. A simple way of determining such a threshold is illustrated in Problem 3 in the exercises, where an energy profile is computed for the purpose of deciding such a threshold.
Fig. 11.23: (a) The signal; (b) The compressed signal.
To illustrate the compression of a signal with a very simple example, we consider the sequence {4, 6, 10, 12, 8, 6, 5, 5} (of Problem 1 in Exercises 11.3) with its first level Haar transform as {a1, d1} = {5√2, 11√2, 7√2, 5√2 | −√2, −√2, √2, 0}. We see here that the d1 coefficients are much smaller than those of a1. So, if we set the threshold at |√2|, we have {5√2, 11√2, 7√2, 5√2 | 0, 0, 0, 0}. Now we invert this sequence, as done in Problem 1(b) of Section 11.3, to have the compressed signal

    { (5√2 + 0)/√2, (5√2 − 0)/√2, (11√2 + 0)/√2, (11√2 − 0)/√2,
      (7√2 + 0)/√2, (7√2 − 0)/√2, (5√2 + 0)/√2, (5√2 − 0)/√2 }
    = {5, 5, 11, 11, 7, 7, 5, 5}.

This compressed signal S_c and the original signal S are shown in Figure 11.23.
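The whole compress-and-reconstruct cycle of this example can be sketched in a few lines (Python; the small tolerance added to the threshold comparison guards against floating-point rounding and is our choice):

```python
from math import sqrt

s = [4, 6, 10, 12, 8, 6, 5, 5]
a1 = [(x + y) / sqrt(2) for x, y in zip(s[::2], s[1::2])]   # trend {5sqrt2, 11sqrt2, 7sqrt2, 5sqrt2}
d1 = [(x - y) / sqrt(2) for x, y in zip(s[::2], s[1::2])]   # fluctuation {-sqrt2, -sqrt2, sqrt2, 0}

# Compress: zero every fluctuation coefficient at or below the threshold |sqrt(2)|.
threshold = sqrt(2)
d1c = [0.0 if abs(b) <= threshold + 1e-12 else b for b in d1]

# Invert the Haar step pairwise: x = (a + b)/sqrt(2), y = (a - b)/sqrt(2).
rec = []
for a, b in zip(a1, d1c):
    rec += [(a + b) / sqrt(2), (a - b) / sqrt(2)]
print([round(x) for x in rec])    # -> [5, 5, 11, 11, 7, 7, 5, 5]
```

With all four d1 coefficients thresholded to zero, each reconstructed pair collapses to its average, reproducing the compressed signal of the text.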
Chapter 12 WAVELET APPLICATIONS
12.1 ONE-DIMENSIONAL WAVELET APPLICATIONS
In this section we will illustrate the use of wavelets for detecting hidden discontinuities. In that regard, we will show the importance of the vanishing higher moments of the wavelets for this application.

12.1.1 Detecting a Hidden Jump Discontinuity of a Second Derivative in a Signal

Even though a signal may look continuous or smooth to the naked eye, it may have hidden singularities, which can be detected using the wavelet decomposition. The following example illustrates how the wavelet decomposition detects hidden singularities, for example, a jump in the second derivative of a signal. Consider the function

    g(t) = { t,      0 ≤ t < 1/2,
           { t − 1,  1/2 ≤ t < 1,                          (12.1)

with its clear jump discontinuity at t = 1/2, as shown in Figure 12.1.
Fig. 12.1: The function g(t) in (12.1) with a jump discontinuity at t = 0.5.
We will integrate it twice to hide this jump discontinuity, where the result f(t) in (12.3), from g(t) = d²f/dt², will look smooth to the eye. Then, we will use a wavelet decomposition of the function f(t) to illustrate the wavelet detection of the jump discontinuity in its second derivative d²f/dt² = g(t) at t = 1/2. Of course, this will depend on the resolution we use. In this case a scale of 1/2⁷ will do, but 1/2¹⁰ is much better.
    h(t) = ∫ g(t) dt = { t²/2 + c_1,      0 ≤ t < 1/2,
                       { t²/2 − t + c_2,  1/2 ≤ t < 1,
and with the boundary conditions h(0) = h(1) = 0, we have

    h(t) = { t²/2,            0 ≤ t < 1/2,
           { t²/2 − t + 1/2,  1/2 ≤ t < 1.                 (12.2)

Here, we notice the disappearance of the discontinuity of g(t), since h(1/2+) = 1/8 − 1/2 + 1/2 = 1/8 = h(1/2−). We can still see the discontinuity of the derivative in the cusp at t = 1/2, as shown in Figure 12.2.
Fig. 12.2: The function h(t) in (12.2) as the integral of g(t) in (12.1), with a clear jump discontinuity in its first derivative at t = 0.5.
To have a function that looks really smooth to the eye, we integrate h(t) to have

    f(t) = { t³/6 + c_1,                0 ≤ t < 1/2,
           { t³/6 − t²/2 + t/2 + c_2,   1/2 ≤ t < 1.

If we match these two branches of f(t) at t = 1/2, we have

    f(1/2+) = f(1/2−):   1/48 + c_1 = 1/48 − 1/8 + 1/4 + c_2,

so that c_1 = c_2 + 1/8. If we let c_1 = 0, then c_2 = −1/8, and

    f(t) = { t³/6,                      0 ≤ t < 1/2,
           { t³/6 − t²/2 + t/2 − 1/8,   1/2 ≤ t < 1.       (12.3)

This function is shown in Figure 12.3.
Fig. 12.3: The smooth-looking function f(t) in (12.3) - for wavelets to detect the jump discontinuity in its second derivative. This function and its continuous first and second derivatives were generated by Wayne Galli and Ole Nielson, Mix and Olejniczak [4, p. 231], with due compliments.
We will soon show in Section 12.1.2, using the MATLAB Wavelet Toolbox, that with Daubechies 3 wavelets at a scale of 1/2⁹ = 1/512 this discontinuity is detected as two very sharp spikes close together at t = 0.5, and with very good resolution, as shown in Figure 12.5. Using Daubechies 2 with the same resolution shows the discontinuity, as in Figure 12.4, but not as clearly as with Daubechies 3.

The function f(t) in equation (12.3) is a simple representative of many signals with hidden discontinuities, spikes, or what we may in general call activity. Such activity attracts the attention of many in the applied fields for the possibility of carrying more interesting information. Of course, in the present example we know that the signal has a jump discontinuity in its second derivative at t = 1/2, though in general we do not know such an exact location, since the signal is often mixed with noise. However, if we look at the Fourier transform of the signal, we can tell about the frequency distribution, with possible spikes that represent the interesting activity. But, again, with Fourier analysis we cannot tell at what time such a spike occurs. This, as we had pointed out from the start of the book, is the advantage of wavelet analysis compared to the Fourier one. Here, we can tell the location in time with an accuracy that depends on the uncertainty principle. In short, and as we illustrated in Chapter 6, the uncertainty principle says that if we want to be accurate, or use high resolution, to locate the spike in time, then we have to allow uncertainty in the location of the spike in the frequency space. In other words, a very narrow spike in time will correspond to a wide distribution in the frequency space.

For our example, we may want to locate the hidden jump discontinuity around t = 1/2 with small uncertainty in time, such as Δt = 1/128, 1/256, or 1/512. This means that we try our wavelet series computations at the scales 1/2⁷ = 1/128, 1/2⁸ = 1/256, or 1/2⁹ = 1/512.
The Role of the Vanishing Higher Moments of the Wavelet

For this particular example we may search with the scale 1/128 only in the interval (0.4, 0.6), but in general we search the whole domain (0, 1) of the signal when we do not have a hint about the approximate location of the discontinuity. The important question now is what wavelet to use. We will, as we have done up until now in this book, limit our search to the Daubechies wavelets ψ_DN.

We have already discussed in Section 10.4.2 the importance of the wavelet's vanishing higher order moments for its suitability in detecting higher order derivatives of the signal. In our example here we want a wavelet series that can detect a second derivative of the signal. In summary, a Daubechies wavelet ψ_DN(t)-series does not see derivatives of orders up to N − 1 of the signal. We illustrated this in Example 10.4 for D2 (with N = 2), showing that its moments M_0 = M_1 = 0. This illustration can be extended easily, using the same method of looking at the higher derivatives of the Fourier transform at ω = 0, to show that M_0 = M_1 = ··· = M_{N−1} = 0 for the Daubechies ψ_DN(t) wavelet. For simplicity, we write ψ_N and φ_N for ψ_DN and φ_DN, respectively.

This says that the ψ_2(t)-series will see the second derivative of the signal, which is where the discontinuity is, as shown in Figure 12.4. If we are to be more certain of seeing significant activity around t = 1/2, we may use the ψ_3(t)-series, which will detect the third derivative in our signal, and which comes as a very narrow spike, as shown in Figure 12.5 for ψ_3; this will be computed in the next Section 12.1.2.

Once we observe this activity around t = 1/2, we may decrease the scale to 1/256 or 1/512 for a more precise location of the jump discontinuity. To better illustrate this concept, we start with ψ_2(t) and scale 1/128, where we will not be surprised with the activity around t = 1/2, since ψ_2(t) sees the second derivative, which is the derivative of the signal in Figure 12.2, giving the well-defined left- and right-sided second derivatives at t = 0.5 − 1/128 and t = 0.5 + 1/128, respectively. However, using ψ_3(t) with the scale 1/2⁷ = 1/128 should show more significant activity there, as seen in Figure 12.5, and if this is not satisfactory, we use ψ_4(t). Of course, and as we mentioned in Section 10.4.2, here we only need the wavelet series (without its associated scaling function series), since we are looking for details in the signal, and not the full picture of it.
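The detection just described can be sketched numerically without the toolbox. The following Python sketch samples f(t) of (12.3) at 512 points and computes one level of wavelet (detail) coefficients with the Daubechies 3 highpass filter; the six db3 filter values are the standard published ones, and the simple valid-window convolution (no boundary extension) is our simplification:

```python
import numpy as np

# Standard db3 lowpass filter taps; the highpass taps are g_k = (-1)^k h_{5-k}.
h = np.array([0.33267055295, 0.80689150931, 0.45987750211,
              -0.13501102001, -0.08544127388, 0.03522629189])
g = np.array([(-1) ** k * h[5 - k] for k in range(6)])

def f(t):
    """The smooth-looking f(t) of (12.3), hiding a jump in its second derivative."""
    return np.where(t < 0.5, t**3 / 6, t**3 / 6 - t**2 / 2 + t / 2 - 1/8)

N = 512                                # resolution 1/2^9
s = f(np.arange(N) / N)

# First-level detail coefficients: correlate with g and downsample by 2
# (valid windows only, so no boundary artifacts).
d = np.array([g @ s[2*k : 2*k + 6] for k in range((N - 6) // 2 + 1)])

k_star = int(np.argmax(np.abs(d)))
print("largest detail near t =", 2 * k_star / N)   # the spike lands near t = 0.5
```

Because db3 has three vanishing moments, the details of the cubic pieces are tiny, and the coefficients whose filter window straddles t = 1/2 stand out sharply, just as in Figure 12.5.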
Recalling the Decomposition Implementation via Filter Banks

For the implementation of the scaling function and wavelet series decomposition of the signal via filter banks, we have already discussed its schematic illustrations in Section 8.5, done the needed extension of the given sequence to match the translated scaling functions in equation (8.51), and finally used the fast Daubechies wavelet transform for the signal decomposition in Chapter 9.
Fig. 12.5: The Daubechies 3 wavelet decomposition for the function in equation (12.3), at the scale 1/2⁹ = 1/512, showing the signal s, the approximation a_9, and the details d_9, ..., d_1 of the level 9 decomposition.
We recall that in denoising it is assumed that the noise has higher frequencies and smaller amplitude, and that it is superimposed on the signal. This is the case, as shown in Figure 12.11. Thus, in each approximation we set a threshold to remove the smallest coefficients at the high frequencies. In the MATLAB Wavelet Toolbox this thresholding can be done automatically with different techniques. For the analysis the Daubechies 4 wavelet is used at level 5. The use of db4 is explained by the need for a wider wavelet to cover the generally wide shape of the present original signal. The result is shown in Figure 12.12. In the above example, using db4, we lost some high frequencies of the original signal after doing the thresholding for the denoising. It is suggested that the use of wavelet packets, which are not covered in this book, would do better in not losing too much of the high frequencies.
12.1.4 Wavelets for Non-stationary Waves

Let us start our discussion first with an example of a stationary wave.

(a) Stationary Waves

In this application the wavelet decomposition of a signal works to identify pure frequencies, much like Fourier analysis does.

Let us consider a signal f(t), which is a superposition of three pure sine waves with large, medium, and small periods of 200, 20, and 2 ms,

    f(t) = sin(2πt/200) + sin(2πt/20) + sin(2πt/2),        (12.5)

as shown in Figure 12.13. In general, we expect the trend (or average) shape of the slow one to appear in a reasonably high level of the scaling function coefficients, such as a_5. The third and fastest one is expected to be at the highest resolution (smallest scale) of the first level of the decomposition, d_1.

Fig. 12.13: The signal of equation (12.5).
(b) Non-stationary Waves - Detecting Discontinuities in the Frequency

In this example we consider a simple case of a non-stationary wave, where the sine wave has a very low frequency of 0.005 (period 200 ms) on the time interval (0, 500) in ms, which changes abruptly to frequency 0.05 (period 20 ms) on the interval (500, 1000),

    g(t) = { sin(2πt/200),   0 ≤ t < 500,
           { sin(2πt/20),    500 ≤ t ≤ 1000.               (12.7)

The signal is shown in Figure 12.15. This example contrasts with the last one of the stationary signal in (12.5), with its three different frequencies present together.

Fig. 12.15: The non-stationary signal in (12.7).
As we mentioned before, for the corresponding stationary wave f(t) = sin(2πt/200) + sin(2πt/20), −∞ < t < ∞, and the above non-stationary one, we expect almost the same first two frequencies of 2π/200 and 2π/20 when using a Fourier approach. This is in the sense that for g(t), Fourier analysis cannot tell the time intervals on which the two different frequencies occur. With the absence of this information, we cannot tell the instant at which the signal frequency changed.

In wavelet analysis, we expect the wavelet coefficients to locate the discontinuity, or the breakdown in the frequency of the signal g(t), which is located at t_1 = 500 ms. For such an abrupt change the Haar wavelet may be sufficient at the detail levels d_1 or d_2. Such a decomposition is shown in Figure 12.16, at scale level 5. In this case, we see the discontinuity very clearly in d_1. In Figure 12.17 we use the Daubechies 3 wavelet, as discussed in the following part (c).
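A quick numerical check of this claim, with plain first-level Haar fluctuations standing in for d_1 (a simplification of ours; the toolbox decomposition is organized differently), shows the frequency break at t = 500 ms as a jump in the fluctuation energy:

```python
import math

# Sample g(t) of (12.7) at 1 ms steps and take first-level Haar fluctuations.
def g(t):
    return math.sin(2 * math.pi * t / 200) if t < 500 else math.sin(2 * math.pi * t / 20)

s = [g(t) for t in range(1000)]
d1 = [(x - y) / math.sqrt(2) for x, y in zip(s[::2], s[1::2])]   # 500 coefficients

# Fluctuation energy on each half of the time axis:
E_slow = sum(b * b for b in d1[:250])   # t in (0, 500): slow sine, tiny differences
E_fast = sum(b * b for b in d1[250:])   # t in (500, 1000): fast sine, large differences
print(E_slow, E_fast)   # the break at t = 500 ms shows up as a jump in d1 energy
```

The fast half carries far more fluctuation energy than the slow half, which is exactly the localization in time that Fourier analysis alone cannot provide.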
12.2 BASIC APPLICATIONS OF THE 2-D (DOUBLE) WAVELET TRANSFORM - ILLUSTRATION WITH HAAR WAVELETS
analysis. By the Parseval equality (6.32a),

    (1/2a) ∫_{−a}^{a} |f(t)|² dt = Σ_n |c_n|²,             (12.8)

for the Fourier series of the signal f(t) on (−a, a), we see that most of the energy of the signal is associated with the low-indexed coefficients (note that lim_{n→∞} c_n = 0). These coefficients, in general, are associated with low frequencies, while the noise in the signal is known to be associated with high frequencies. So, looking at the double wavelet transform of the image, we may delete the high frequency (small scale) coefficients, which, as we showed in Chapter 11, are found at the lower right corner of the transformed array. Since noise is associated with high frequencies, this represents a method of eliminating most of the noise in the signal, which is called denoising. It is, of course, of most importance for one-dimensional signals, which we illustrated in Section 12.1.

Next, we start with the use of the double Haar wavelet transform for denoising an image.
12.2.1 Denoising and Data Compression of Images

Example 12.3 Denoising an Image

Consider the array S of (E.1) in Example 11.1,

    S = [ 9 7 6 2
          5 3 4 4
          8 2 4 0
          6 0 2 2 ].                                       (E.1)

We will start by making this array an example of a noisy image. We note that there is relative smoothness in the first two rows, and we see the third row as the one with the most changes, followed by the fourth row and the fourth column. So, let us assume that this is due to a high frequency noise which we would like to reduce. In other words, we want smoother rows. Our way to do this is to go to the double Haar wavelet transform of this array in (E.8) of Example 11.1,

    W = [ 4 1 1 1
          1 0 3 1
          2 0 0 1
          1 0 0 1 ],                                       (E.2)

and replace the lower right submatrix [ 0 1 ; 0 1 ], which is associated with high frequencies, by a zero submatrix [ 0 0 ; 0 0 ]. We denote the resulting modified transform by W_d, to indicate denoising,

    W_d = [ 4 1 1 1
            1 0 3 1
            2 0 0 0
            1 0 0 0 ].                                     (E.3)
Let us first recall how this double Haar wavelet transform W in (E.2) of the array S in (E.1) was obtained using the basic method of Example 11.1. First, a horizontal followed by a vertical sweep is done on S to give S_HV, as seen in (E.5) of Example 11.1:

    S_HV = [ 6 1 4 1
             2 0 0 1
             4 3 2 1
             1 0 0 1 ].                                    (E.4)

Then the submatrix consisting of the four elements at the top left corners of the four 2 × 2 submatrices, [ 6 4 ; 4 2 ], was decomposed again with horizontal-vertical sweeps to give [ 4 1 ; 1 0 ], and this result was placed in the top left corner 2 × 2 submatrix of W, the double Haar transform of S in (E.2). Next, the four top right corner, four bottom left corner, and four bottom right corner elements of S_HV in (E.4) made the three 2 × 2 submatrices [ 1 1 ; 3 1 ], [ 2 0 ; 1 0 ], and [ 0 1 ; 0 1 ], respectively. They are then placed at the top right-hand, bottom left-hand, and bottom right-hand corners of W, as shown in (E.2).
We note here that the decomposition of the first 2 × 2 submatrix [ 6 4 ; 4 2 ] to give [ 4 1 ; 1 0 ] was very important in showing that its result gives 4, the average of the array S in (E.1). This is for the transform W in (E.2). For its inverse we must backtrack these steps, [ 4 1 ; 1 0 ] → [ 6 4 ; 4 2 ], then return its elements to the top left corners of S_HV. This is followed by returning the elements of the other three submatrices [ 1 1 ; 3 1 ], [ 2 0 ; 1 0 ], and [ 0 1 ; 0 1 ] to their respective positions. The result of distributing the elements of the four submatrices gives S_HV. What remains is to undo the horizontal-vertical sweeps from S_HV to get S.
This is a summary of the decomposition of the array S to W, then the reconstruction using the inverse double Haar wavelet transform. What concerns us in denoising an array S is to assign a zero submatrix to the high frequency 2 × 2 submatrix of its double Haar transform W, i.e., we assign a zero 2 × 2 submatrix to the bottom right-hand submatrix of the transform. But in the inverse process the four zero elements of this submatrix will be distributed to the bottom right-hand corners of the four 2 × 2 submatrices of the sought S_HV. This, then, would not affect the first, second, and third
is often used in this wavelet pattern recognition application, which we shall adopt for our illustrations. This applies to the modified coefficients matrix W_p in equation (12.13). In addition, weights w_1 and w_2 may be given to u_i and v_i, the elements of the top left submatrix, and to u′_i and v′_i, the elements of the two retained diagonals, in the sum of equation (12.15), respectively; i.e., d is modified to

    d_w = Σ_{i=1}^{I} w_1 |u_i − v_i| + Σ_{i=1}^{J} w_2 |u′_i − v′_i|.        (12.16)

For our example we retain eight coefficients, so we may favor the four coefficients of the top left submatrix by giving them a weight of w_1 = 1, while we give a lower priority, a weight of 1/4, to the four diagonal elements. Thus, we have w_1 = 1, w_2 = 1/4, I = 4, and J = 4 in equation (12.16).

The double wavelet transform of a pattern allows a compression by deleting the high frequency coefficients as described above, which results in great savings in computing the distances d for so many patterns. In our illustration we use the weighted distance measure in equation (12.16).
Example 12.6 Pattern Recognition

If we are to illustrate this wavelet transform method of pattern recognition, where we have to use at least two patterns along with the image, we cannot afford but to try with (the very limited) 4 × 4 array of patterns.

So, we consider an L shape at the left side of the 4 × 4 array,

    S_1 = [ 1 0 0 0
            1 0 0 0
            1 0 0 0
            1 1 1 0 ].                                     (E.1)

The two patterns to compare the L shape with are a C,

    S_2 = [ 1 1 1 0
            1 0 0 0
            1 0 0 0
            1 1 1 0 ],                                     (E.2)

and a U,

    S_3 = [ 1 0 0 1
            1 0 0 1
            1 0 0 1
            1 1 1 1 ].                                     (E.3)
The method of pattern recognition starts by finding the double Haar wavelet transform for the three patterns S_1, S_2, and S_3, as done for S of Example 11.1, to obtain their transforms.

Let W_1, W_2, and W_3 be the double Haar transforms of S_1, S_2, and S_3, respectively. We give here W_1 and leave the computations of W_2 and W_3 as an exercise,

    W_1 = [ 0.375  0.250  0.250  0.125
            0.125  0      0      0.125
            0.125  0      0      0.125
            0.125  0      0      0.125 ].
For W_1, W_2, and W_3, we are to keep only their top left-hand 2 × 2 submatrices and the diagonals of their bottom left- and top right-hand submatrices, to have the following W_1,p, W_2,p, and W_3,p, respectively:

    W_1,p = [ 0.375  0.250  0.250  0
              0.125  0      0      0.125
              0.125  0      0      0
              0      0      0      0 ],

    W_2,p = [ 0.500  0.250  0.250  0
              0      0      0      0
              0      0      0      0
              0      0      0      0 ],

and

    W_3,p = [ 0.625  0      0      0
              0.125  0      0      0.125
              0.125  0      0      0
              0      0      0      0 ].
Then we use the distance measure in equation (12.16) with w_1 = w_2 = 1 to find the distances d_w between the three, i.e.,

    d(W_1,p, W_2,p) = 1/2,   d(W_1,p, W_3,p) = 3/4,   d(W_2,p, W_3,p) = 1.

The MATLAB Wavelet Toolbox was used for these computations. This indicates that the letters L and C are the closest in pattern among L, U, and C.
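The distance computations of this example are easy to reproduce without the toolbox. A sketch (the index sets and names are ours) applying (12.16) with w_1 = w_2 = 1 to the pruned matrices above:

```python
# Retained positions: the top left 2x2 submatrix, plus the diagonals of the
# top right and bottom left 2x2 submatrices (eight coefficients in all).
TOP_LEFT  = [(0, 0), (0, 1), (1, 0), (1, 1)]
DIAGONALS = [(0, 2), (1, 3), (2, 0), (3, 1)]

def d_w(A, B, w1=1.0, w2=1.0):
    """Weighted distance of equation (12.16) between two pruned transforms."""
    return (w1 * sum(abs(A[i][j] - B[i][j]) for i, j in TOP_LEFT) +
            w2 * sum(abs(A[i][j] - B[i][j]) for i, j in DIAGONALS))

W1p = [[0.375, 0.250, 0.250, 0], [0.125, 0, 0, 0.125],
       [0.125, 0, 0, 0],         [0, 0, 0, 0]]
W2p = [[0.500, 0.250, 0.250, 0], [0, 0, 0, 0],
       [0, 0, 0, 0],             [0, 0, 0, 0]]
W3p = [[0.625, 0, 0, 0],         [0.125, 0, 0, 0.125],
       [0.125, 0, 0, 0],         [0, 0, 0, 0]]

print(d_w(W1p, W2p), d_w(W1p, W3p), d_w(W2p, W3p))   # -> 0.5 0.75 1.0
```

The smallest distance is between W_1,p and W_2,p, reproducing the conclusion that L and C are the closest patterns.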
12.2.4 MATLAB for Denoising and Compressing Images

Denoising Images

Fig. 12.18: The Empire State building.

Fig. 12.19: The noised image of the Empire State building.

To be practical in our illustrations, we may choose the example of an image of the Empire State building, as shown in Figure 12.18. Noise was added to it, as shown in Figure 12.19.
