Ripples in Mathematics

The Discrete Wavelet Transform

A. Jensen
A. la Cour-Harbo

Springer
Arne Jensen
Aalborg University
Department of Mathematical Sciences
Fredrik Bajers Vej 7
9220 Aalborg, Denmark
e-mail: matarne@math.auc.dk

Anders la Cour-Harbo
Aalborg University
Department of Control Engineering
Fredrik Bajers Vej 7C
9220 Aalborg, Denmark
e-mail: alc@control.auc.dk

Library of Congress Cataloging-in-Publication Data

Jensen, A. (Arne), 1950-
Ripples in mathematics: the discrete wavelet transform / A. Jensen, A. la Cour-Harbo.
p. cm.
Includes bibliographical references and index.
ISBN 3-540-41662-5 (softcover: alk. paper)
1. Wavelets (Mathematics) I. La Cour-Harbo, A. (Anders), 1973- II. Title.
QA403.3 .J46 2001
515'.2433--dc21
2001020907

ISBN 3-540-41662-5 Springer-Verlag Berlin Heidelberg New York

Mathematics Subject Classification (2000): 42-01, 42C40, 65-01, 65T60, 94-01, 94A12

MATLAB" is a registred trademark ofThe MathWorks. Inc.

This work is subject to copyright. All rights are reserved, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks.
Duplication of this publication or parts thereof is permitted only under the provisions of the German
Copyright Law of September 9, 1965, in its current version, and permission for use must always be
obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag Berlin Heidelberg New York


a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de

© Springer-Verlag Berlin Heidelberg 2001

The use of general descriptive names, registered names, trademarks etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
Cover design: Künkel & Lopka, Heidelberg
Typesetting by the authors using a LaTeX macro package
Printed on acid-free paper   SPIN 10773914   46/3142ck - 5 4 3 2 1 0
Preface

Yet another book on wavelets. There are many books on wavelets available,
written for readers with different backgrounds. But the topic is becoming ever
more important in mainstream signal processing, since the new JPEG2000
standard is based on wavelet techniques. Wavelet techniques are also impor-
tant in the MPEG-4 standard.
So we thought that there might be room for yet another book on wavelets.
This one is limited in scope, since it only covers the discrete wavelet trans-
form, which is central in modern digital signal processing. The presentation
is based on the lifting technique discovered by W. Sweldens in 1994. Due to a
result by I. Daubechies and W. Sweldens from 1996 this approach covers the
same class of discrete wavelet transforms as the one based on two channel
filter banks with perfect reconstruction.
The goal of this book is to enable readers, with modest backgrounds
in mathematics, signal analysis, and programming, to understand wavelet
based techniques in signal analysis, and perhaps to enable them to apply
such methods to real world problems.
The book started as a set of lecture notes, written in Danish, for a group
of teachers of signal analysis at Danish Engineering Colleges. The material
has also been presented to groups of engineers working in industry, and used
in mathematics courses at Aalborg University.
We would like to acknowledge the influence of the work by W. Sweldens
[25, 26] on this book. Without his lifting idea we would not have been able
to write this book. We would also like to acknowledge the influence of the
paper [20] by C. Mulcahy. His idea of introducing the wavelet transform
using a signal with 8 samples appealed very much to us, so we have used it
in Chap. 2 to introduce the wavelet transform, and many times later to give
simple examples illustrating the general ideas. It is surprising how much of
wavelet theory one can explain using such simple examples.
This book is an exposition of existing, and in many cases well-known,
results on wavelet theory. For this reason we have not provided detailed ref-
erences to the contributions of the many authors working in this area. We
acknowledge all their contributions, but defer to the textbooks mentioned in
the last chapter for detailed references.
Tokyo, December 2000 Arne Jensen
Aalborg, December 2000 Anders la Cour-Harbo
Contents

1. Introduction
   1.1 Prerequisites
   1.2 Guide to the Book
   1.3 Background Information

2. A First Example
   2.1 The Example
   2.2 Generalizations
   Exercises

3. The Discrete Wavelet Transform via Lifting
   3.1 The First Example Again
   3.2 Definition of Lifting
   3.3 A Second Example
   3.4 Lifting in General
   3.5 DWT in General
   3.6 Further Examples
   Exercises

4. Analysis of Synthetic Signals
   4.1 The Haar Transform
   4.2 The CDF(2,2) Transform
   Exercises

5. Interpretation
   5.1 The First Example
   5.2 Further Results on the Haar Transform
   5.3 Interpretation of General DWT
   Exercises

6. Two Dimensional Transforms
   6.1 One Scale DWT in Two Dimensions
   6.2 Interpretation and Examples
   6.3 A 2D Transform Based on Lifting
   Exercises

7. Lifting and Filters I
   7.1 Fourier Series and the z-Transform
   7.2 Lifting in the z-Transform Representation
   7.3 Two Channel Filter Banks
   7.4 Orthonormal and Biorthogonal Bases
   7.5 Two Channel Filter Banks in the Time Domain
   7.6 Summary of Results on Lifting and Filters
   7.7 Properties of Orthogonal Filters
   7.8 Some Examples
   Exercises

8. Wavelet Packets
   8.1 From Wavelets to Wavelet Packets
   8.2 Choice of Basis
   8.3 Cost Functions
   Exercises

9. The Time-Frequency Plane
   9.1 Sampling and Frequency Contents
   9.2 Definition of the Time-Frequency Plane
   9.3 Wavelet Packets and Frequency Contents
   9.4 More about Time-Frequency Planes
   9.5 More Fourier Analysis. The Spectrogram
   Exercises

10. Finite Signals
   10.1 The Extent of the Boundary Problem
   10.2 DWT in Matrix Form
   10.3 Gram-Schmidt Boundary Filters
   10.4 Periodization
   10.5 Moment Preserving Boundary Filters
   Exercises

11. Implementation
   11.1 Introduction to Software
   11.2 Implementing the Haar Transform Through Lifting
   11.3 Implementing the DWT Through Lifting
   11.4 The Real Time Method
   11.5 Filter Bank Implementation
   11.6 Construction of Boundary Filters
   11.7 Wavelet Packet Decomposition
   11.8 Wavelet Packet Bases
   11.9 Cost Functions
   Exercises

12. Lifting and Filters II
   12.1 The Three Basic Representations
   12.2 From Matrix to Equation Form
   12.3 From Equation to Filter Form
   12.4 From Filters to Lifting Steps
   12.5 Factoring Daubechies 4 into Lifting Steps
   12.6 Factorizing Coiflet 12 into Lifting Steps
   Exercises

13. Wavelets in Matlab
   13.1 Multiresolution Analysis
   13.2 Frequency Properties of the Wavelet Transform
   13.3 Wavelet Packets Used for Denoising
   13.4 Best Basis Algorithm
   13.5 Some Commands in Uvi_Wave
   Exercises

14. Applications and Outlook
   14.1 Applications
   14.2 Outlook
   14.3 Some Web Sites

References

Index
1. Introduction

This book gives an introduction to the discrete wavelet transform, and to
some of its generalizations. The transforms are defined and interpreted. Some
examples of applications are given, and the implementation on the computer
is described in detail. The book is limited to the discrete wavelet transform,
which means that the continuous version of the wavelet transform is not pre-
sented at all. One of the reasons for this choice is the intention that the book
should be accessible to readers with rather modest mathematical prerequi-
sites. Another reason is that for readers with good mathematical prerequisites
there exists a large number of excellent books presenting the continuous (and
often also the discrete) versions of the wavelet transform.
The book is written for at least three different audiences. (i) Students of
electrical engineering who need a background in wavelets in order to under-
stand the current standards in the field. (ii) Electrical engineers working in
industry who need to get some background in wavelets in order to apply these
methods to their own problems in signal processing. (iii) Undergraduate mathematics
students who want to see the power and applicability of modern mathematics
in signal processing.
In this introduction we first describe the prerequisites, then we give a
short guide to the book, and finally we give some background information.

1.1 Prerequisites
The prerequisites for reading this book are quite modest, at least for the first
six chapters. For these chapters familiarity with calculus and linear algebra
will suffice. The numerous American undergraduate texts on calculus and lin-
ear algebra contain more material than is needed. From Chap. 7 onwards we
assume familiarity with either the basic concepts in digital signal processing,
as presented in for example [22, 23] (or any introductory text on digital signal
processing), or with Fourier series. What is needed is the Fourier series, and
the z-transform formulation of Fourier series, together with basic concepts
from filter theory, or, in mathematical terms, elementary results on convolu-
tion of sequences. This chapter is somewhat more difficult to read than the
previous chapters, but the material is essential for a real understanding of
the wavelet transforms.


The ultimate goal of this book is to enable the reader to use the discrete
wavelet transform on real world problems. For this goal to be realized it is
necessary that the reader carries out experiments on the computer. We have
chosen MATLAB as the environment for computations, since it is particularly
well suited to signal processing. We give many examples and exercises using
MATLAB. A few examples are also given using the C language, but these
are entirely optional. The MATLAB environment is easy to use, so a modest
background in programming will suffice. In Chap. 13 we provide a number
of examples of applications of the various wavelet transforms, based on a
public domain toolbox, so no programming skills are needed to go through
the examples in that chapter.

1.2 Guide to the Book


The reader should first go through Chap. 2 to Chap. 6 without solving the
computer exercises, and then go through the first part of Chap. 13. After that
the reader should return to the first chapters and do the computer exercises.
The first part of the book is based on the so-called lifting technique, which
gives a very easy introduction to the discrete wavelet transform. For the
reader with some previous knowledge of the wavelet transform we give some
background information on the lifting technique in the next section.
In Chap. 7 we establish the connection between the lifting technique and
the more usual filter bank approach to the wavelet transform. The proof and
the detailed discussion of the main result is postponed to Chap. 12.
In Chap. 8 we define the generalization of the wavelet transform called
wavelet packets. This leads to a very large number of possible representations
of a given signal, but fortunately there is a fast search algorithm associated
with wavelet packets. In Chap. 9 we interpret the transforms in time and
frequency, and for this purpose we introduce the time-frequency plane. One
should note that the interpretation of wavelet packet transforms is not easy.
Computer experiments can help the reader to understand the properties of
this class of transforms. The rather complicated behavior with respect to
time and frequency is on the other hand one of the reasons why wavelets and
wavelet packets have been so successful in applications to data compression
and denoising of signals.
Up to this point we have not dealt with an essential problem in the theory,
and in particular in the applications. Everything presented in the previous
chapters works without problems, when applied to infinitely long signals. But
in the real world we always deal with finite length signals. There are problems
at the beginning, and at the end, of a finite signal, when one wants to carry
out a wavelet analysis of such a signal. We refer to this as the boundary
problem. In Chap. 10 we present several solutions to this boundary problem.
There is no universal solution. One has to choose a boundary correction
method adapted to the class of signals under consideration.

In Chap. 11 we show in detail how to implement wavelet transforms and
wavelet packet transforms in the MATLAB environment. Several different
approaches are discussed. Some examples of C code implementations are also
given. These are optional. In Chap. 12 we complete the results in Chap. 7 on
filters and lifting steps.
In Chap. 13 we use MATLAB to demonstrate some of the capabilities
of wavelets applied to various signals. This chapter is based on the public
domain toolbox called Uvi_Wave. At this point the reader should begin to
appreciate the advantages of the wavelet transforms in dealing with signals
with transients. After this chapter the reader should review the previous
chapters and do further experiments on the computer.
The last chapter contains an overview of some applications of wavelets.
We have chosen not to give detailed presentations, since each application has
specific prerequisites, quite different from those assumed in the preceding
chapters. Instead we give references to the literature, and to web sites with
relevant information. The chapter also contains suggestions on how to learn
more about wavelets. This book covers only a small part of the by now
huge wavelet theory. There are a few appendices containing supplementary
material. Some additional material, and the relevant MATLAB M-files, are
available electronically, at the URL
http://www.bigfoot.com/~alch/ripples.html
Finally, at the end of the book we give some references. There are references
to a few of the numerous books on wavelets and to some research papers. The
latter are included in order to acknowledge some sources. They are probably
inaccessible to most readers of this book.

1.3 Background Information


In this section we assume that the reader has some familiarity with the usual
presentations of wavelet theory, as for example given in [5]. Readers without
this background should go directly to Chap. 2.
We will here try to explain how our approach differs from the most com-
mon ones in the current wavelet literature. We will do this by sketching the
development of wavelets. Our description is very short and incomplete. A
good description of the history of wavelets is given in [13]. Wavelet analysis
started with the work by A. Grossmann and J. Morlet in the beginning of the
eighties. J. Morlet, working for a French oil company, devised a method for
analyzing transient seismic signals, based on an analogy with the windowed
Fourier transform (Gabor analysis). He replaced the window function by a
function ψ, well localized in time and frequency (for example a Gaussian),
and replaced translation in frequency by scaling. The transform is defined as

CWT(f; a, b) = ∫_{-∞}^{∞} f(t) a^{-1/2} ψ(a^{-1}(t - b)) dt .



Under some additional conditions on ψ the transform is invertible. This trans-
form turned out to be better than Fourier analysis in handling transient
signals. The two authors gave the name 'ondelette', in English 'wavelet,' to
the analyzing function ψ. Connections to quantum mechanics were also es-
tablished in the early papers.
It turned out that this continuous wavelet transform was not that easy to
apply. In 1985 Y. Meyer discovered that by using certain discrete values of
the two parameters a, b, one could get an orthonormal basis for the Hilbert
space L²(R). More precisely, the basis is of the form

ψ_{j,k}(t) = 2^{j/2} ψ(2^j t - k) ,   j, k ∈ Z .

The first constructions of such ψ were difficult.


The underlying mathematical structure was discovered by S. Mallat and
Y. Meyer in 1987. This structure is called a multiresolution analysis. Combin-
ing ideas from Fourier analysis with ideas from signal processing (two channel
filter banks) and vision (pyramidal algorithms) this leads to a characteriza-
tion of functions 'l/J, which generate a wavelet basis. At the same time this
framework establishes a close connection between wavelets and two channel
filter banks with perfect reconstruction. Another result obtained by S. Mal-
lat was a fast algorithm for the computation of the coefficients for certain
decompositions in a wavelet basis.
In 1988 I. Daubechies used the connection with filter theory to construct
a family of wavelets, with compact support, and with differentiability to a
prescribed finite order. Infinitely often differentiable wavelets with compact
support do not exist.
From this point onwards the wavelet theory and its applications under-
went a very fast development. We will only mention one important event.
In a paper [25], which appeared as a preprint in 1994, W. Sweldens intro-
duced a method called the 'lifting technique,' which allowed one to improve
properties of existing wavelet transforms. I. Daubechies and W. Sweldens [7]
proved that all finite filters related to wavelets can be obtained using the lift-
ing technique. The lifting technique has many advantages, and it is now part
of mainstream signal analysis. For example, the new JPEG2000 standard is
based on the lifting technique, and the lifting technique is also part of the
MPEG-4 standard.
The main impact of the result by I. Daubechies and W. Sweldens, in
relation to this book, is that one can start from the lifting technique and use
it to give a direct and simple definition of the discrete wavelet transform.
This is precisely what we have done, and this is how our approach differs
from the more usual ones.
If one wants to go on and study the wavelet bases in L²(R), then one
faces the problem that not all discrete wavelet transforms lead to bases.
But there are two complete characterizations available, one in the work by
A. Cohen, see for example [4], and a different one in the work by W. Lawton,

see for example [5, 15]. From this point onwards the mathematics becomes
highly nontrivial. We choose to stop our exposition here. The reader will have
to have the necessary mathematical background to continue, and with that
background there is a large number of excellent books with which to continue.
2. A First Example

In this chapter we introduce the discrete wavelet transform, often referred to
as DWT, through a simple example, which will reveal some of its essential
features. This idea is due to C. Mulcahy [20], and we use his example, with
a minor modification.

2.1 The Example


The first example is very simple. We take a digital signal consisting of just 8
samples,
56, 40, 8, 24, 48, 48, 40, 16.
We display these numbers in the first row of Table 2.1. We assume that
these numbers are not random, but contain some structures that we want to
extract. We could for example assume that there is some correlation between
a number and its immediate successor, so we take the numbers in pairs and
compute the mean, and the difference between the first member of the pair
and the computed mean. The second row contains the four means followed by
the four differences, the latter being typeset in boldface. We then leave the
four differences unchanged and apply the mean and difference computations
to the first four entries. We repeat this procedure once more. The fourth row
then contains a first entry, which is the mean of the original 8 numbers, and
the 7 calculated differences. The boldface entries in the table are here called
the details of the signal.

Table 2.1. Mean and difference computation. Differences are in boldface type

56 40 8 24 48 48 40 16
48 16 48 28 8 -8 0 12
32 38 16 10 8 -8 0 12
35 -3 16 10 8 -8 0 12

It is important to observe that no information has been lost in this transfor-
mation of the first row into the fourth row. This means that we can reverse


the calculation. Beginning with the last row, we compute the first two entries
in the third row as 32 = 35 + (-3) and 38 = 35 - (-3). Analogously, the first
4 entries in the second row are calculated as 48 = 32 + (16), 16 = 32 - (16),
48 = 38 + (10), and finally 28 = 38 - (10). Repeating this procedure we get
the first row in the table.
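As a small illustration (our own sketch, not code from the book), the whole table can be computed in a few lines of MATLAB. The first loop produces the rows of Table 2.1, and the second loop reverses the calculation exactly as described above.

    % Mean/difference transform of the 8-sample signal in Table 2.1.
    x = [56 40 8 24 48 48 40 16];     % the signal, as a row vector
    s = x;
    for len = [8 4 2]                 % number of entries still being transformed
        a = s(1:2:len);               % first member of each pair
        b = s(2:2:len);               % second member of each pair
        m = (a + b)/2;                % the means
        d = a - m;                    % differences from the mean
        s(1:len) = [m d];             % means first, then the new differences
    end
    disp(s)                           % 35 -3 16 10 8 -8 0 12, the fourth row

    % The inverse: reverse the steps, smallest block first.
    for len = [2 4 8]
        m = s(1:len/2);  d = s(len/2+1:len);
        s(1:2:len) = m + d;           % a = mean + difference
        s(2:2:len) = m - d;           % b = mean - difference
    end
    disp(s)                           % recovers the original signal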
Do we gain anything from this change of representation of the signal? In
other words, does the signal in the fourth row exhibit some nice features not
seen in the original signal? One thing is immediately evident. The numbers
in the fourth row are generally smaller than the original numbers. So we have
achieved some kind of loss-free compression by reducing the dynamic range
of the signal. By loss-free we mean that we can transform back to the original
signal, without any loss of information. We could measure the dynamics of
the signal by counting the number of digits used to represent it. The first row
contains 15 digits. The last row contains 12 digits and two negative signs. So
in this example the compression is not very large. But it is easy to give other
examples, where the compression of the dynamic range can be substantial.
We see in this example the pair 48, 48, where the difference of course is
zero. Suppose that after transformation we find that many difference entries
are zero. Then we can store the transformed signal more efficiently by only
storing the non-zero entries (and their locations).
Let us now suppose that we are willing to accept a certain loss of quality
in the signal, if we can get a higher rate of compression. We can try to
process our signal, or better, our transformed signal. One technique is called
thresholding. We choose a threshold and decide to set all entries with an
absolute value less than this threshold equal to zero. Let us in our example
choose 4 as the threshold. This means that in Table 2.1 we replace the entry
-3 by 0 and then perform the reconstruction. The result is in Table 2.2.

Table 2.2. Reconstruction with threshold 4


59 43 11 27 45 45 37 13
51 19 45 25 8 -8 0 12
35 35 16 10 8 -8 0 12
35 0 16 10 8 -8 0 12

The original and the modified signal are both shown in Fig. 2.1. We have
chosen to join the given points by straight line segments to get a good visual-
ization of the signals. Clearly the two graphs differ very little. If presented in
separate plots, it would be difficult to tell them apart. Now let us perform a
more drastic compression. This time we choose the threshold equal to 9. The
computations are given in Table 2.3, and the graphs are plotted in Fig. 2.2.
Notice that the peaks in the original signal have been flattened. We also note
that the signal now is represented by only four non-zero entries.
Fig. 2.1. Original signal and modified signal (dashed line) with threshold 4

Table 2.3. Reconstruction with threshold 9


51 51 19 19 45 45 37 13
51 19 45 25 0 0 0 12
35 35 16 10 0 0 0 12
35 0 16 10 0 0 0 12


Fig. 2.2. Original signal and modified signal (dashed line) with threshold 9

We note that there are several variations of the procedure used here. We could
have stored averages and differences, or we could have used the difference
between the second element of the pair and the computed average. The first
choice will lead to boldface entries in the tables that can be obtained from
the computed ones by multiplication by a factor -2. The second variant is
obtained by multiplication by -1.

2.2 Generalizations

The above procedure can of course be performed on any signal of length 2^N,
and will lead to a table with N + 1 rows, where the first row is the original
signal. If the given signal has a length different from a power of 2, then we
will have to do some additional operations on the signal to compensate for
that. One possibility is to add samples with value zero to one or both ends of
the signal until a length of 2^N is achieved. This is referred to as zero padding.
The transformation performed by successively calculating means and dif-
ferences of a signal is an example of the discrete wavelet transform. It can
be undone by simply reversing the steps performed. We have also seen that
the transformed signal may reveal features not easily seen or detected in the
original signal. All these phenomena are consequences of the properties of the
discrete wavelet transform, as we will see in the following chapters.

Exercises

2.1 Verify the computations in the tables in this chapter.


2.2 Give some other examples, using for example signals of length 16.
2.3 Write some simple functions (in the programming language of your
choice) to perform transformation of signals of length 256 or 512. With these
functions perform some experiments with zero padding.
3. The Discrete Wavelet Transform via Lifting

3.1 The First Example Again

parameter n. This also has the advantage that we can denote sequences by
x_1 and x_2, or in the more detailed notation {x_1[n]}_{n∈Z} and {x_2[n]}_{n∈Z}.
The condition for finite energy is, for an infinite signal,

∑_{n=-∞}^{∞} |x[n]|^2 < ∞ .

A finite signal always satisfies this condition. In the sequel the reader can
assume that all signals are finite. Note that there are several technicalities
involved in treating infinite sums, but since they are irrelevant for most of
what we want to present here, we will omit these technicalities. The set of
all signals with finite energy is denoted by ℓ²(Z) in the literature. We will
use this convenient notation below. Often we use the mathematical term
sequence instead of signal. Sometimes we also use the term vector, in particular
in connection with the use of results from linear algebra.
Let us now return to the example in Sect. 2.1. We took a pair of numbers
a, b and computed the mean, and the difference between the first entry and
the mean

s = (a + b)/2 ,   (3.1)
d = a - s .   (3.2)

The inverse transform is then

a = s + d ,   (3.3)
b = s - d .   (3.4)

As mentioned at the end of Sect. 2.1 we could have chosen another compu-
tation of the difference, as in

μ = (a + b)/2 ,   (3.5)
δ = b - a .   (3.6)

There is an important thing to be noticed here. When we talk about mean


and difference of a pair of samples, as we have done in the previous chapter,
the most obvious calculations are (3.5) and (3.6). And yet we have in Chap. 2
used (3.1) and (3.2) (the same sum, but a different difference). The reason for
choosing this form is the following. Once s has been calculated in (3.1),
b is no longer needed, since it does not appear in (3.2) (this is in contrast to
(3.6), where both a and b are needed). Thus in a computer the memory space
used to store b can be used to store s. And once d in (3.2) has been calculated,
we do not need a anymore. In the computer memory we can therefore also
replace a with d.

First step: a, b → a, s

Second step: a, s → d, s

or with the operations indicated explicitly:

First step: a, b → a, (a + b)/2

Second step: a, s → a - s, s .

Since we do not need extra memory to perform this transform, we refer to
it as an 'in place' transform. The inversion can also be performed 'in place,'
namely as

First step: d, s → a, s

Second step: a, s → a, b

or with the operations given explicitly:

First step: d, s → d + s, s

Second step: a, s → a, 2s - a .
The difference between (3.2) and (3.6) might seem trivial and unimportant,
but the replacement of old values with newly calculated ones is nonetheless
one of the key features of the lifting scheme. One can see the importance,
when one considers the memory space needed for transforming very long
signals.
Actually, the computation in (3.5) and (3.6) can also be performed 'in
place'. In this case we should start by computing the difference, as shown
here

a, b → a, δ = b - a → μ = a + δ/2, δ .   (3.7)

Note that μ = a + δ/2 = a + (b - a)/2 = (a + b)/2 actually is the mean value.
The inversion is performed as

μ, δ → a = μ - δ/2, δ → a, b = a + δ .   (3.8)
One important lesson to be learned from these computations is that essen-
tially the same transform can have different implementations. In this example
the differences are minor, but later we will see examples, where there can be
more substantial differences.
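The following MATLAB fragment (a sketch in our own naming, not taken from the book's software) carries out (3.7) and (3.8) on a pair stored in a two-element vector, so that no memory beyond the vector itself is used.

    x = [56 40];            % x(1) = a, x(2) = b
    x(2) = x(2) - x(1);     % x = [a, delta], delta = b - a, as in (3.6)
    x(1) = x(1) + x(2)/2;   % x = [mu, delta], mu = a + delta/2, as in (3.7)

    % Inversion 'in place', following (3.8):
    x(1) = x(1) - x(2)/2;   % x = [a, delta]
    x(2) = x(2) + x(1);     % x = [a, b]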

3.2 Definition of Lifting


The transform that uses means and differences brings us to the definition of
the lifting operation. The two operations, mean and difference, can be viewed

as special cases of more general operations. Remember that we previously (in


the beginning of Chap. 2) assumed that there is some correlation between two
successive samples, and we therefore computed the difference. If two samples
are almost equal the difference is, of course, small, and it is therefore obvious
to think of the first sample as a prediction of the second sample. It is a good
prediction, if the difference is small. We can use other prediction steps than
one based on just the previous sample. Examples are given later.
We also calculated the mean of the two samples. This can be viewed
in two ways. Either as an operation, which preserves some properties of the
original signal (later we shall see how the mean value (and sometimes also the
energy) of a signal is preserved during transformation), or as an extraction of
an essential feature of the signal. The latter viewpoint is based on the fact
that the pair-wise mean values contain the overall structure of the signal,
but with only half the number of samples. We use the word update for
this operation. As with the prediction, the update operation can be more
sophisticated than just calculating the mean. An example is given in
Sect. 3.3.
The prediction and update operations are shown in Fig. 3.1, although the
setup here is a little different from Chap. 2. We start with a finite sequence
s_j of length 2^j. It is transformed into two sequences, each of length 2^{j-1}.
They are denoted s_{j-1} and d_{j-1}, respectively. Let us explain the three steps
in detail.

Fig. 3.1. The three steps in a lifting building block. Note that the minus means
'the signal from the left minus the signal from the top'

split The entries are sorted into the even and the odd entries. It is important
to note that we do this only to explain the functionality of the algorithm.
In (effective) implementations the entries are not moved or separated.

prediction If the signal contains some structure, then we can expect corre-
lation between a sample and its nearest neighbors. In our first example
the prediction is that the signal is constant. More elaborately, given the
value at the sample number 2n, we predict that the value at sample 2n+ 1
is the same. We then replace the value at 2n + 1 with the correction to

the prediction, which is the difference. In our notation this is (using the
implementation given in (3.7))

d_{j-1}[n] = s_j[2n + 1] - s_j[2n] .

In general, the idea is to have a prediction procedure P and then compute

d_{j-1} = odd_{j-1} - P(even_{j-1}) .   (3.9)

Thus in the d signal each entry is one odd sample minus some prediction
based on a number of even samples.

update Given an even entry, we have predicted that the next odd entry
has the same value, and stored the difference. We then update our even
entry to reflect our knowledge of the signal. In the example above we
replaced the even entry by the average. In our notation (and again using
the implementation given in (3.7))

s_{j-1}[n] = s_j[2n] + d_{j-1}[n]/2 .

In general we decide on an updating procedure, and then compute

s_{j-1} = even_{j-1} + U(d_{j-1}) .   (3.10)

The algorithm described here is called one step lifting. It requires the choice
of a prediction procedure P, and an update procedure U.
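In MATLAB one step lifting can be sketched as follows, with P and U passed as function handles; the function name lift_step is our own, hypothetical choice. Recall that MATLAB indexing starts at 1, so the samples s_j[2n] sit at the odd MATLAB indices 1, 3, 5, ....

    function [s1, d1] = lift_step(s, P, U)
        % One step lifting; s is assumed to be a row vector of even length.
        even = s(1:2:end);       % the samples s_j[2n]
        odd  = s(2:2:end);       % the samples s_j[2n+1]
        d1 = odd  - P(even);     % (3.9): the difference signal d_{j-1}
        s1 = even + U(d1);       % (3.10): the update signal s_{j-1}
    end

With P = @(even) even and U = @(d) d/2 this reproduces the mean and difference example above.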
The discrete wavelet transform is obtained by combining a number of
lifting steps. As in the example in Table 2.1 we keep the computed differences
d j - 1 and use the average sequence Sj-l as input for one more lifting step.
This two step procedure is illustrated in Fig. 3.2.

Fig. 3.2. Two step discrete wavelet transform

Starting with a signal s_j of length 2^j and repeating the transformations in the
first example j times, we end up with a single number s_0[0], which is easily
seen to be the mean value of all entries in the original sequence. Taking j = 3
and using the same notation as in the tables in Chap. 2, we see that
Table 2.1 is represented symbolically as Table 3.1. Now if we use the

Table 3.1. Notation for Table 2.1

s_3[0]  s_3[1]  s_3[2]  s_3[3]  s_3[4]  s_3[5]  s_3[6]  s_3[7]
s_2[0]  s_2[1]  s_2[2]  s_2[3]  d_2[0]  d_2[1]  d_2[2]  d_2[3]
s_1[0]  s_1[1]  d_1[0]  d_1[1]  d_2[0]  d_2[1]  d_2[2]  d_2[3]
s_0[0]  d_0[0]  d_1[0]  d_1[1]  d_2[0]  d_2[1]  d_2[2]  d_2[3]

'in place' procedure, and also record the intermediate steps, then we get the
representation in Table 3.2. This table makes it evident that in implementing

Table 3.2. 'In place' representation for Table 3.1 with intermediate steps. Predic-
tion steps are labeled with P, and update steps with U

s_3[0]  s_3[1]  s_3[2]  s_3[3]  s_3[4]  s_3[5]  s_3[6]  s_3[7]
s_3[0]  d_2[0]  s_3[2]  d_2[1]  s_3[4]  d_2[2]  s_3[6]  d_2[3]   P
s_2[0]  d_2[0]  s_2[1]  d_2[1]  s_2[2]  d_2[2]  s_2[3]  d_2[3]   U
s_2[0]  d_2[0]  d_1[0]  d_2[1]  s_2[2]  d_2[2]  d_1[1]  d_2[3]   P
s_1[0]  d_2[0]  d_1[0]  d_2[1]  s_1[1]  d_2[2]  d_1[1]  d_2[3]   U
s_1[0]  d_2[0]  d_1[0]  d_2[1]  d_0[0]  d_2[2]  d_1[1]  d_2[3]   P
s_0[0]  d_2[0]  d_1[0]  d_2[1]  d_0[0]  d_2[2]  d_1[1]  d_2[3]   U

the procedure on the computer one has to be careful with the indices. For
example, by inspecting the table carefully it is seen that one should step
through the rows in steps of length 2, 4, and 8, while computing the s-values.
We have previously motivated the prediction operation with the reduc-
tion in dynamic range of the signal obtained by using differences rather than
the original values, potentially leading to good compression of a signal. The
update procedure has not yet been clearly motivated. The update performed
in the first example in Chap. 2 was

s_{j-1}[n] = (s_j[2n] + s_j[2n + 1])/2 .   (3.11)

It turns out that this operation preserves the mean value. The consequence
is that all the s sequences have the same mean value. It is easy to verify in
the case of the example in Table 2.1, since
3.3 A Second Example 17
(56 + 40 + 8 + 24 + 48 + 48 + 40 + 16)/8
  = (48 + 16 + 48 + 28)/4 = (32 + 38)/2 = 35 .

It is not difficult to see that this holds for any s sequence of length 2^j. The
mean value of such a sequence is

S = 2^{-j} ∑_{n=0}^{2^j - 1} s_j[n] .

Substituting (3.11) into this formula we get

∑_{n=0}^{2^{j-1} - 1} s_{j-1}[n] = (1/2) ∑_{n=0}^{2^{j-1} - 1} (s_j[2n] + s_j[2n + 1]) = (1/2) ∑_{k=0}^{2^j - 1} s_j[k] ,

which shows the result, since the signal s_j is twice as long as s_{j-1}. In partic-
ular, s_0[0] equals the mean value of the original samples s_j[0], ..., s_j[2^j - 1]
(which in the first example was 35).

3.3 A Second Example

As mentioned earlier, there are many other possible prediction procedures,
and update procedures. We give a second example. In our first example the
prediction was correct for a constant signal. Now we want the prediction to
be correct for a linear signal. We really mean an affine signal, but we stick to
the commonly used term 'linear.' By a linear signal we mean a signal with the
n-dependence of the form s_j[n] = αn + β (all the samples of the signal lie on
a straight line). For a given odd entry s_j[2n + 1] we base the prediction on the
two nearest even neighbors. The prediction is then (s_j[2n] + s_j[2n + 2])/2, since
we want it to be correct for a linear signal. This value is the open circle in
Fig. 3.3. The correction is the difference between what we predict the middle
sample to be and what it actually is

d_{j-1}[n] = s_j[2n + 1] - (s_j[2n] + s_j[2n + 2])/2 ,

and this difference is all we need to store. The principle is shown in Fig. 3.3.
We decide to base the update procedure on the two most recently computed
differences. We take it to be of the form

s_{j-1}[n] = s_j[2n] + A(d_{j-1}[n - 1] + d_{j-1}[n]) ,

where A is a constant to be determined. In the first example we had the
property
∑_n s_{j-1}[n] = (1/2) ∑_n s_j[n] .   (3.12)

We would like to have the same property here. Let us first rewrite the ex-
pression for s_{j-1}[n] above,

s_{j-1}[n] = s_j[2n] + A d_{j-1}[n - 1] + A d_{j-1}[n]
  = s_j[2n] + A(s_j[2n - 1] - (1/2)s_j[2n - 2] - (1/2)s_j[2n])
  + A(s_j[2n + 1] - (1/2)s_j[2n] - (1/2)s_j[2n + 2]) .

Using this expression, and gathering even and odd terms, we get

∑_n s_{j-1}[n] = (1 - 2A) ∑_n s_j[2n] + 2A ∑_n s_j[2n + 1] .

To satisfy (3.12) we must choose A = 1/4. Summarizing, we have the following
two steps

d_{j-1}[n] = s_j[2n + 1] - (1/2)(s_j[2n] + s_j[2n + 2]) ,   (3.13)
s_{j-1}[n] = s_j[2n] + (1/4)(d_{j-1}[n - 1] + d_{j-1}[n]) .   (3.14)

The transform in this example also has the property

∑_n n s_{j-1}[n] = (1/4) ∑_n n s_j[n] .   (3.15)

We say that the transform preserves the first moment of the sequence. The
average is also called the zeroth moment of the sequence.
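A sketch of one CDF(2,2) analysis step, (3.13) and (3.14), in MATLAB; zero padding is used for the undefined samples at the ends, a point we return to below and in Chap. 10. The function name is hypothetical, not part of any toolbox.

    function [s1, d1] = cdf22_step(s)
        % One CDF(2,2) step; s is assumed to be a row vector of even length.
        even = s(1:2:end);                  % s_j[2n]
        odd  = s(2:2:end);                  % s_j[2n+1]
        evp  = [even(2:end) 0];             % s_j[2n+2], zero padded at the right
        d1 = odd - (even + evp)/2;          % (3.13)
        dm = [0 d1(1:end-1)];               % d_{j-1}[n-1], zero padded at the left
        s1 = even + (dm + d1)/4;            % (3.14)
    end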
In the above presentation we have simplified the notation by not specify-
ing where the finite sequences start and end, thereby for the moment avoiding
keeping track of the ranges of the variables. In other words, we have consid-
ered our finite sequences as infinite, adding zeroes before and after the given
entries. In implementations one has to keep track of these things, but doing
so now would obscure the simplicity of the lifting procedure. In later chapters
we will deal with these problems in detail, see in particular Chap. 10.

Fig. 3.3. The linear prediction



3.4 Lifting in General

We now look at the lifting procedure in general. Let us first look at how
we can invert the lifting procedure. It is done by reversing the arrows and
changing the signs. Thus the direct transform

d_{j-1} = odd_{j-1} - P(even_{j-1}) ,
s_{j-1} = even_{j-1} + U(d_{j-1})

is inverted by the steps

even_{j-1} = s_{j-1} - U(d_{j-1}) ,
odd_{j-1} = d_{j-1} + P(even_{j-1}) .

These steps are illustrated in Fig. 3.4. The last step, where the sequences
even_{j-1} and odd_{j-1} are merged to form the sequence s_j, is given to explain
the algorithm. It is not performed in implementations, since the entries are
not reordered. As an example, the inverse transform of (3.13) and (3.14) is

s_j[2n] = s_{j-1}[n] - (1/4)(d_{j-1}[n - 1] + d_{j-1}[n]) ,   (3.16)
s_j[2n + 1] = d_{j-1}[n] + (1/2)(s_j[2n] + s_j[2n + 2]) .   (3.17)

Fig. 3.4. Direct and inverse lifting step

Looking at Fig. 3.4 once more, we see that the update step is reversed by the

same update step, but with subtraction instead of addition, and vice versa for
the prediction step. Since each step is inverted separately, we can generalize
in two ways. We can add further pairs of prediction and update steps, and we
can add them singly. If we insist on having them in pairs (this is useful in the
theory, see Chap. 12), we can always add an operation of either type which
does nothing. As an illustration Fig. 3.5 shows a direct transform consisting
of three pairs of prediction and update operations.
It turns out that this generalization is crucial in applications. There are
many important transforms, where the steps do not occur in pairs. Here is
an example, where there is a U operation followed by a P operation and

Fig. 3.5. Three lifting steps

another U operation. Furthermore, in the last two steps, in (3.21) and (3.22),
we add a new type of operation which is called normalization, or sometimes
rescaling. The resulting algorithm is applied to a signal {s_j[n]}_{n∈Z} as follows

s^{(1)}_{j-1}[n] = s_j[2n] + √3 s_j[2n + 1]   (3.18)

d^{(1)}_{j-1}[n] = s_j[2n + 1] - (√3/4) s^{(1)}_{j-1}[n] - ((√3 - 2)/4) s^{(1)}_{j-1}[n - 1]   (3.19)

s^{(2)}_{j-1}[n] = s^{(1)}_{j-1}[n] - d^{(1)}_{j-1}[n + 1]   (3.20)

s_{j-1}[n] = ((√3 - 1)/√2) s^{(2)}_{j-1}[n]   (3.21)

d_{j-1}[n] = ((√3 + 1)/√2) d^{(1)}_{j-1}[n] .   (3.22)

Since there is more than one U operation, we have used superscripts on the
s and d signals in order to tell them apart. Note that in the normalization
steps we have

((√3 - 1)/√2) · ((√3 + 1)/√2) = 1 .

The reason for the normalization will become apparent in the next chapter,
when we start doing computations. The algorithm above is one step in the
discrete wavelet transform based on an important filter, which in the litera-
ture is often called Daubechies 4. The connection with filters will be explained
in Chap. 7.
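The steps (3.18)-(3.22) translate directly into MATLAB. The sketch below (hypothetical function name, zero padding at the ends as discussed after the inverse transform) computes one Daubechies 4 step.

    function [s1, d1] = daub4_step(s)
        % One Daubechies 4 step; s is assumed to be a row vector of even length.
        even = s(1:2:end);                            % s_j[2n]
        odd  = s(2:2:end);                            % s_j[2n+1]
        s1 = even + sqrt(3)*odd;                      % (3.18)
        s1m = [0 s1(1:end-1)];                        % s(1)_{j-1}[n-1], zero padded
        d1 = odd - sqrt(3)/4*s1 - (sqrt(3)-2)/4*s1m;  % (3.19)
        d1p = [d1(2:end) 0];                          % d(1)_{j-1}[n+1], zero padded
        s1 = s1 - d1p;                                % (3.20)
        s1 = (sqrt(3)-1)/sqrt(2) * s1;                % (3.21)
        d1 = (sqrt(3)+1)/sqrt(2) * d1;                % (3.22)
    end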
To find the inverse transform we have to use the prescription given above.
We do the steps in reverse order and with the signs reversed. Thus the normal-
ization is undone by multiplication by the inverse constants etc. The result
is

d^{(1)}_{j-1}[n] = ((√3 - 1)/√2) d_{j-1}[n]   (3.23)

s^{(2)}_{j-1}[n] = ((√3 + 1)/√2) s_{j-1}[n]   (3.24)

s^{(1)}_{j-1}[n] = s^{(2)}_{j-1}[n] + d^{(1)}_{j-1}[n + 1]   (3.25)

s_j[2n + 1] = d^{(1)}_{j-1}[n] + (√3/4) s^{(1)}_{j-1}[n] + ((√3 - 2)/4) s^{(1)}_{j-1}[n - 1]   (3.26)

s_j[2n] = s^{(1)}_{j-1}[n] - √3 s_j[2n + 1] .   (3.27)
This transform illustrates one of the problems that has to be faced in imple-
mentations. For example, to compute d^{(1)}_{j-1}[0] we need to know s^{(1)}_{j-1}[0] and
s^{(1)}_{j-1}[-1]. But to compute s^{(1)}_{j-1}[-1] one needs the values s_j[-2] and s_j[-1],
which are not defined. The easiest solution to this problem is to use zero
padding to get a sample at this index value (zero padding means that all
undefined samples are defined to be 0). There exist other more sophisticated
methods. This is the topic in Chap. 10.
Let us repeat our first example in the above notation. We also add a
normalization step. In this form the transform is known as the Haar transform
in the literature. The direct transform is

d^{(1)}_{j-1}[n] = s_j[2n + 1] - s_j[2n]   (3.28)

s^{(1)}_{j-1}[n] = s_j[2n] + (1/2) d^{(1)}_{j-1}[n]   (3.29)

s_{j-1}[n] = √2 s^{(1)}_{j-1}[n]   (3.30)

d_{j-1}[n] = (1/√2) d^{(1)}_{j-1}[n]   (3.31)

and the inverse transform is given by

d^{(1)}_{j-1}[n] = √2 d_{j-1}[n]   (3.32)

s^{(1)}_{j-1}[n] = (1/√2) s_{j-1}[n]   (3.33)

s_j[2n] = s^{(1)}_{j-1}[n] - (1/2) d^{(1)}_{j-1}[n]   (3.34)

s_j[2n + 1] = s_j[2n] + d^{(1)}_{j-1}[n] .   (3.35)
We note that this transform can be applied to a signal of length 2^j without
using zero padding. It turns out to be the only transform with this property.
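A quick MATLAB round trip (our own sketch) confirms the perfect reconstruction of the Haar step; the final error is at the level of machine precision.

    s = rand(1, 8);                      % any signal of even length
    d1 = s(2:2:end) - s(1:2:end);        % (3.28)
    s1 = s(1:2:end) + d1/2;              % (3.29)
    s1 = sqrt(2)*s1; d1 = d1/sqrt(2);    % (3.30) and (3.31)
    d1 = sqrt(2)*d1; s1 = s1/sqrt(2);    % (3.32) and (3.33)
    r = zeros(1, 8);
    r(1:2:8) = s1 - d1/2;                % (3.34)
    r(2:2:8) = r(1:2:8) + d1;            % (3.35)
    max(abs(r - s))                      % of the order 1e-16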

3.5 The Discrete Wavelet Transform in General


We now look at the discrete wavelet transform in the general framework
established above. We postpone the boundary correction problem and assume
22 3. DWT and Lifting

that we have an infinite signal Sj = {Sj[n]}nEZ' The starting point is a


transform (with a corresponding inverse transform) which takes as input a
sequence Sj and produces as output two sequences Sj-I and d j - I We will
represent such a direct transform by the symbol T a (subscript 'a' stands for
analysis) and the inverse transform by the symbol T s (subscript's' stands
for synthesis). In diagrams they will be represented as in Fig. 3.6. These are
our fundamental building blocks.

Fig. 3.6. Building blocks for DWT

The contents of the Ta box could be the direct Haar transform as given
by (3.28)-(3.31), and the contents of the Ts box could be the inverse Haar
transform as given by (3.32)-(3.35). Obviously, we must make sure to use the
inverse transform corresponding to the applied direct transform. Otherwise,
the results will be meaningless.
We can now combine these building blocks to get discrete wavelet trans-
forms. We perform the transform over a certain number of scales j, meaning
that we combine j of the building blocks as shown in Fig. 3.2 in the case of
2 scales, and in Fig. 3.7 in the case of 4 scales. In the latter figure we use the
building block representation of the individual steps.
We use the symbol W_a^{(j)} to denote a direct j scale discrete wavelet trans-
form. The inverse is denoted by W_s^{(j)}. The result of the four scale transform
is the transition

W_a^{(4)}: s_j → s_{j-4}, d_{j-4}, d_{j-3}, d_{j-2}, d_{j-1} .

If we apply this four scale discrete wavelet transform to a signal of length
2^k, then the lengths on the right hand side are 2^{k-4}, 2^{k-4}, 2^{k-3}, 2^{k-2}, and
2^{k-1}, respectively. The sum of these five numbers is 2^k, as the reader easily
verifies. The inverse four scale DWT is the transition

W_s^{(4)}: s_{j-4}, d_{j-4}, d_{j-3}, d_{j-2}, d_{j-1} → s_j .
The diagram in Fig. 3.8 shows how it is computed. We use the term scale
to describe how many times the building block Ta or Ts is applied in the
decomposition of a signal. The word originates from the classical wavelet
theory. The reader should note that we later, in Chap. 8, introduce the term
level, in a context more general than the DWT. When this term is applied to
a DWT decomposition, then the level is equal to the scale plus 1. The reader
should not mix up the two terms.
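A j scale transform is then just a loop over the building block. The MATLAB sketch below assumes a one step function Ta of the kind sketched earlier (for example the hypothetical cdf22_step); it stores the result in the order s_{j-k}, d_{j-k}, ..., d_{j-1} used above.

    function w = dwt_scales(s, Ta, k)
        % k scale DWT from a one step building block Ta (function handle).
        w = [];
        for i = 1:k
            [s, d] = Ta(s);      % one building block per scale
            w = [d w];           % keep the details in front of the previous ones
        end
        w = [s w];               % finally the coarsest s part
    end

For example, dwt_scales(s, @cdf22_step, 4) would compute a four scale transform of the kind shown in Fig. 3.7.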

Fig. 3.7. DWT over four scales

Fig. 3.8. Inverse DWT over four scales

3.6 Further Examples


We give some further examples of building blocks that one can use for con-
structing wavelet transforms. The example in Sect. 3.3 is part of a large
family of so-called biorthogonal wavelet transforms. The transform given in
(3.13) and (3.14) is known in the literature as CDF(2,2), since the 'inventors'
of this transform are A. Cohen, I. Daubechies, and J.-C. Feauveau [2]. We
give a larger part of the family below. The first step is in all three cases the
same. The final normalization is also the same.

d^{(1)}_{j-1}[n] = s_j[2n + 1] - (1/2)(s_j[2n] + s_j[2n + 2])   (3.36)

CDF(2,2)  s^{(1)}_{j-1}[n] = s_j[2n] + (1/4)(d^{(1)}_{j-1}[n - 1] + d^{(1)}_{j-1}[n])   (3.37)

CDF(2,4)  s^{(1)}_{j-1}[n] = s_j[2n] - (1/64)(3 d^{(1)}_{j-1}[n - 2] - 19 d^{(1)}_{j-1}[n - 1]
          - 19 d^{(1)}_{j-1}[n] + 3 d^{(1)}_{j-1}[n + 1])   (3.38)

CDF(2,6)  s^{(1)}_{j-1}[n] = s_j[2n] - (1/512)(-5 d^{(1)}_{j-1}[n - 3] + 39 d^{(1)}_{j-1}[n - 2]
          - 162 d^{(1)}_{j-1}[n - 1] - 162 d^{(1)}_{j-1}[n]
          + 39 d^{(1)}_{j-1}[n + 1] - 5 d^{(1)}_{j-1}[n + 2])   (3.39)

d_{j-1}[n] = (1/√2) d^{(1)}_{j-1}[n] ,   (3.40)

s_{j-1}[n] = √2 s^{(1)}_{j-1}[n] .   (3.41)



We have not given the formulas for the inverse transforms. They are obtained
as above by reversing the arrows and changing the signs.
We give one further example of a family of three transforms. We have
taken as an example transforms that start with an update step and a pre-
diction step, which are common to all three. Again, at the end there is a
normalization step.

s^{(1)}_{j-1}[n] = s_j[2n] - (1/3) s_j[2n - 1]   (3.42)

d^{(1)}_{j-1}[n] = s_j[2n + 1] - (1/8)(9 s^{(1)}_{j-1}[n] + 3 s^{(1)}_{j-1}[n + 1])   (3.43)

CDF(3,1)  s^{(2)}_{j-1}[n] = s^{(1)}_{j-1}[n] + (4/9) d^{(1)}_{j-1}[n]   (3.44)

CDF(3,3)  s^{(2)}_{j-1}[n] = s^{(1)}_{j-1}[n] + (1/36)(3 d^{(1)}_{j-1}[n - 1]
          + 16 d^{(1)}_{j-1}[n] - 3 d^{(1)}_{j-1}[n + 1])   (3.45)

CDF(3,5)  s^{(2)}_{j-1}[n] = s^{(1)}_{j-1}[n] - (1/288)(5 d^{(1)}_{j-1}[n - 2] - 34 d^{(1)}_{j-1}[n - 1]
          - 128 d^{(1)}_{j-1}[n] + 34 d^{(1)}_{j-1}[n + 1]
          - 5 d^{(1)}_{j-1}[n + 2])   (3.46)

d_{j-1}[n] = (√2/3) d^{(1)}_{j-1}[n]   (3.47)

s_{j-1}[n] = (3/√2) s^{(2)}_{j-1}[n] .   (3.48)

The above formulas for the CDF(2,x) and CDF(3,x) families have been taken
from the technical report [27]. Further examples can be found there.

Exercises

3.1 Verify that the CDF(2,2) transform, defined in (3.13) and (3.14), pre-
serves the first moment, i.e. verify that (3.15) holds.
4. Analysis of Synthetic Signals

The discrete wavelet transform has been introduced in the previous two chap-
ters. The general lifting scheme, as well as some examples of transforms, were
presented, and we have seen one application to a signal with just 8 samples.
In this chapter we will apply the transform to a number of synthetic signals,
in order to gain some experience with the properties of the discrete wavelet
transform. We will process some signals by transformation, followed by some
alteration, followed by inverse transformation, as we did in Chap. 2 to the
signal with 8 samples. Here we use significantly longer signals. As an exam-
ple, we will show how this approach can be used to remove some of the noise
in a signal. We will also give an example showing how to separate slow and
fast variations in a signal.
The computations in this chapter have been performed using MATLAB.
We have used the toolbox Uvi_Wave to perform the computations. See
Chap. 14 for further information on software, and Chap. 13 for an intro-
duction to MATLAB and Uvi_Wave. At the end of the chapter we give some
exercises, which one should try after having read Sect. 13.1.

4.1 The Haar Transform


Our first examples are based on the Haar transform. The one scale direct
Haar transform is given by equations (3.28)-(3.31), and its inverse by equa-
tions (3.32)-(3.35). We start with a very simple signal, given as a continuous
signal by the sine function. More precisely, we take the function sin(4πt),
with 0 ≤ t ≤ 1. We now sample this signal at 512 equidistant points in
0 ≤ t ≤ 1. This gives us a discrete signal s_9. The index 9 comes from the
exponent 512 = 2^9, as we described in Chap. 3. This signal is plotted in
Fig. 4.1. We label the entries on the horizontal axis by sample index. Note
that due to the density of the sampling the graph looks like the graph of the
continuous function.
We want to perform a wavelet transform of this discrete signal. We choose
to do this over three scales. If we order the entries in the transformed signal
as in Table 2.1, then we get the result shown in Fig. 4.2. The ordering of the
entries is s_6, d_6, d_7, d_8. At each index point we have plotted a vertical line of
length equal to the value of the coefficient. It is not immediately obvious how

Fig. 4.1. The signal sin(4πt), 512 samples



Fig. 4.2. The wavelet coefficients from the DWT of the signal in Fig. 4.1, using
the Haar transform

Fig. 4.3. The wavelet coefficients from Fig. 4.2 divided into scales, from the DWT
of the signal in Fig. 4.1

one should interpret this graph. In Fig. 4.3 we have plotted the four parts
separately. The top plot is of d_8, followed by d_7, d_6, and s_6. Note that each
plot has its own axes, with different units. Again, these plots are not easy to
interpret. We try a third approach.
We take the transformed signal, s_6, d_6, d_7, d_8, and then replace all entries
except one with zeroes, i.e. sequences of the appropriate length consisting
entirely of zeroes. For example, we can take 0_6, d_6, 0_7, 0_8, where 0_8 is a
signal of length 256 = 2^8 with zero entries. We then invert this signal using
the inverse three scale discrete wavelet transform based on (3.32)-(3.35).
Schematically, it looks like

W_a^{(3)}: s_9 → 0_6, d_6, 0_7, 0_8 ,   W_s^{(3)}: 0_6, d_6, 0_7, 0_8 → s'_9 .

The result s'_9 of this inversion is a signal with the property that if it were
transformed with W_a^{(3)}, the result would be precisely the signal 0_6, d_6, 0_7, 0_8.
Hence s'_9 contains all information on the coefficients on the third scale. The
four possible plots are given in Fig. 4.4. The top plot is the inversion of
0_6, 0_6, 0_7, d_8 followed by 0_6, 0_6, d_7, 0_8, and 0_6, d_6, 0_7, 0_8, and finally at the
bottom s_6, 0_6, 0_7, 0_8. This representation, where the contributions are sepa-
rated as described, is called the multiresolution representation of the signal


Fig. 4.4. DWT of the signal in Fig. 4.1, Haar transform, multiresolution represen-
tation, separate plots

(in this case over three scales). The plots in Fig. 4.4 correspond to our in-
tuition associated with repeated means and differences. The bottom plot in
Fig. 4.4 could also have been obtained by computing the mean of 8 succes-
sive samples, and then replacing each of these 8 samples by their mean value.
Thus each mean value is repeated 8 times in the plot.
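In MATLAB the multiresolution representation can be computed as in the following sketch, where haar_step and haar_inv_step are assumed helpers implementing (3.28)-(3.31) and (3.32)-(3.35); they are our own hypothetical functions, not Uvi_Wave commands.

    t = (0:511)/512;
    s9 = sin(4*pi*t);                    % the signal in Fig. 4.1
    [s8, d8] = haar_step(s9);            % three analysis steps
    [s7, d7] = haar_step(s8);
    [s6, d6] = haar_step(s7);
    % Zero all parts except d6, and invert to get its contribution:
    r = haar_inv_step(haar_inv_step(haar_inv_step(0*s6, d6), 0*d7), 0*d8);
    plot(r)                              % the 0_6, d_6, 0_7, 0_8 part of Fig. 4.4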
If we invert the transform, we will get back the original signal. In Fig. 4.5
we have plotted the inverted signal, and the difference between this signal
and the original signal. We see that the differences are of magnitude 10^{-15},
corresponding to the precision of the MATLAB calculations.
We have now presented a way of visualizing the effect of a DWT over a
finite number of scales. We will now perform some experiments with synthetic
signals. As a first example we add an impulse to our sine signal. We change
the value at sample number 200 to the value 2. We have plotted the three
scale representation in Fig. 4.6. We see that the impulse can be localized
in the component d_8, and in the averaged signal s_6 the impulse has almost
disappeared. Here we have used the very simple Haar transform. By using
other transforms one can get better results.
Let us now show how we can reduce noise in a signal by processing it
in the DWT representation. We take again the sine signal plus an impulse,
and add some noise. The signal is given in Fig. 4.7. The multiresolution
representation is given in Fig. 4.8. The objective now is to remove the noise
from the signal. We will try to do this by processing the signal as follows. In
the transformed representation, s_6, d_6, d_7, d_8, we leave unchanged the largest

Fig. 4.5. Top: Inverse DWT of the signal in Fig. 4.1. Bottom: Difference between
inverted and original signal


Fig. 4.6. Multiresolution representation of sine plus impulse at 200, Haar transform

10% of the coefficients, and change the remaining 90% to zero. We then apply
the inverse transform to this altered signal. The result is shown in Fig. 4.9.
We see that it is possible to recognize both the impulse and the sine signal,
but the sine signal has undergone considerable changes. The next section

Fig. 4.7. Sine plus impulse at 200 plus noise

Fig. 4.8. Sine plus impulse at 200 plus noise, Haar transform, multiresolution
representation

Fig. 4.9. Sine plus impulse at 200 plus noise, reconstruction based on the largest
10% of the coefficients, Haar transform

shows that these results can be improved by choosing a more complicated
transform.
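Keeping the largest 10% of the coefficients amounts to sorting by magnitude before the reconstruction. In the sketch below w is the transformed signal and dwt_inv a matching inverse transform; both names are assumptions, not Uvi_Wave commands.

    [tmp, idx] = sort(abs(w));            % ascending order of magnitude
    k = round(0.10*length(w));            % number of coefficients to keep
    w10 = zeros(size(w));
    w10(idx(end-k+1:end)) = w(idx(end-k+1:end));  % zero the remaining 90%
    r = dwt_inv(w10);                     % reconstruction as in Fig. 4.9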

4.2 The CDF(2,2) Transform


We will now perform experiments with the DWT based on the building block
CDF(2,2), as it was defined in Sect. 3.3. We will continue with the noise
reduction example from the previous section. In Fig. 4.10 we have given the
multiresolution representation, using the new building block for the DWT.
In Fig. 4.11 we have shown reconstruction based on the 15% and the 10%
largest coefficients in the transformed signal. The result is much better than
the one obtained using the Haar transform. Let us note that there exists an
extensive theory on noise removal, including very sophisticated applications
of the DWT, but it is beyond the scope of this book.
As a second example we show how to separate fast and slow variations in
a signal. We take the function

    log(2 + sin(3π√t)) ,   0 ≤ t ≤ 1 ,                           (4.1)

and sample its values in 1024 points, at 1/1024, 2/1024, ..., 1024/1024. Then we change the values at 1/1024, 33/1024, 65/1024, etc. by adding 2 to the computed values. This signal has been plotted in Fig. 4.12. We will now try to separate the slow variation and the sharp peaks in the signal.
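In MATLAB the test signal can be generated directly from (4.1); a minimal sketch:

    % Sketch: the test signal of Fig. 4.12.
    t = (1:1024) / 1024;              % sample points 1/1024, ..., 1024/1024
    x = log(2 + sin(3*pi*sqrt(t)));   % the slow variation, equation (4.1)
    x(1:32:end) = x(1:32:end) + 2;    % add 2 at 1/1024, 33/1024, 65/1024, ...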


Fig. 4.10. Sine plus impulse at 200 plus noise, multiresolution representation,
CDF(2,2), three scales



Fig. 4.11. Sine plus impulse at 200 plus noise, CDF(2,2), three scales, top reconstruction based on 15% largest coefficients, bottom based on 10% largest coefficients


Fig. 4.12. Plot of the function log(2 + sin(3π√t)) plus 2 at points 1/1024, 33/1024, 65/1024, etc.

We take a multiresolution representation over six scales, as shown in Fig. 4.13. We


see from the bottom graph in the figure that we have succeeded in removing
the sharp peaks in that part of the representation. In Fig. 4.14 we have
plotted this part separately. Except close to the end points of the interval,
this is the slow variation. We subtract this part from the original signal and
obtain Fig. 4.15. In these two figures we have used the variable t on the
horizontal axis. Figure 4.15 shows that except for problems at the edges we
have succeeded in isolating the rapid variations, without broadening the sharp
peaks in the rapidly varying part. This example is not only of theoretical
interest, but can also be applied to for example ECG signals.

Exercises
All exercises below require access to MATLAB and Uvi_Wave (or some other
wavelet toolbox), and some knowledge of their use. You should read Sect. 13.1
before trying to solve these exercises.
4.1 Go through the examples in this chapter, using MATLAB and Uvi_Wave.
4.2 Carry out experiments on the computer with noise reduction. Vary the
number of coefficients retained, and plot the different reconstructions. Discuss
the results.
4.3 Find the multiresolution representation of a chirp (i.e. a signal obtained from sampling sin(t^2)).

Fig. 4.13. Multiresolution representation of the signal from Fig. 4.12, six scales,
CDF(2,2)

4.4 Find the multiresolution representation of a signal obtained by sampling


the function
           sin(4πt)       for 0 ≤ t < 1/4 ,
    f(t) = 1 + sin(4πt)   for 1/4 ≤ t < 3/4 ,
           sin(4πt)       for 3/4 ≤ t ≤ 1 .
Add noise, and try out noise removal, using both the Haar transform and
the CDF(2,2) transform.
4.5 (For readers with sufficient background in signal analysis.) Try to sep-
arate the low and high frequencies in the signal in Fig. 4.12 by a low pass
filtering (use for example a low order Butterworth filter). Compare the re-
sult to Fig. 4.14. Subtract the low pass filtered signal from the original and
compare the result to Fig. 4.15.


Fig. 4.14. Bottom graph in Fig. 4.13, fitted to 0 ≤ t ≤ 1


Fig. 4.15. Signal from Fig. 4.12, with slow variations removed
5. Interpretation

In this chapter we start with an interpretation of the discrete wavelet trans-


form based on the Haar building block. Then we will give interpretations of
wavelet transforms based on more general building blocks. The last part of
this chapter can be omitted on a first reading.
Our presentation in this chapter is incomplete. We state some results from
the general theory, and illustrate them with explicit computations. But we
do not discuss the general theory in detail, since this requires a mathematical
background that we do not assume our readers possess.
Understanding the results in this chapter will be much easier, if one carries
out extensive computer experiments, as we suggest in the exercises at the end
of the chapter. The necessary background and explanations can be found in
Sect. 13.1. It can be read now, since it does not depend on the following
chapters.

5.1 The First Example


We go back to the first example, discussed in Chap. 2. Again we begin with
a signal of length 8 = 2^3. This means that we can work at up to three scales.
For the moment we order the entries in the transforms as in Table 2.1. The
goal is to give an interpretation of the transformed signal, i.e. the last row in
Table 2.1. What does this transformed signal reveal about the given signal?
How can we interpret these numbers? One way to answer these questions
is to start with the bottom row, consisting of zeroes except at one entry,
and then inversely transform this signal. In other words, we find the signal
whose transform consists of zeroes except for a 1 at one entry. We keep the
ordering s0, d0, d1, d2. The results of the first three computations are given
in Tables 5.1-5.3. The remaining cases are left as exercises for the reader.
Note that we continue with the conventions from Chap. 2, in other words,
we omit the normalization steps given in equations (3.30) and (3.31). This
does not change the interpretation, and the tables below become easier to
understand. Later we will explain why normalization is needed.
The results of these eight computations can be represented using notation
from linear algebra. A signal of length 8 is a vector in the vector space R^8.
The process described above of reconstructing the signal, whose transform is


Table 5.1. Reconstruction based on [1,0,0,0,0,0,0,0]


1 1 1 1 1 1 1 1
1 1 1 1 0 0 0 0
1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0

Table 5.2. Reconstruction based on [0,1,0,0,0,0,0,0]

    1  1  1  1 -1 -1 -1 -1
    1  1 -1 -1  0  0  0  0
    1 -1  0  0  0  0  0  0
    0  1  0  0  0  0  0  0

Table 5.3. Reconstruction based on [0,0,1,0,0,0,0,0]

    1  1 -1 -1  0  0  0  0
    1 -1  0  0  0  0  0  0
    0  0  1  0  0  0  0  0
    0  0  1  0  0  0  0  0

one of the canonical basis vectors in R^8, is the same as finding the columns in the matrix (with respect to the canonical basis) of the three scale synthesis transform. This matrix, which we denote by W_s^(3), is shown in (5.1).
              [ 1  1  1  0  1  0  0  0 ]
              [ 1  1  1  0 -1  0  0  0 ]
              [ 1  1 -1  0  0  1  0  0 ]
    W_s^(3) = [ 1  1 -1  0  0 -1  0  0 ]                         (5.1)
              [ 1 -1  0  1  0  0  1  0 ]
              [ 1 -1  0  1  0  0 -1  0 ]
              [ 1 -1  0 -1  0  0  0  1 ]
              [ 1 -1  0 -1  0  0  0 -1 ]

The first row in Table 5.1 is the transpose of the first column in (5.1), and
so on. Applying this matrix to any length 8 signal performs the inverse Haar
transform. For example, multiplying it with the fourth row of Table 2.3 on
p. 9 (regarded as a column vector) produces the first row of that same table.
The matrix of the direct three scale transform is obtained by computing the transforms of the eight canonical basis vectors in R^8. In other words,
we start with the signal [1,0,0,0,0,0,0,0] and carry out the transform as
shown in Table 5.4, and analogously for the remaining seven basis vectors.
The result is the matrix of the direct, or analysis, transform.

Table 5.4. Direct transform of first basis vector


    1     0     0     0     0     0     0     0
    1/2   0     0     0     1/2   0     0     0
    1/4   0     1/4   0     1/2   0     0     0
    1/8   1/8   1/4   0     1/2   0     0     0

              [ 1/8   1/8   1/8   1/8   1/8   1/8   1/8   1/8  ]
              [ 1/8   1/8   1/8   1/8  -1/8  -1/8  -1/8  -1/8  ]
              [ 1/4   1/4  -1/4  -1/4   0     0     0     0    ]
    W_a^(3) = [ 0     0     0     0     1/4   1/4  -1/4  -1/4  ]      (5.2)
              [ 1/2  -1/2   0     0     0     0     0     0    ]
              [ 0     0     1/2  -1/2   0     0     0     0    ]
              [ 0     0     0     0     1/2  -1/2   0     0    ]
              [ 0     0     0     0     0     0     1/2  -1/2  ]

Multiplying the matrices we find W_a^(3) · W_s^(3) = I and W_s^(3) · W_a^(3) = I,


where I denotes the 8 x 8 unit matrix. This is the linear algebra formulation
of perfect reconstruction, or of the invertibility of the three scale transform.
It is clear that analogous constructions can be carried out for signals of
length 2^j and transforms to all scales k, k = 1, ..., j. The linear algebra point
of view is useful in understanding the theory, but if one is trying to carry out
numerical computations, then it is a bad idea to use the matrix formulation.
The direct k scale wavelet transform using the lifting steps requires a number
of operations (additions, multiplications, etc.) on the computer, which is pro-
portional to the length L of the signal. If we perform the transform using its
matrix, then in general a number of operations proportional to L^2 is needed.
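To make the lifting implementation concrete, here is a sketch of the one scale non-normalized Haar step in MATLAB (our own helper, not a toolbox function). Applying it to the eight canonical basis vectors reproduces the columns of the one scale analysis matrix, and iterating it on the s-part gives the three scale transform in a number of operations proportional to the current length:

    % Sketch: one scale of the non-normalized Haar transform by lifting.
    function y = haar_analysis(x)
    s = (x(1:2:end) + x(2:2:end)) / 2;   % pairwise means
    d = (x(1:2:end) - x(2:2:end)) / 2;   % pairwise differences, halved
    y = [s, d];                          % s-part first, then the d-part
    end

For example, haar_analysis applied to [1,0,0,0,0,0,0,0] returns [1/2, 0, 0, 0, 1/2, 0, 0, 0], the second row of Table 5.4.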
We can learn one more thing from this example. Let us look at the direct
transforms

    T_a : [1,0,0,0,0,0,0,0] → [1/8, 1/8, 1/4, 0, 1/2, 0, 0, 0] ,


and

    T_a : [0,0,0,0,1,0,0,0] → [1/8, -1/8, 0, 1/4, 0, 0, 1/2, 0] .

The second signal is the first signal translated four units in time, but the
transforms look rather different. Actually, this is one of the reasons why
wavelet transforms can be used to localize events in time, as illustrated in
some of the simple examples in Chap. 4. Readers familiar with Fourier anal-
ysis will see that the wavelet transform is quite different from the Fourier
transform with respect to translations in time.

5.2 Further Results on the Haar Transform

In the next two subsections we describe some further results on the Haar
transform. They are of a more advanced nature than the other results in
this chapter, but are needed for a deeper understanding of the DWT. The
remaining parts of this chapter can be omitted on a first reading.

5.2.1 Normalization of the Transform

Let us now explain why we normalize the Haar transform in the steps (3.30)-
(3.31). This explanation requires a bit more mathematics than the rest of the
chapter. We use the space ℓ²(Z) of signals of finite energy, introduced in
Chap. 3. The energy in a signal s is measured by the quantity

    ||s||^2 = Σ_{n∈Z} |s[n]|^2 .                                 (5.3)

The square root of the energy, denoted by ||s||, is called the norm of the vector s ∈ ℓ²(Z). We now compute the norms of the signals in the first and last row of Table 5.1

    ||[1,1,1,1,1,1,1,1]|| = √8 ,
    ||[1,0,0,0,0,0,0,0]|| = 1 .
Recall that we identify finite signals with infinite signals by adding zeroes.
If we carry out the same computation with a signal of length 2^N, consisting of a 1 followed by 2^N − 1 zeroes, then we find that the inverse N scale Haar transform (computed as in Table 5.1) of this signal is a vector of length 2^N, all of whose entries are ones. This vector has norm 2^{N/2}. This means that the
norm grows exponentially with N. Such growth can easily lead to numerical
instability. It is to avoid such instability that one chooses to normalize the
Haar building blocks. The consequence of the normalization is that we have

    ||W_a^{(k),haar,norm} x|| = ||x||   and   ||W_s^{(k),haar,norm} x|| = ||x||


at any scale k, compatible with the length of the signal. This result applies
to both finite length signals and infinite length signals of finite energy, i.e. for all x ∈ ℓ²(Z).
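This isometry is easy to check numerically. A minimal MATLAB sketch for one scale of the normalized Haar building block:

    % Sketch: the normalized one scale Haar step preserves the norm.
    x = randn(1, 512);                        % an arbitrary finite energy signal
    s = (x(1:2:end) + x(2:2:end)) / sqrt(2);  % normalized means
    d = (x(1:2:end) - x(2:2:end)) / sqrt(2);  % normalized differences
    norm([s, d]) - norm(x)                    % zero up to rounding errors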
It is not always possible to obtain this nice property, but we will at least
require that - after normalization - the norm of the signal, and the norm of
its direct and inverse transforms, have the same order of magnitude. This is
expressed by requiring the existence of constants A, B, Ã, B̃, such that

    A ||x|| ≤ ||T_a x|| ≤ B ||x|| ,                              (5.4)

    Ã ||x|| ≤ ||T_s x|| ≤ B̃ ||x|| ,                              (5.5)

for all x ∈ ℓ²(Z). All the building blocks given in Chap. 3 have this property. In particular, the Haar and Daubechies 4 transforms have A = B = Ã = B̃ = 1. Note that we here use the generality of our transform. The transforms T_a and T_s can be applied to any vector x ∈ ℓ²(Z).
Since the transforms W_a^(N) and W_s^(N) are obtained by iterating the build-
ing blocks, similar estimates hold for these transforms, with constants that
may depend on the scale N.

5.2.2 A Special Property of the Haar Transform

In this section we describe a special property of the Haar transform. We


start by looking at the inversion results summarized in Tables 5.1-5.3, and in
the matrix (5.1). We can obtain analogous results using signals of length 2^N and performing the inverse N scale Haar transform (not normalized) on the 2^N signals [1,0,...,0], [0,1,...,0], ..., [0,0,...,1]. For example, we find that
the inverse transform of

    [1, 0, ..., 0]   (2^N entries)   is   [1, 1, ..., 1]   (2^N entries) ,      (5.6)

and that the inverse transform of

    [0, 1, 0, ..., 0]   (2^N entries)   is   [1, ..., 1, -1, ..., -1] ,         (5.7)

with 2^{N-1} ones followed by 2^{N-1} minus ones. Finally, we compute as in Table 5.3 that the inverse transform of

    [0, 0, 1, 0, ..., 0]   (2^N entries)   is   [1, ..., 1, -1, ..., -1, 0, 0, ..., 0] ,   (5.8)

with 2^{N-2} ones, 2^{N-2} minus ones, and 2^{N-1} zeroes.

There is a pattern in these computations which can be understood as follows.


We can imagine that all these signals come from continuous signals, which
have been sampled. We choose the time interval to be the interval [0,1]. Then
the sampling points are 1·2^{-N}, 2·2^{-N}, 3·2^{-N}, ..., 2^N·2^{-N}. For the first
signal (5.6) we see that it is obtained by sampling the function

    h_0(t) = 1 ,   t ∈ [0,1] ,
which is independent of N. The second signal (5.7) is obtained from the
function

    h_1(t) = {  1 ,  t ∈ [0, 1/2] ,
               -1 ,  t ∈ ]1/2, 1] ,
again with a function independent of N. The third signal (5.8) is obtained
from sampling the function

    h_2(t) = {  1 ,  t ∈ [0, 1/4] ,
               -1 ,  t ∈ ]1/4, 1/2] ,
                0 ,  t ∈ ]1/2, 1] .
The pattern is described as follows. We define the function

    h(t) = {  1 ,  t ∈ [0, 1/2] ,                                (5.9)
             -1 ,  t ∈ ]1/2, 1] .
Consider now the following way of writing the positive integers. For n =
1, 2, 3, ... we write n = k + 2^j with j ≥ 0 and 0 ≤ k < 2^j. Each integer has
a unique decomposition. Sample computations are shown in Table 5.5.

Table 5.5. Index computation for Haar basis functions


    j            0  1  1  2  2  2  2  3  3  3  3  3  ...
    k            0  0  1  0  1  2  3  0  1  2  3  4  ...
    n = k + 2^j  1  2  3  4  5  6  7  8  9  10 11 12 ...

The general function (5.9) is then described by

    h_n(t) = h(2^j t - k) ,   t ∈ [0,1] ,   n = 1, 2, 3, ... ,


with n, j, k related as just described. For a given N, the 2^N vectors described above are obtained by sampling h_0, h_1, ..., h_{2^N-1}. The important thing to
notice is that all functions are obtained from just two different functions: the function h_0, which will be called the scaling function, and the function h, which will be called the wavelet. The functions h_n, n = 1, 2, 3, ..., are ob-
tained from the single function h by scaling, determined by j, and translation,
determined by k.
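The index bookkeeping n = k + 2^j is easily expressed in code. A MATLAB sketch (our own helper) that samples h_n at the 2^N points used above:

    % Sketch: sample the Haar function h_n at t = 1/2^N, ..., 2^N/2^N.
    function hn = haar_function(n, N)
    j = floor(log2(n));                  % scale: largest j with 2^j <= n
    k = n - 2^j;                         % translation, 0 <= k < 2^j
    t = (1:2^N) / 2^N;                   % sampling points in ]0, 1]
    u = 2^j * t - k;                     % argument of the wavelet h
    hn = (u > 0 & u <= 1/2) - (u > 1/2 & u <= 1);   % h from (5.9)
    end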
The functions defined above are called the Haar (basis) functions. In
Fig. 5.1 we have plotted the first eight functions. These eight functions are
the ones that after sampling lead to the columns in the matrix (5.1).
We should emphasize that the above computations are performed with
the non-normalized transform. If we introduce normalization, then we have
to use the functions

    h_n^{norm}(t) = 2^{j/2} h(2^j t - k) ,   t ∈ [0,1] ,   n = 1, 2, 3, ... .


Let us also look at the role of the scaling function h_0 above. In the example we transformed a signal of length 8 three scales down. We could have chosen only to go two scales down. In this case the signal s3 is transformed into s1, d1, and d2, of length 2, 2, and 4, respectively. We can perform the inversion
of the eight possible unit vectors, as above. For the 1 entries in d1 and d2 the


Fig. 5.1. The first eight Haar functions

Table 5.6. Two scale reconstruction for first two unit vectors

    1 1 1 1 0 0 0 0        0 0 0 0 1 1 1 1
    1 1 0 0 0 0 0 0        0 0 1 1 0 0 0 0
    1 0 0 0 0 0 0 0        0 1 0 0 0 0 0 0

results are unchanged, as one can see from the first three lines in Table 5.3. The results for the two cases with ones in s1 are given in Table 5.6. These two vectors can be obtained by sampling the two functions h_0(2t) and h_0(2t − 1).
In general, one finds for a given signal of length 2^N, transformed down a number of scales k, 1 ≤ k ≤ N, results analogous to those above. Thus one gets a basis of R^{2^N} consisting of vectors obtained by sampling scaled and translated functions h_0(2^n t − m) and h(2^n t − m). Here n and m run through values determined by the scale considered. As mentioned above, h_0 is called the scaling function and h the wavelet.

Let us now try to give an interpretation of the direct transform in terms


of these functions. Let us again take N = 3, i.e. signals of length 8. The eight functions h_0, ..., h_7, sampled at 1/8, 2/8, ..., 8/8, give the vectors whose three scale direct Haar transforms are the eight basis vectors [1,0,...,0], ..., [0,0,...,1]. We will here use the notation

    e_0 = [1,0,0,0,0,0,0,0] ,
    e_1 = [0,1,0,0,0,0,0,0] ,
    ...
    e_7 = [0,0,0,0,0,0,0,1] .
The eight sampled functions will be denoted by h_0, ..., h_7. They are given by the columns in the matrix (5.1), as before. The transform relationships
can be expressed by the equations

    W_a^(3)(h_n) = e_n ,   n = 0, 1, ..., 7 ,

and

    W_s^(3)(e_n) = h_n ,   n = 0, 1, ..., 7 .

Note that we here have to take the transpose of the vectors e_k defined above, apply the matrix, and then transpose the result to get the row vectors h_k.
Now let us take a general signal x of length 8, and let y = W_a^(3)(x) denote the direct transform. Since both the direct and inverse transforms are linear (preserve linear combinations), and since

    y = Σ_{n=0}^{7} y[n] e_n ,

we have the relation

    x = W_s^(3)(y) = Σ_{n=0}^{7} y[n] h_n .                      (5.10)

Thus the direct transform W_a^(3) applied to x yields a set of coefficients y,


with the property that the original signal x is represented as a superposition
of the elementary signals h_0, ..., h_7, as shown in (5.10). The weight of each
elementary signal is given by the corresponding transform coefficient y[n].
In the next section we will see that approximately the same pattern can
be found in more general transforms, although it is not so easy to obtain as
in the case of the Haar transform.

5.3 Interpretation of General Discrete Wavelet Transforms

In this section we give some further examples of the procedure used in the
previous sections, and then state the general result. This section is rather
incomplete, since a complete treatment of these results requires a considerable
mathematical background.

5.3.1 Some Examples

We will start by repeating the computations in Sect. 5.1 using - instead of


the inverse Haar transform - the transform given by (3.23)-(3.27), which we
call the inverse Daubechies 4 transform. We take as an example the vector
[0,0,0,0,0,1,0,0] of length 8 and perform a three scale inverse transform.
The entries are plotted against the points 1/8, 2/8, ..., 8/8 on the t-interval [0,1] in Fig. 5.2. This figure contains very little information. But let us now repeat


Fig. 5.2. Inverse Daubechies 4 of [0,0,0,0,0,1,0,0] over three scales, rescaled

the procedure for vectors of length 8, 32, 128, and 512, applied to a vector with a single 1 as its sixth entry. We fit each transform to the interval [0,1]. This requires that we rescale the values of the transform by 2^{k/2}, k = 3, 5, 7, 9. The
result is shown in Fig. 5.3. We recall that the inverse Daubechies 4 transform
includes the normalization step. This figure shows that the graphs rapidly
approach a limiting graph, as we increase the length of the vector. This is a


Fig. 5.3. Inverse Daubechies 4 of sixth basis vector, length 8, 32, 128 and 512,
rescaled

result that can be established rigorously, but it is not easy to do so, and it is
way beyond the scope of this book.
One can interpret the limiting function in Fig. 5.3 as a function whose
values, sampled at appropriate points, represent the entries in the inverse transform of a vector of length 2^N, with a single 1 as its sixth entry. For N
just moderately large, say N = 12, this is a very good approximation to the actual value. See Fig. 5.4 for the result for N = 12, i.e. a vector of length
4096.
For all other basis vectors, except [1,0,0, ... ,0], one gets similar results
in the sense that the graph has the same form, but will be scaled and/or
translated. The underlying function is also here called the wavelet. For the
vector [1,0,0, ... ,0] one gets a different graph. The underlying function is
again called the scaling function.
The theory shows that if one chooses to transform a signal of length 2^N to a scale k, then the inverse transform of the unit vectors with ones at places 1 to 2^{N-k} will be approximations to translated copies of the scaling function.
Let us repeat these computations for the inverse transform CDF(2,2) from Sect. 3.3 (the formulas for the inverse are given in Sect. 3.4), and at the same time illustrate how the wavelet is translated depending on the placement of the 1 in the otherwise zero vector. An example is given in Fig. 5.5. The difference
in the graphs in Fig. 5.5 and Fig. 5.4 is striking. It reflects the result that
the Daubechies 4 wavelet has very little regularity (it is not differentiable),
whereas the other wavelet is a piecewise linear function.
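The piecewise linear shape can be reproduced with a few lines of MATLAB. The sketch below implements one inverse CDF(2,2) lifting scale directly from the lifting steps of Sect. 3.3, with zero padding at the ends and the normalization constant √2 taken as an assumption on our part; the function name icdf22 is our own:

    % Sketch: one inverse CDF(2,2) lifting scale with zero padding.
    function x = icdf22(s, d)
    s = s / sqrt(2);                     % undo normalization (K = sqrt(2) assumed)
    d = d * sqrt(2);
    dpad = [0, d];                       % zero padding supplies d[n-1] at the start
    s = s - (dpad(1:end-1) + d) / 4;     % undo the update step: even samples x[2n]
    spad = [s(2:end), 0];                % zero padding supplies x[2n+2] at the end
    x = zeros(1, 2*numel(s));
    x(1:2:end) = s;
    x(2:2:end) = d + (s + spad) / 2;     % undo the prediction step: odd samples
    end

Iterating this step on a unit vector reproduces a translate of the wavelet in Fig. 5.5:

    y = zeros(1, 64);  y(40) = 1;        % unit vector of length 64, one at entry 40
    for L = [8, 16, 32]                  % invert three scales: 8 -> 16 -> 32 -> 64
        y(1:2*L) = icdf22(y(1:L), y(L+1:2*L));
    end
    plot((1:64)/64, y)                   % compare with Fig. 5.5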


Fig. 5.4. Inverse Daubechies 4 of sixth basis vector, length 4096, rescaled. The
result is the Daubechies 4 wavelet


Fig. 5.5. Inverse CDF(2,2) of three basis vectors of length 64, entry 40, or 50,
or 60, equal to 1 and the remaining entries equal to zero. The result is the same
function (the wavelet) with different translations

Finally, if we try to find the graphs of the scaling function and wavelet un-
derlying the direct CDF(2,2) transform, then we get the graphs in Fig. 5.6
and Fig. 5.7. These functions are quite complicated. These figures have been
generated by taking N = 16 and a transform of k = 12 scales. To generate
the scaling function we have taken the transform of a vector with a one at
place 8. For the wavelet we have taken the one at place 24. It is interesting to
see that while the analysis wavelet and scaling function are very simple func-
tions (we have not shown the scaling function of CDF(2,2), see Exer. 5.6), the
inverse of that same transform (synthesis) has some rather complex wavelet
and scaling functions.


Fig. 5.6. Scaling function for CDF(2,2)

5.3.2 The General Case


The above computations may lead us to the conclusion that there are just
two functions underlying the direct transform, and another two functions
underlying the inverse transform, in the sense that if we take sufficiently long vectors, say of length 2^N, and perform a k scale transform, with k large, then we get values that are sampled values of one of the underlying functions. More precisely, inverse transforms of unit vectors with a one in places from 1 to 2^{N-k} yield translated copies of the scaling function. Inverse transforms of unit vectors with a one in places from 2^{N-k} + 1 to 2^{N-k+1} yield translated copies of the wavelet. Finally, inverse transforms of unit vectors with a one at places from 2^{N-k+1} + 1 to 2^N yield scaled and translated copies of the wavelet.


Fig. 5.7. Wavelet for CDF(2,2)

As stated above, these results are strictly correct only in a limiting sense,
and they are not easy to establish. There is one further complication which
we have omitted to state clearly. If one performs the procedure above with a
1 close to the start or end of the vector, then there will in general be some
strange effects, depending on how the transform has been implemented. We
refer to these as boundary effects. They depend on how one makes up for
missing samples in computations near the start or end of a finite vector, the
so-called boundary corrections, which will be considered in detail in Chap. 10.
We have already mentioned zero padding as one of the correction methods.
This is what we have used indirectly in plotting for example Fig. 5.7, where
we have taken a vector with a one at place 24, and zeroes everywhere else.
If we try to interpret these results, then we can say that the direct trans-
form resolves the signal into components of the shape given by the scaling
function and the wavelet. More precisely, it is a superposition of these com-
ponents, with weight according to the value of the entry in the transform,
since the basic shapes were based on vectors with entry equal to 1. This is a
generalization of the concrete computation at the end of Sect. 5.2.2.
Readers interested in a rigorous treatment of the interpretation of the
transforms given here, and with the required mathematical background, are
referred to the literature, for example the books by I. Daubechies [5], S. Mallat [16], and M. Vetterli and J. Kovacevic [28]. Note that these books base their
treatment of the wavelet transforms on the concepts of multiresolution anal-
ysis and filter theory. We have not yet discussed these concepts.

Exercises
Note that the exercises 5.4-5.7 require use of MATLAB and Uvi_Wave. Post-
pone solving them until after you have read Sect. 13.1. This section is inde-
pendent of the following chapters.
5.1 Carry out the missing five inverse transforms needed to find the matrix
(5.1).
5.2 Carry out the computations leading to (5.2).
5.3 Carry out computations similar to those leading to (5.1) and (5.2), in order to find the matrices W_s^(1) and W_a^(1) for the one scale Haar DWT applied
to a signal of length 8.
5.4 Carry out the computations leading to Fig. 5.3, as explained in the text.
5.5 Carry out the computations leading to Fig. 5.5. Then try the same for
some of the 61 basis vectors not plotted in this figure.
5.6 Plot the scaling function of CDF(2,2). In MATLAB you can do this using
the functions wspline and wavelet from the Uvi_Wave toolbox.
5.7 Carry out the computations leading to Fig. 5.6, as explained in the text,
and then those leading to Fig. 5.7.
6. Two Dimensional Transforms

In this chapter we will briefly show how the discrete wavelet transform can be
applied to two dimensional signals, such as images. The 2D wavelet transform
comes in two forms. One which consists of two 1D transforms, and one which
is a true 2D transform. The first type is called separable, and the second
nonseparable. We present some results and examples in the separable case,
since it is a straightforward generalization of the results in the one dimen-
sional case. At the end of the chapter we give an example of a nonseparable
2D DWT based on an adaptation of the lifting technique to the 2D case.
In this chapter we will focus solely on grey scale images. Such an image
can be represented by a matrix, where each entry gives the grey scale value of
the corresponding pixel. The purpose of this chapter is therefore to show how
to apply the DWT to a matrix as opposed to a vector, as we did in previous
chapters.

6.1 One Scale DWT in Two Dimensions


We use the notation X = {x[m, n]} to represent a matrix. As an example we
have an 8 x 8 matrix

        [ x[1,1]  x[1,2]  ...  x[1,8] ]
    X = [ x[2,1]  x[2,2]  ...  x[2,8] ]
        [   ...     ...           ... ]
        [ x[8,1]  x[8,2]  ...  x[8,8] ]
One way of applying the one dimensional technique to this matrix is by
interpreting it as a one dimensional digital signal, simply by concatenating
the rows as shown here

    x[1,1], x[1,2], ..., x[1,8], x[2,1], x[2,2], ..., x[8,8] .
This yields a signal of length 64. The one dimensional discrete wavelet trans-
form can then be applied to this signal. However, this is usually not a good
approach, since there can be correlation between entries in neighboring rows.
For example, there can be large areas of the image with the same grey scale


value. These neighboring samples are typically not neighbors in the 1D signal
obtained by concatenating the rows, and hence the transform may not detect
the correlation.
Fortunately, there is a different way of applying the 1D transform to a
matrix. We recall from Chap. 5 that we can represent the wavelet transform
itself as a matrix, see in particular Exer. 5.3. This fact is also discussed in more detail in Sect. 10.2. In the present context we take a matrix W_a, which performs a one scale wavelet transformation, when applied to a column vector.
To simplify the notation we have omitted the superscript (1). We apply this
transform to the first column in the matrix X,

    [ y^c[1,1] ]   [ w[1,1]  w[1,2]  ...  w[1,8] ] [ x[1,1] ]
    [ y^c[2,1] ] = [ w[2,1]  w[2,2]  ...  w[2,8] ] [ x[2,1] ]
    [    ...   ]   [   ...     ...           ... ] [   ...  ]
    [ y^c[8,1] ]   [ w[8,1]  w[8,2]  ...  w[8,8] ] [ x[8,1] ]
The superscript 'c' is an abbreviation for 'column.' The same operation is
performed on the remaining columns. But this is just ordinary matrix multi-
plication. We write the result as
    Y^c = W_a X .                                                (6.1)

We then perform the same operation on the rows of Y^c. This can be done by first transposing Y^c, then multiplying it by W_a, and finally transposing again. The transpose of a matrix A is denoted by A^T. Thus the result is

    Y^{c,r} = ( W_a (Y^c)^T )^T = Y^c W_a^T ,

by the usual rules for the transpose of a matrix. We can summarize these
computations in the equation
    Y^{c,r} = W_a X W_a^T .                                      (6.2)
The superscripts show that we have first transformed columns, and then rows.
But (6.2) shows that the same result is obtained by first transforming rows,
and then columns, since matrix multiplication is associative, (W_a X) W_a^T = W_a (X W_a^T).
We can find the inverse transform by using the rules of matrix computa-
tions. The result is
    X = W_a^{-1} Y^{c,r} (W_a^{-1})^T = W_s Y^{c,r} W_s^T .      (6.3)

Here W_s = W_a^{-1} denotes the synthesis matrix, see Chap. 5, in particular


Exer. 5.3. This gives a fairly simple method for finding the discrete wavelet
transform of a two dimensional signal. As usual, we do not use matrix multi-
plication when implementing this transform. It is much more efficient to do
two one dimensional transforms, implemented as lifting steps.
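A sketch of the separable transform in MATLAB, reusing a non-normalized one scale Haar step as a stand-in for the 1D building block (any one scale 1D transform could be substituted, and the matrix size is assumed even):

    % Sketch: one scale separable 2D transform, columns first, then rows.
    % haar1d transforms each column of its input matrix.
    haar1d = @(M) [ (M(1:2:end,:) + M(2:2:end,:)) / 2 ;   % means
                    (M(1:2:end,:) - M(2:2:end,:)) / 2 ];  % differences
    X = double(rand(64) > 0.5);     % stand-in 64 x 64 image
    Y = haar1d(haar1d(X)')';        % same result as Wa * X * Wa'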

Thus in this section and the next we use the above definition of the separable
DWT, and apply it to 2D signals. The properties derived in the previous
chapters still hold, since we just use two 1D transforms. But there are also
new properties related to the fact that we now have a 2D transform. In the
following section we will discuss some of these properties through a number
of examples.

6.2 Interpretation and Examples

We will again start with the Haar transform, in order to find out, how we can
interpret the transformed image. We use the same ordering of the entries in
the transform as in Table 3.1, dividing the transform coefficients into separate
low and high pass parts. After a one scale Haar transform on both rows and
columns, we end up with a matrix that naturally is interpreted as consisting
of four submatrices, as shown in Fig. 6.1.

Fig. 6.1. Interpretation of the two dimensional DWT

The notation is consistent with the one used for one dimensional signals. The lower index j labels the size of the matrix. Thus S_j is a 2^j × 2^j matrix. The submatrix SS_{j-1} of size 2^{j-1} × 2^{j-1} consists of entries that contain means over both columns and rows. In the part SD_{j-1} we have computed means for the columns and differences for the rows. The two operations are reversed in the part DS_{j-1}. In the part DD_{j-1} we have computed differences for both rows and columns.
We use this one scale Haar-based transform as the building block for a two dimensional multiresolution DWT. We perform a one scale 2D transform, and then iterate on the averaged part SS_{j-1}, in order to get the next step. We will illustrate the process on some simple synthetic images.
We start with the image given in Fig. 6.2.
We now perform a one scale 2D Haar transform on this image, and obtain
the results shown in Fig. 6.3. The left hand plot shows the coefficients, and
the right hand plot the inverse transform of each of the four blocks with
the other three blocks equal to zero. The right hand plot is called the two
dimensional multiresolution representation, since it is the 2D analogue of

Fig. 6.2. A synthetic image

the multiresolution representations introduced in Chap. 4. We now change


our one scale transform to the CDF(2,2) transform. Then we repeat the
computations above. The result is shown in Fig. 6.4. Note that this transform
averages over more pixels than the Haar transform, which is clearly seen in
the SS component.
Let us explain in some detail how these figures have been obtained. The
synthetic image in Fig. 6.2 is based on a 64 x 64 matrix. The black pixels
correspond to entries with value 1. All other entries have value zero. The figure
is shown with a border. The border is not part of the image, but is included in
order to be able to distinguish the image from the white background. Borders
are omitted in the following figures.
We have computed the Haar transform as described, and plotted the
transform in Fig. 6.3 on the left hand side. Then we have carried out the
multiresolution computation and plotted the result on the right hand side.
Multiresolutions are computed in two dimensions in analogy with the one di-
mensional case. We select a component of the decomposition, and replace the
other three components by zeroes. Then we compute the inverse transform.
In this way we obtain the four parts of the picture on the right hand side of
Fig. 6.3.
The grey scale map has been adjusted, such that white pixels correspond to the value -1 and black pixels to the value 1. The large medium grey areas correspond to the value 0. The same procedure is repeated in Fig. 6.4, using
the CDF(2,2) transform.
The one step examples clearly show the averaging, and the emphasis of
vertical, horizontal, and diagonal lines, respectively, in the four components.

Fig. 6.3. One step 2D Haar transform, grey scale adjusted. Left hand plot shows
coefficients, right hand plot the inverse transform of each block, the multiresolution
representation

Fig. 6.4. One step 2D CDF(2,2) transform, grey scale adjusted. Left hand plot
shows coefficients, right hand plot the inverse transform of each block, the mul-
tiresolution representation

We now do a two step CDF(2,2) transform. The result is shown in Fig. 6.5.
This time we have only shown a plot of the coefficients. In order to be able
to see details, the grey scale has been adjusted in each of the blocks of the
transform, such that the largest value in a block corresponds to black and
the smallest value in a block to white.
Let us next look at some more complicated synthetic images. All images
have 128 x 128 pixels. On the left hand side in Fig. 6.6 we have taken an image
with distinct vertical and horizontal lines. Using the Haar building block, and
transforming over three scales, we get the result on the right hand side in
Fig. 6.6. We see that the vertical and horizontal lines are clearly separated
into the respective components of the transform. The diagonal components

Fig. 6.5. CDF(2,2) transform, 2 scales, grey scale adjusted. Plot of the coefficients

contain little information on the line structure, as expected. In this figure the
grey scale has again been adjusted in each of the ten blocks. The plot shows
the coefficients.
We now take the image on the left hand side in Fig. 6.7, with a compli-
cated structure. The Haar transform over three scales is also shown. Here the
directional effects are much less pronounced, as we would expect from the
original image. Again, the plot shows the coefficients.

Fig. 6.6. Left: Synthetic image with vertical and horizontal lines. Right: Haar
transform of image over 3 scales, grey scale adjusted. Plot of the coefficients

Fig. 6.7. Left: Synthetic image with complex structure. Right: Haar transform of
image over 3 scales, grey scale adjusted. Plot of the coefficients

Finally, we try with the real image 'Lena' shown in the left hand image in
Fig. 6.8. This image is often used in the context of image processing. In this
case we have chosen a resolution of 256 x 256 pixels. The decomposition over
two scales is shown on the right hand side, again with the grey scale adjusted
within each block. The plot is of the coefficients.
In Fig. 6.9 we have set the averaged part of the transform equal to zero,
keeping the six detail blocks, and then applied the inverse 2D transform over
the two scales, in order to locate contours in the image. We have adjusted
the grey scale to be able to see the details.
In this section we have experimented with separable 2D DWTs. As the
examples have shown, there is a serious problem with separable transforms,
since they single out horizontal, vertical, and diagonal structures in the given
image. In a complicated image a small rotation of the original could change
the features emphasized drastically.

6.3 A 2D Transform Based on Lifting


We describe briefly an approach leading to a nonseparable two dimensional
discrete wavelet transform. It is based on the lifting ideas from Sect. 3.1.
The starting point is the method we used to introduce the Haar transform,
and also the CDF(2,2) transform, in Sect. 3.2, namely consideration of the
nearest neighbors.
To avoid problems with the boundary, we look at an infinite image, where
the pixels have been labeled by pairs of integers, such that the image is
described by an infinite matrix X = {x[m,n]}_{(m,n)∈Z×Z}. The key concept is the nearest neighbor. Each entry x[m,n] has four nearest neighbors, namely

Fig. 6.8. Left: Standard image Lena in 256 x 256 resolution. Right: CDF(2,2)
transform over 2 scales of 'Lena.' Image is grey scale adjusted

Fig. 6.9. Reconstruction based on the detail parts in Fig. 6.8. Image is grey scale
adjusted

the entries x[m+1,n], x[m-1,n], x[m,n+1], and x[m,n-1]. This naturally
leads to a division of all points in the plane with integer coordinates into two
classes. This division is defined as follows. We select a point as our starting
point, for example the origin (0,0), and color it black. This point has four
nearest neighbors, which we assign the color white. Next we select one of
the white points and color its four nearest neighbors black. One of them, our
starting point, has already been assigned the color black. Continuing in this
manner, we divide the whole lattice Z x Z into two classes of points, called
black and white points. Each point belongs to exactly one class. We have
illustrated the assignment in Fig. 6.10.
Fig. 6.10. Division of the integer lattice into black and white points

Comparing with the one dimensional case, then the black points correspond
to entries in a one dimensional signal with odd indices, and the white points
to those with even indices. We recall from Sect. 3.2 that the first step in the
one dimensional case was to predict the value at an odd indexed entry, and
then replace this entry with the difference, see (3.9). In the two dimensional
case we do exactly the same. We start with the black points. Each black
point value is replaced with the difference between the original value and
the predicted value. This is done for all black points in the first step. In
the second step we go through the white points and update the values here,
based on the just computed values at the black points, see (3.10) in the
one dimensional case. This is the one scale building block. To define a two
dimensional discrete wavelet transform, we keep the computed values at the
black points, and then use the lattice of white points as a new lattice, on
which we perform the two operations in the building block. Notice that the
white points in Fig. 6.10 constitute a square lattice, which is rotated 45°
relative to the original integer lattice, and with a distance of √2 between
nearest neighbors. This procedure is an 'in place' algorithm, since we work
on just one matrix, successively computing new values and inserting them in
the original matrix.
The transform is inverted exactly as in Sect. 3.2, namely by reversing the
order of the operations and changing the signs.
Let us now make a specific choice of prediction and update procedures.
For a given black point, located at (m, n), we predict that the value at this
point should be the average of the nearest four neighbors. Thus we replace
x[m,n] by

    x•[m,n] = x[m,n] - (1/4) ( x[m-1,n] + x[m+1,n] + x[m,n+1] + x[m,n-1] ) .
We decide to use an update procedure, which preserves the average value.
Thus the average of the computed values at the white points should equal

half of the average over all the initial values. The factor one half comes from
the fact that there are half as many white values as there are values in the
original image. A simple computation shows that one can obtain this property
by the following choice of update procedure

    x○[m,n] = x[m,n] + (1/8) ( x•[m-1,n] + x•[m+1,n] + x•[m,n+1] + x•[m,n-1] ) .

The discrete wavelet transform, defined this way, has the property that it
does not exhibit the pronounced directional effects of the one defined in the
first section. In the literature the lattice in Fig. 6.10 is called the quincunx
lattice. It is possible to use other lattices, for example a hexagonal lattice,
and other procedures for prediction and update.
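A sketch of one quincunx lifting scale on the interior of a finite matrix follows; boundaries are ignored here for simplicity, and black points are taken (as an assumed convention) to be those with m + n even, matching a black starting point:

    % Sketch: one scale of the quincunx lifting transform, interior points only.
    function X = quincunx_step(X)
    [M, N] = size(X);
    black = mod((1:M)' + (1:N), 2) == 0;   % parity mask (implicit expansion)
    for m = 2:M-1
      for n = 2:N-1
        if black(m,n)    % prediction: subtract the average of the 4 neighbors
          X(m,n) = X(m,n) - (X(m-1,n) + X(m+1,n) + X(m,n+1) + X(m,n-1))/4;
        end
      end
    end
    for m = 2:M-1
      for n = 2:N-1
        if ~black(m,n)   % update, based on the just computed black values
          X(m,n) = X(m,n) + (X(m-1,n) + X(m+1,n) + X(m,n+1) + X(m,n-1))/8;
        end
      end
    end
    end

Note that the computation really is 'in place': the predicted black values are written into X before the white points are updated from them.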

Exercises

These exercises assume that the reader is familiar with the Uvi_Wave toolbox, as for example explained in Chap. 13.
6.1 Go through the details of the examples in this section, using the
Uvi_Wave toolbox in MATLAB.
6.2 Select some other images and perform computer experiments similar to
those above.
6.3 (Difficult.) After you have read Chap. 11, try to implement the nonsep-
arable two dimensional wavelet transform described above. Experiment with
it and compare with some of the applications of the separable one based on
the Haar building block, in particular with respect to directional effects.
7. Lifting and Filters I

Our discussion of the discrete wavelet transform has to this point been based
on the time domain alone. We have represented and treated signals as sequences of sample points x = {x[n]}_{n∈Z}. The index n ∈ Z represents equidistant sampling times, given a choice of time scale. But we will get a better
understanding of the transform, if we also look at it in the frequency domain.
The frequency representation is obtained from the time domain representa-
tion using the Fourier transform. We will refer to standard texts [22, 23] for
the necessary background, but we recall some of the definitions and results
below.
In the previous chapters we have introduced the discrete wavelet trans-
form using the lifting technique. In the literature, for example [5], the dis-
crete wavelet transform is defined using filters. In this chapter we establish
the connection between the definition using filters, and the lifting definition.
We establish the connection both in the time domain, and in the frequency
domain. Further results on the connection between filters and lifting are given
in Chap. 12.
We note that in this chapter we only discuss the properties of a one scale
discrete wavelet transform, as described in Chap. 3. We look at the properties
of multiresolution transforms in Chap. 9.
Some of the results in this chapter are rather technical, but they are
needed in the sequel for a complete discussion of the DWT. Consequently, in
the following chapters we will often refer back to results obtained here. Note
that it is possible to read the following chapters without having absorbed all
the details in this chapter. One can return to this chapter, when the results
are needed.

7.1 Fourier Series and the z-Transform

We first recall some standard results from signal analysis. A finite energy signal x = {x[n]}_{n∈Z} ∈ ℓ²(Z) has an associated Fourier series,

    X(e^{jω}) = Σ_{n∈Z} x[n] e^{-jnω} .                          (7.1)


The function X(e^{jω}), given by the sum of this series, is periodic with period 2π, as a function of the frequency variable ω. We denote the imaginary unit by j, i.e. j^2 = -1, as is customary in the engineering literature. We also adopt the custom that the upper case letter X denotes the representation of the sequence x in the frequency domain.
For each x ∈ ℓ²(Z) we have the following result, which is called Parseval's equation,

    Σ_{n∈Z} |x[n]|^2 = (1/2π) ∫_0^{2π} |X(e^{jω})|^2 dω .        (7.2)

This means that the energy of the signal can be computed in the frequency
representation via this formula.
Conversely, given a 2π-periodic function X(e^{jω}), which satisfies

    (1/2π) ∫_0^{2π} |X(e^{jω})|^2 dω < ∞ ,                       (7.3)

then we can find a sequence x ∈ ℓ²(Z), such that (7.1) holds, by using

    x[n] = (1/2π) ∫_0^{2π} X(e^{jω}) e^{jnω} dω .                (7.4)

This is a consequence of the orthogonality relation for exponentials, expressed as

    (1/2π) ∫_0^{2π} e^{-jmω} e^{jnω} dω = { 0 if m ≠ n ,
                                            1 if m = n .         (7.5)

We note that there are technical details concerning the type of convergence
for the series (7.1) (convergence in mean), which we have decided to omit
here.
The z-transform is obtained from the Fourier representation (7.1) by substituting z for e^{jω}. Thus the z-transform of the sequence x is defined as

    X(z) = Σ_{n∈Z} x[n] z^{-n} .                                 (7.6)

Initially this equation is valid only for z = e^{jω}, ω ∈ R, or expressed in terms of the complex variable z, for z taking values on the unit circle. But
in many cases we can extend the complex function X(z) to a larger domain
in the complex plane, and use techniques and results from complex analysis.
In particular, for a finite signal (a signal with only finitely many non-zero
entries) the transform X(z) is a polynomial in z and z^{-1}. Thus it is defined
in the whole complex plane, except perhaps at the origin, where it may have
a pole.

The z-transform defined by (7.6) is linear. This means that the z-transform of w = αx + βy is W(z) = αX(z) + βY(z), where α and β are complex numbers.
Another property of the z-transform is that it transforms convolution of
sequences into multiplication of the corresponding z-transforms. Let x and y be two sequences from ℓ²(Z). The convolution of x and y is the sequence w = x * y, defined by

    w[n] = (x * y)[n] = Σ_{k∈Z} x[n-k] y[k] .                    (7.7)

In the z-transform representation this relation becomes


W(z) = X(z)Y(z) . (7.8)
Let us verify this result. We compute as follows, using the definitions. Sums
extend over all integers.

    W(z) = Σ_n w[n] z^{-n} = Σ_n ( Σ_k x[n-k] y[k] ) z^{-n}

         = Σ_n Σ_k x[n-k] z^{-(n-k)} y[k] z^{-k}

         = Σ_k ( Σ_n x[n-k] z^{-(n-k)} ) y[k] z^{-k}

         = Σ_k X(z) y[k] z^{-k} = X(z) Y(z) .

The relation (7.8) shows that we have

    x * y = y * x .                                              (7.9)

This result can also be shown directly from the summation definition of convolution, using a change of variables.
In order for (7.7) to define a sequence w in ℓ²(Z), and for (7.8) to define a function W(z) satisfying (7.3), we need an additional condition on one of the sequences x and y, or on one of the functions X(z) and Y(z). One possibility is to require that X(e^{jω}) is a bounded function for ω ∈ R. A stronger condition is to require that Σ_n |x[n]| < ∞. In many applications
only finitely many entries in x are nonzero, and then both conditions are
obviously satisfied.
Shifting a given signal one time unit left or right can also be implemented
in the z-transform representation. Suppose x = {x[n]} is a given signal. Let x_left = {x[n+1]} denote the signal shifted one time unit to the left. Then it follows from the definition of the z-transform that we have

    X_left(z) = Σ_n x[n+1] z^{-n} = Σ_n x[n+1] z^{-(n+1)+1} = z X(z) .   (7.10)

Analogously, let x_right = {x[n-1]} denote the signal shifted one time unit to the right. Then

    X_right(z) = Σ_n x[n-1] z^{-n} = Σ_n x[n-1] z^{-(n-1)-1} = z^{-1} X(z) .   (7.11)

Two other operations needed below are down sampling and up sampling by two. Given a sequence x, then this sequence down sampled by two, denoted by x_{2↓}, is defined in the time domain by

    x_{2↓}[n] = x[2n] ,   n ∈ Z .                                (7.12)

Described in words, this means that we delete the odd indexed entries in the given sequence, and then change the indexing. In the z-transform representation we find the following result (note that in the second sum the terms with k odd cancel),

    X_{2↓}(z) = Σ_n x[2n] z^{-n}
              = Σ_k (1/2) ( x[k] (z^{1/2})^{-k} + x[k] (-z^{1/2})^{-k} )
              = (1/2) ( X(z^{1/2}) + X(-z^{1/2}) ) .             (7.13)

Given a sequence y, the up sampling operation yields the sequence y_{2↑}, obtained in the time domain by

    y_{2↑}[n] = { 0        if n is odd ,
                  y[n/2]   if n is even .                        (7.14)

This means that we interlace zeroes between the given samples, and then change the indexing. In the z-transform representation we find, after a change of summation variable from n to k = n/2,

    Y_{2↑}(z) = Σ_n y_{2↑}[n] z^{-n} = Σ_k y[k] z^{-2k} = Y(z^2) .   (7.15)

As a final property we mention the uniqueness result for the z-transform


representation. This can be stated as follows. If X(z) = Y(z) for all z on the unit circle, then x[n] = y[n] for all n ∈ Z. This is a consequence of (7.4).
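These identities are easy to check numerically. A small MATLAB sketch, evaluating the z-transform of a finite signal (indices 0, ..., 7 assumed) at a point on the unit circle:

    % Sketch: numerical check of the down sampling identity (7.13).
    x  = randn(1, 8);                          % finite signal, indices 0..7
    Xz = @(z) sum(x .* z.^(-(0:7)));           % z-transform of x
    xd = x(1:2:end);                           % down sampled by two: x[2n]
    Xd = @(z) sum(xd .* z.^(-(0:3)));
    z  = exp(1j * 0.7);                        % test point on the unit circle
    Xd(z) - ( Xz(sqrt(z)) + Xz(-sqrt(z)) ) / 2 % vanishes up to rounding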

7.2 Lifting in the z-Transform Representation

We are now ready to show how to implement the one scale DWT, defined via
the lifting technique, in the z-transform representation. The first step was

to split a given signal into its even and odd components. In the z-transform representation this splitting is obtained by writing

    X(z) = X_0(z^2) + z^{-1} X_1(z^2) ,                          (7.16)

where

    X_0(z) = Σ_n x[2n] z^{-n} ,                                  (7.17)

    X_1(z) = Σ_n x[2n+1] z^{-n} .                                (7.18)
Using the results from the previous section, we see that X_0 is obtained from the original signal by down sampling by two. The component X_1 is obtained from the given signal by first shifting it one time unit to the left, and then down sampling by two. Using the formulas (7.13) and (7.10) we thus have

    X_0(z) = (1/2) ( X(z^{1/2}) + X(-z^{1/2}) ) ,                (7.19)

    X_1(z) = (z^{1/2}/2) ( X(z^{1/2}) - X(-z^{1/2}) ) .          (7.20)
We represent this decomposition by the diagram in Fig. 7.1. The diagram
should be compared with the formulas (7.19) and (7.20), which show the
result of the decomposition, expressed in terms of X(z).


Fig. 7.1. Splitting in even and odd components

The inverse operation is obtained by reading equation (7.16) from right to


left. The equation tells us that we can obtain X(z) from X_0(z) and X_1(z) by first up sampling the two components by 2, then shifting X_1(z^2) one time unit right (by multiplication by z^{-1}), and finally adding the two components.
We represent this reconstruction by the diagram in Fig. 7.2.
Let us now see how we can implement the prediction step from Sect. 3.2 in
the z-transform representation. The prediction technique was to form a linear
combination of the even entries and then subtract the result from the odd
entry under consideration. The linear combination was formed independently
of the index of the odd sample under consideration, and based only on the

Fig. 7.2. Reconstructing the signal from even and odd components

relative location of the even entries. For example, in the CDF(2,2) transform the first step in (3.13) can be implemented as X_1(z) - T(z)X_0(z), where T(z) = (1/2)(1 + z). (Recall that T(z)X_0(z) means the convolution t * x_0 in the time domain, which is exactly a linear combination of the even entries with weights t[n].) Let us verify this result. First we multiply, using the definition (7.17), and then we change the summation variable in the second sum, to get

    T(z)X_0(z) = (1/2)(1 + z) Σ_n x[2n] z^{-n}

               = (1/2) Σ_n x[2n] z^{-n} + (1/2) Σ_n x[2n] z^{-n+1}

               = Σ_n (1/2) ( x[2n] + x[2n+2] ) z^{-n} .
n

Thus we have

    X_1(z) - T(z)X_0(z) = Σ_n ( x[2n+1] - (1/2)(x[2n] + x[2n+2]) ) z^{-n} ,

which is exactly the z-transform representation of the right hand side of (3.13). The transition from (X_0(z), X_1(z)) to (X_0(z), X_1(z) - T(z)X_0(z)) can be described by matrix multiplication. We have

    [ X_0(z)              ]   [  1     0 ] [ X_0(z) ]
    [ X_1(z) - T(z)X_0(z) ] = [ -T(z)  1 ] [ X_1(z) ] .
An entirely analogous computation (see Exer. 7.2) shows that if we define S(z) = (1/4)(1 + z^{-1}), then the update step in (3.14) is implemented in the z-transform representation as multiplication by the matrix

    [ 1  S(z) ]
    [ 0  1    ] .

The final normalization step in the various transforms in Chap. 3, as for example given in (3.31) and (3.30), can all be implemented by multiplication by a matrix of the form

    [ K  0      ]
    [ 0  K^{-1} ] ,

where K > 0 is a constant. Note that this particular form depends on an overall normalization of the transform, as explained in connection with Theorem 7.3.1.
It is a rather surprising fact that the same simple structure of the lifting steps used above applies to the general case. In the general case a prediction step is always given by multiplication by a matrix of the form

    P(z) = [  1     0 ]
           [ -T(z)  1 ] ,

and an update step by multiplication by a matrix of the form

    U(z) = [ 1  S(z) ]
           [ 0  1    ] .

Here T(z) and S(z) are both polynomials in z and z^{-1}. Such polynomials are called Laurent polynomials.
The general one scale DWT described in Chap. 3, with the normalization step included, is then in the z-transform representation given as a matrix product (see also Fig. 3.5)

    H(z) = [ K  0      ] [ 1  S_N(z) ] [  1       0 ]     [ 1  S_1(z) ] [  1       0 ]
           [ 0  K^{-1} ] [ 0  1      ] [ -T_N(z)  1 ] ... [ 0  1      ] [ -T_1(z)  1 ] .   (7.21)

The order of the factors is determined by the order in which we apply the
various steps. First a prediction step, then an update step, perhaps repeated
N times, and then finally the normalization step. Note that matrix multipli-
cation is non-commutative, i.e. U(z)P(z) ≠ P(z)U(z) in general.
An important property of the DWT implemented via lifting steps was the
invertibility of the transform, as illustrated for example in Fig. 3.4. It is easy
to verify that we have

    P(z)^{-1} = [ 1     0 ]           U(z)^{-1} = [ 1  -S(z) ]
                [ T(z)  1 ]    and                [ 0   1    ] .      (7.22)

Since

    ( A(z) B(z) )^{-1} = B(z)^{-1} A(z)^{-1}

by the usual rules of matrix multiplication, we have that the matrix H(z) in (7.21) is invertible, and its inverse, denoted by G(z), is given by

    G(z) = [ 1       0 ] [ 1  -S_1(z) ]     [ 1       0 ] [ 1  -S_N(z) ] [ K^{-1}  0 ]
           [ T_1(z)  1 ] [ 0   1      ] ... [ T_N(z)  1 ] [ 0   1      ] [ 0       K ] .   (7.23)
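For a concrete check, one can evaluate these matrix products numerically for the CDF(2,2) lifting steps (T(z) = (1/2)(1 + z), S(z) = (1/4)(1 + z^{-1}), and K = √2, as discussed above) and verify that H(z)G(z) is the identity on the unit circle. A MATLAB sketch:

    % Sketch: verify H(z)G(z) = I for the CDF(2,2) lifting factorization.
    z = exp(1j * 1.3);                       % arbitrary point on the unit circle
    T = (1 + z) / 2;  S = (1 + 1/z) / 4;  K = sqrt(2);
    H = [K 0; 0 1/K] * [1 S; 0 1] * [1 0; -T 1];
    G = [1 0; T 1] * [1 -S; 0 1] * [1/K 0; 0 K];
    H * G                                    % the 2 x 2 identity, up to rounding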


Fig. 7.3. One scale DWT in the z-representation, based on (7.21)

Multiplying all the matrices in the product defining H(z) in (7.21), we get a matrix with entries which are Laurent polynomials. We use the notation

    H(z) = [ H_00(z)  H_01(z) ]
           [ H_10(z)  H_11(z) ] ,                                (7.24)

for such a general matrix. We can then represent the implementation of the
complete one scale DWT in the z-transform representation by the diagram
in Fig. 7.3. Written in matrix notation the DWT is given as

    [ Y_0(z) ]   [ H_00(z)  H_01(z) ] [ X_0(z) ]
    [ Y_1(z) ] = [ H_10(z)  H_11(z) ] [ X_1(z) ] .               (7.25)

This representation of a two channel filter bank, without any reference to


lifting, is in the signal analysis literature called the polyphase representation,
see [28].
The analogous diagram and matrix representation for the inversion are
easily found, and are omitted here.
In the factored form it was easy to see that the matrix H(z) was invertible.
In the form (7.24) invertibility may not be so obvious. But, just as for ordinary
matrices, one can here use the determinant to decide whether a given matrix
H(z) is invertible. The following proposition demonstrates how this is done.
Proposition 7.2.1. A matrix (7.24), whose entries are Laurent polynomi-
als, is invertible, if and only if its determinant is a monomial.
Proof. The determinant is defined by

    d(z) = det H(z) = H_00(z) H_11(z) - H_01(z) H_10(z) ,        (7.26)

and it is a Laurent polynomial. A straightforward multiplication of matrices


shows that
    [ H_00(z)  H_01(z) ] [  H_11(z)  -H_01(z) ]          [ 1  0 ]
    [ H_10(z)  H_11(z) ] [ -H_10(z)   H_00(z) ]  = d(z)  [ 0  1 ] ,   (7.27)

just like in ordinary linear algebra. This equation shows that the matrix H(z)
is invertible, if and only if d(z) is invertible. It also gives an explicit formula
for the inverse matrix.

A Laurent polynomial is invertible, if and only if it is a monomial, i.e. it
is of the form $cz^k$ for some nonzero complex number $c$ and some integer $k$.
Let us verify this result. Let $p(z) = cz^k$. Then the inverse is $q(z) = c^{-1}z^{-k}$.
Conversely, let
$$p(z) = a_{n_1}z^{n_1} + a_{n_1+1}z^{n_1+1} + \cdots + a_{n_2}z^{n_2},$$
$$q(z) = b_{m_1}z^{m_1} + b_{m_1+1}z^{m_1+1} + \cdots + b_{m_2}z^{m_2},$$
be two Laurent polynomials. Here $n_1 \leq n_2$ and $m_1 \leq m_2$ are integers, and
the $a_i$ and $b_i$ are complex numbers. We assume that $a_{n_1} \neq 0$, $a_{n_2} \neq 0$,
$b_{m_1} \neq 0$, and $b_{m_2} \neq 0$. Suppose now that $p(z)q(z) = 1$ for all nonzero
complex numbers $z$. If $n_1 = n_2$ and $m_1 = m_2$, both $p$ and $q$ are monomials,
and the result has been shown. So assume for example $n_1 < n_2$. We first
multiply the two polynomials,
$$p(z)q(z) = a_{n_1}b_{m_1}z^{n_1+m_1} + \cdots + a_{n_2}b_{m_2}z^{n_2+m_2}.$$
We have assumed that $p(z)q(z) = 1$. But in the product we have at least two
different powers of $z$ with nonzero coefficients, namely $n_1 + m_1$ and $n_2 + m_2$,
which is a contradiction. This shows the result.
The formula (7.27) shows that the usual formula for the inverse of a 2 x 2
matrix also holds in this case.
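As a concrete illustration, here is a small MATLAB sketch (our own helper code, not a toolbox function) that applies this criterion to the Haar polyphase matrix derived in Sect. 7.8. There all entries are constants; for genuine Laurent polynomial entries one would in addition keep track of the lowest occurring power of z and pad the two products in (7.26) to equal length before subtracting.

% Entries of the Haar polyphase matrix (see Sect. 7.8), given as
% coefficient vectors of polynomials in z; here they are constants.
H00 = 1/sqrt(2);   H01 = 1/sqrt(2);
H10 = -1/sqrt(2);  H11 = 1/sqrt(2);
% The determinant (7.26); conv multiplies polynomials.
d = conv(H00, H11) - conv(H01, H10);    % here d = 1
isMonomial = nnz(abs(d) > 1e-12) == 1   % true: exactly one nonzero coefficient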

7.3 Two Channel Filter Banks with Perfect Reconstruction
Let us now present an approach to the one scale DWT, which is common
in the signal analysis literature, see [28]. It is based on the concept of a two
channel filter bank. First we need to introduce the concept of a filter. A filter
is a linear map, which maps a signal with finite energy into another signal
with finite energy. In the time domain it is given by convolution by a vector h.
To preserve the finite energy property we need to assume that the z-transform
of this vector, H(z), is bounded on the unit circle. Convolution by h is called
filtering by h. In the time domain it is given by h * x, and in the frequency
domain by H(z)X(z). The vector h is called the impulse response (IR) of
the filter, (or sometimes the filter taps), and H(e jW ) the transfer function (or
sometimes the frequency response). If h is a finite sequence, then h is called
a FIR filter. Here FIR stands for finite impulse response. An infinite h is then
called an IIR filter. We only consider FIR filters in this book. We also only
consider filters with real coefficients. Further details on filtering can be found
in any book on signal analysis, for example in [22, 23].
A two channel filter bank starts with two analysis filters, denoted by $h_0$
and $h_1$ here¹, and two synthesis filters, denoted by $g_0$ and $g_1$. All four filters
¹ Note that from Chap. 9 and onwards this notation is changed

are assumed to be FIR filters. Usually the filters with index 0 are chosen to
be low pass filters, and the filters with index 1 to be high pass filters. In the
usual terminology a low pass filter is a filter whose transfer function is close to 1
for $|\omega| \leq \pi/2$, and close to 0 for $\pi/2 \leq |\omega| \leq \pi$. Similarly a high pass filter is
close to 1 for $\pi/2 \leq |\omega| \leq \pi$, and close to 0 for $|\omega| \leq \pi/2$. In our case the
value 1 has to be replaced by $\sqrt{2}$, due to the manner in which we have chosen
to normalize the transform. We keep this normalization to facilitate comparison
with the literature.
The analysis and synthesis parts of the filter bank are shown in Fig. 7.4.
For the moment we consider the filtering scheme in the z-transform representation.
Later we will also look at it in the time domain. The analysis
part transforms the input $X(z)$ to the output pair $Y_0(z)$, $Y_1(z)$. The synthesis
part then transforms this pair to the output $\widetilde{X}(z)$. The filtering scheme
is said to have the perfect reconstruction property, if $\widetilde{X}(z) = X(z)$ for all
possible (finite energy) $X(z)$.

Fig. 7.4. Two channel analysis and synthesis

We first analyze which conditions are needed on the four filters, in order to
obtain the perfect reconstruction property. We perform this analysis in the
z-transform representation. Filtering by $h_0$ transforms $X(z)$ to $H_0(z)X(z)$,
and we then use (7.13) to down sample by two. Thus we have
$$Y_0(z) = \frac{1}{2}\left(H_0(z^{1/2})X(z^{1/2}) + H_0(-z^{1/2})X(-z^{1/2})\right), \qquad (7.28)$$
$$Y_1(z) = \frac{1}{2}\left(H_1(z^{1/2})X(z^{1/2}) + H_1(-z^{1/2})X(-z^{1/2})\right). \qquad (7.29)$$

Up sampling by two (see (7.15)), followed by filtering by the G-filters, and
addition of the results, leads to a reconstructed signal
$$\widetilde{X}(z) = G_0(z)Y_0(z^2) + G_1(z)Y_1(z^2). \qquad (7.30)$$

Perfect reconstruction means that $\widetilde{X}(z) = X(z)$. We combine the above
expressions and then regroup terms to get
$$\widetilde{X}(z) = \frac{1}{2}\left(G_0(z)H_0(z) + G_1(z)H_1(z)\right)X(z)
+ \frac{1}{2}\left(G_0(z)H_0(-z) + G_1(z)H_1(-z)\right)X(-z).$$
The condition $\widetilde{X}(z) = X(z)$ will then follow from the conditions

$$G_0(z)H_0(z) + G_1(z)H_1(z) = 2, \qquad (7.31)$$
$$G_0(z)H_0(-z) + G_1(z)H_1(-z) = 0. \qquad (7.32)$$

The converse obviously also holds. These conditions mean that the four filters
cannot be chosen independently, if we want to have perfect reconstruction.
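Before analyzing these conditions further, a quick numerical check can be reassuring. The following MATLAB sketch (our own code) verifies (7.31) and (7.32) at a number of points on the unit circle, for the Haar filters derived in Sect. 7.8:

% Haar filters in the z-transform representation, see Sect. 7.8
w  = linspace(0, 2*pi, 17);  z = exp(1j*w);   % points on the unit circle
H0 = (1 + z)/sqrt(2);        H1 = (-1 + z)/sqrt(2);
G0 = (1 + 1./z)/sqrt(2);     G1 = (1./z - 1)/sqrt(2);
H0m = (1 - z)/sqrt(2);       H1m = (-1 - z)/sqrt(2);  % H0(-z) and H1(-z)
max(abs(G0.*H0 + G1.*H1 - 2))    % (7.31): of the order 1e-16
max(abs(G0.*H0m + G1.*H1m))      % (7.32): of the order 1e-16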
Let us analyze the consequences of (7.31) and (7.32). We write them in
matrix form

$$\begin{bmatrix} H_0(z) & H_1(z) \\ H_0(-z) & H_1(-z) \end{bmatrix}
\begin{bmatrix} G_0(z) \\ G_1(z) \end{bmatrix}
= \begin{bmatrix} 2 \\ 0 \end{bmatrix}. \qquad (7.33)$$

In order to solve this equation with respect to Go(z) and G1(z) we need
the matrix to be invertible. Let us denote its determinant by d(z). Since
we assume that all filters are FIR filters, d(z) is a Laurent polynomial. To
be invertible, it has to be a monomial, as shown in Proposition 7.2.1. This
determinant satisfies d( -z) = -d(z), as the following computations show.

$$d(-z) = H_0(-z)H_1(z) - H_0(z)H_1(-z)
= -\left(H_0(z)H_1(-z) - H_0(-z)H_1(z)\right) = -d(z).$$

This means that the monomial $d(z)$ has to be an odd integer power of $z$, so
we can write it as
$$d(z) = cz^{2k+1} \qquad (7.34)$$

for some integer $k$ and some nonzero constant $c$. Using Cramer's rule to solve
(7.33) we get
$$G_0(z) = \frac{1}{d(z)}\begin{vmatrix} 2 & H_1(z) \\ 0 & H_1(-z) \end{vmatrix}
= 2c^{-1}z^{-2k-1}H_1(-z), \qquad (7.35)$$
$$G_1(z) = \frac{1}{d(z)}\begin{vmatrix} H_0(z) & 2 \\ H_0(-z) & 0 \end{vmatrix}
= -2c^{-1}z^{-2k-1}H_0(-z). \qquad (7.36)$$

These equations show that we can choose either the H-filter pair or the G-filter
pair. We will assume that we have filters $H_0$ and $H_1$, subject to the
condition that

$$H_0(z)H_1(-z) - H_0(-z)H_1(z) = cz^{2k+1} \qquad (7.37)$$
for some integer $k$ and nonzero constant $c$. Then $G_0$ and $G_1$ are determined
by the equations (7.35) and (7.36), which means that they are unique up to
a scaling factor and an odd shift in time.
In the usual definition of the DWT the starting point is a two channel
filter bank with the perfect reconstruction property. The analysis part is then
used to define the direct one scale DWT (the building block from Sect. 3.2),
and the synthesis part is used for reconstruction.
It is an important result that the filtering approach, and the one based on
lifting, actually are identical. This means that they are just two different ways
of describing the same transformation from X(z) to Yo(z), YI(z). We will now
start explaining this equivalence. Part of the explanation, in particular the
proof of the equivalence, will be postponed to Chap. 12.
The first step is to show that the analysis step in Fig. 7.4 is equivalent
to the analysis step summarized in Fig. 7.3 and in (7.25). Thus we want to
find the equations relating the coefficients in the matrix (7.24) and the filters.
The analysis step by both methods should yield the same result. To avoid
the square root terms we compare the results after up sampling by two. We
start with the equality
$$\begin{bmatrix} Y_0(z^2) \\ Y_1(z^2) \end{bmatrix}
= H(z^2)\begin{bmatrix} \frac{1}{2}\left(X(z) + X(-z)\right) \\ \frac{z}{2}\left(X(z) - X(-z)\right) \end{bmatrix},$$
where $Y_0$ and $Y_1$ are obtained from the filter bank approach, see (7.28) and
(7.29), and the right hand side from the lifting approach (in the polyphase
form), with $H$ from (7.24). We have also inserted the (up sampled) expressions
(7.19) and (7.20) on the right hand side. The first equation can then be
written as
$$\frac{1}{2}\left(H_0(z)X(z) + H_0(-z)X(-z)\right)
= H_{00}(z^2)\frac{1}{2}\left(X(z) + X(-z)\right)
+ H_{01}(z^2)\frac{z}{2}\left(X(z) - X(-z)\right).$$
This leads to the relation
$$H_0(z) = H_{00}(z^2) + zH_{01}(z^2).$$
Note the similarity to (7.16). The relation for $H_1$ is found analogously, and
then the relations for $G_0$ and $G_1$ can be found using the perfect reconstruction
conditions (7.35) and (7.36) in the two cases. The relations are summarized
here:
$$H_0(z) = H_{00}(z^2) + zH_{01}(z^2), \qquad (7.38)$$
$$H_1(z) = H_{10}(z^2) + zH_{11}(z^2), \qquad (7.39)$$
$$G_0(z) = G_{00}(z^2) + z^{-1}G_{01}(z^2), \qquad (7.40)$$
$$G_1(z) = G_{10}(z^2) + z^{-1}G_{11}(z^2). \qquad (7.41)$$

Note the difference in the decomposition of the H-filters and the G-filters.
Thus in the polyphase representation we use
$$H(z) = \begin{bmatrix} H_{00}(z) & H_{01}(z) \\ H_{10}(z) & H_{11}(z) \end{bmatrix}, \qquad (7.42)$$
$$G(z) = H(z)^{-1} = \begin{bmatrix} G_{00}(z) & G_{10}(z) \\ G_{01}(z) & G_{11}(z) \end{bmatrix}. \qquad (7.43)$$
Note the placement of entries in G(z), which differs from the usual notation
for matrices. The requirement of perfect reconstruction in the polyphase for-
mulation was the requirement that G(z) should be the inverse of H(z). It
is possible to verify that invertibility of H(z) is equivalent with the perfect
reconstruction property for the filter bank, see Exer. 7.4.
If we start with a filter bank, then we can easily define H(z) using (7.38)
and (7.39). But to get from H(z) to the lifting implementation, we need to
factor H(z) into lifting steps (and a final normalization step), as in (7.21).
The remarkable result is that this is always possible. This result was obtained
by I. Daubechies and W. Sweldens in the paper [7]. In the general
case $\det H(z) = cz^{2k+1}$, whereas in (7.21) the determinant obviously is equal
to 1. One can always get the determinant equal to one by scaling and an odd
shift in time, so we state the result in this case.
Theorem 7.3.1 (Daubechies and Sweldens). Assume that $H(z)$ is a
$2 \times 2$ matrix of Laurent polynomials, normalized to $\det H(z) = 1$. Then
there exists a constant $K \neq 0$ and Laurent polynomials $S_1(z), \ldots, S_N(z)$,
$T_1(z), \ldots, T_N(z)$, such that
$$H(z) = \begin{bmatrix} K & 0 \\ 0 & K^{-1} \end{bmatrix}
\begin{bmatrix} 1 & S_N(z) \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -T_N(z) & 1 \end{bmatrix}
\cdots
\begin{bmatrix} 1 & S_1(z) \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -T_1(z) & 1 \end{bmatrix}.$$
The proof of this theorem is constructive. It gives an algorithm for finding the
Laurent polynomials $S_1(z), \ldots, S_N(z)$, $T_1(z), \ldots, T_N(z)$ in the factorization.
It is important to note that the factorization is not unique. Once we have a
factorization, we can translate it into lifting steps. We will give some examples
of this, together with a detailed proof of the theorem, in Chap. 12.
The advantage of the lifting approach, compared to the filter approach,
is that it is very easy to find perfect reconstruction filters $H_0$, $H_1$, $G_0$, and
$G_1$. It is just a matter of multiplying the lifting steps as in (7.21), and then
assembling the filters according to the equations (7.38)-(7.41). In Sect. 7.8 we
give some examples.
This approach should be contrasted with the traditional signal analysis
approach, where one tries to find (approximate numerical) solutions to the
equations (7.31) and (7.32), using for example spectral factorization. The
weakness in constructing a transform based solely on the lifting technique is
that it is based entirely on considerations in the time domain. Sometimes it
is desirable to design filters with certain properties in the frequency domain,

and once filters have been constructed in the frequency domain, we can use
the constructive proof of the theorem to derive a lifting implementation, as
explained in detail in Chap. 12. Another weakness of the lifting approach
should be mentioned. The numerical stability of transforms defined using
lifting can be difficult to analyze. We will give some further remarks on this
problem in Sect. 14.2.

7.4 Orthonormal and Biorthogonal Bases

The two channel filter banks with perfect reconstruction discussed in the previous
section can also be implemented in the time domain using convolution
by filters. (Actually, this is the way the DWT is implemented in the Uvi_Wave
toolbox (see Chap. 11), and in most other wavelet software packages.) This
leads to an interpretation in terms of biorthogonal or orthonormal bases in
$\ell^2(\mathbb{Z})$. In this section we define these bases and give a few results on them.
In $\ell^2(\mathbb{Z})$ one defines an inner product for any $x, y \in \ell^2(\mathbb{Z})$ by
$$\langle x, y \rangle = \sum_{n\in\mathbb{Z}} x_n\overline{y_n}. \qquad (7.44)$$
Here $\overline{y_n}$ denotes the complex conjugate of $y_n$. The inner product is connected to the
norm via the equation $\|x\| = \langle x, x\rangle^{1/2}$. Two vectors $x, y \in \ell^2(\mathbb{Z})$ are said to
be orthogonal, if $\langle x, y\rangle = 0$.
At this point we need to introduce the Kronecker delta. It is the sequence
defined by
$$\delta[n] = \begin{cases} 1 & \text{if } n = 0, \\ 0 & \text{if } n \neq 0. \end{cases} \qquad (7.45)$$
Sometimes $\delta[n]$ is also viewed as a function of $n$.


Let $e_n$, $n \in \mathbb{Z}$, be a sequence of vectors from $\ell^2(\mathbb{Z})$. It is said to be an
orthonormal basis for $\ell^2(\mathbb{Z})$, if the following two properties hold.

orthonormality For all $m, n \in \mathbb{Z}$ we have $\langle e_m, e_n\rangle = \delta[m-n]$.

completeness If a vector $x \in \ell^2(\mathbb{Z})$ satisfies $\langle x, e_n\rangle = 0$ for all
$n \in \mathbb{Z}$, then $x = 0$, where $0$ denotes the zero vector.
An orthonormal basis has many nice properties. For example, it gives a representation
for every vector $x \in \ell^2(\mathbb{Z})$, as follows
$$x = \sum_{n\in\mathbb{Z}} \langle x, e_n\rangle e_n. \qquad (7.46)$$

For the inner product we have
$$\langle x, y\rangle = \sum_{n\in\mathbb{Z}} \langle x, e_n\rangle \overline{\langle y, e_n\rangle}, \qquad (7.47)$$
which gives the following expression for the norm squared (the energy in the
signal)
$$\|x\|^2 = \sum_{n\in\mathbb{Z}} |\langle x, e_n\rangle|^2 .$$
An example of an orthonormal basis is the so-called canonical basis, which is
defined by
$$e_n[k] = \delta[k-n]. \qquad (7.48)$$
This means that the vector $e_n$ has a one as the $n$'th entry and zeroes everywhere
else. It is an exercise to verify that this definition actually gives an
orthonormal basis, see Exer. 7.5.
Sometimes the requirements in the definition of the orthonormal basis are
too restrictive for a particular purpose. In this case biorthogonal bases can
often be used instead. They are defined as follows. Two sequences $f_n$, $n \in \mathbb{Z}$,
and $\tilde f_m$, $m \in \mathbb{Z}$, are said to constitute a biorthogonal pair of bases for $\ell^2(\mathbb{Z})$,
if the following properties are satisfied.

biorthogonality For all $m, n \in \mathbb{Z}$ we have $\langle \tilde f_m, f_n\rangle = \delta[m-n]$.

stability There exist positive constants $A$, $B$, $\tilde A$, and $\tilde B$, such
that for all vectors $x \in \ell^2(\mathbb{Z})$ we have
$$A\|x\|^2 \leq \sum_n |\langle f_n, x\rangle|^2 \leq B\|x\|^2, \qquad (7.49)$$
$$\tilde A\|x\|^2 \leq \sum_n |\langle \tilde f_n, x\rangle|^2 \leq \tilde B\|x\|^2. \qquad (7.50)$$

The expansion in (7.46) is now replaced by the following two expansions
$$x = \sum_{n\in\mathbb{Z}} \langle x, \tilde f_n\rangle f_n = \sum_{n\in\mathbb{Z}} \langle x, f_n\rangle \tilde f_n,$$
and the expansion of the inner product in (7.47) by
$$\langle x, y\rangle = \sum_{n\in\mathbb{Z}} \langle x, f_n\rangle\overline{\langle y, \tilde f_n\rangle}.$$
Comparing the definitions, we see that an orthonormal basis is a special case
of a biorthogonal basis pair, namely one satisfying $\tilde f_n = f_n$ for all $n \in \mathbb{Z}$. The
completeness property comes from the stability property, since $\langle x, f_n\rangle = 0$
for all $n$ and (7.49) imply that $x = 0$.

7.5 Two Channel Filter Banks in the Time Domain


Let us first look at the realization of the two channel filter bank in the time
domain. Note that this is the implementation most often used, since real
signals are given as sequences of samples.
As Fig. 7.4 shows, in the time domain we filter (using convolution with $h_0$
and $h_1$), and then down sample. Obviously the two steps should be combined,
to avoid computing terms not needed. The result is

$$y_0[k] = \sum_n h_0[2k-n]x[n] = \sum_n h_0[n]x[2k-n], \qquad (7.51)$$
$$y_1[k] = \sum_n h_1[2k-n]x[n] = \sum_n h_1[n]x[2k-n]. \qquad (7.52)$$
We have shown both forms of the convolution, see (7.9).


To get formulas for the inverse transform we look again at Fig. 7.4. In the
time domain we first up sample $y_0$ and $y_1$, and then convolve with $g_0$ and
$g_1$, respectively. Finally the two results are added. Again it is obvious that
all three operations should be combined. The result is

$$x[k] = \sum_n \left(g_0[k-2n]y_0[n] + g_1[k-2n]y_1[n]\right). \qquad (7.53)$$

This formulation avoids explicit introduction of zero samples in the up sampled
signals. They are eliminated by changing to the variable $n/2$ in the
summation defining the convolution.
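For the Haar filters these formulas can be written out directly, which makes the down sampling phase explicit. The following MATLAB sketch (our own code; the indexing conventions for longer filters require more care near the boundaries) verifies the perfect reconstruction property on a short signal:

x  = [56 40 8 24 48 48 40 16];
s  = (x(1:2:end) + x(2:2:end)) / sqrt(2);   % low pass branch, y0 in (7.51)
d  = (x(2:2:end) - x(1:2:end)) / sqrt(2);   % high pass branch, y1 in (7.52)
xr = zeros(size(x));                        % synthesis as in (7.53)
xr(1:2:end) = (s - d) / sqrt(2);
xr(2:2:end) = (s + d) / sqrt(2);
max(abs(xr - x))                            % 0: perfect reconstruction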
We now look at the perfect reconstruction property of a two channel filter
bank in the time domain. Thus we assume that we have four filters $h_0$, $h_1$, $g_0$,
and $g_1$, such that the associated filter bank has the perfect reconstruction
property. In particular, we know that equations (7.31), (7.32), (7.35), and
(7.36) all hold for the z-transforms.
We have to translate these conditions into conditions in the time do-
main on the filter coefficients. These conditions are obtained through a rather
lengthy series of straightforward computations. We start with the four equa-
tions mentioned above. We then derive four new equations below.
Using (7.35) and (7.36) we get

$$G_1(z)H_1(z) = G_0(-z)H_0(-z). \qquad (7.54)$$

Using (7.31) and (7.54) we get the first equation

$$G_0(z)H_0(z) + G_0(-z)H_0(-z) = 2. \qquad (7.55)$$

We observe that $H_0(-z)$ is the z-transform of the sequence $\{(-1)^n h_0[n]\}$, and
then we note that the constant function 2 is the z-transform of the sequence
$\{2\delta[n]\}$. Finally we use the uniqueness property of the z-transform and the
z-transform representation of convolution to write (7.55) as

$$\sum_k g_0[k]h_0[n-k] + (-1)^n\sum_k g_0[k]h_0[n-k] = 2\delta[n].$$
Now for $n$ odd the two terms cancel, and for $n$ even they are equal. Thus we
find, replacing $n$ by $2n$ (note that $\delta[2n] = \delta[n]$ by the definition),
$$\sum_k g_0[k]h_0[2n-k] = \delta[n] \quad\text{for all } n\in\mathbb{Z}. \qquad (7.56)$$

We use (7.54) once more in (7.31), this time with $z$ replaced by $-z$, to get
the second equation
$$G_1(z)H_1(z) + G_1(-z)H_1(-z) = 2. \qquad (7.57)$$

A computation identical to the one leading to (7.56) yields
$$\sum_k g_1[k]h_1[2n-k] = \delta[n] \quad\text{for all } n\in\mathbb{Z}. \qquad (7.58)$$

Using (7.35) we get
$$G_0(z)H_1(z) = -\frac{c}{2}z^{2k+1}G_0(z)G_0(-z),$$
and then replacing $z$ by $-z$
$$G_0(-z)H_1(-z) = \frac{c}{2}z^{2k+1}G_0(z)G_0(-z).$$
Adding these expressions we get the third equation
$$G_0(z)H_1(z) + G_0(-z)H_1(-z) = 0. \qquad (7.59)$$

We can translate this equation into the time domain as above. The result is
$$\sum_k g_0[k]h_1[2n-k] = 0 \quad\text{for all } n\in\mathbb{Z}. \qquad (7.60)$$

Using (7.36) we get
$$G_1(z)H_0(z) = \frac{c}{2}z^{2k+1}G_1(z)G_1(-z),$$
and
$$G_1(-z)H_0(-z) = -\frac{c}{2}z^{2k+1}G_1(z)G_1(-z).$$
Adding these yields the fourth equation
$$G_1(z)H_0(z) + G_1(-z)H_0(-z) = 0. \qquad (7.61)$$

As above this leads to



$$\sum_k g_1[k]h_0[2n-k] = 0 \quad\text{for all } n\in\mathbb{Z}. \qquad (7.62)$$

Thus we have shown that the perfect reconstruction property of the filter bank
leads to the four equations (7.56), (7.58), (7.60), and (7.62). It turns out that
these four equations have a natural interpretation as the first biorthogonality
condition in the definition of a biorthogonal basis pair in $\ell^2(\mathbb{Z})$. This can be
seen in the following way. We define the sequences of vectors $\{f_n\}$ and $\{\tilde f_n\}$
by
$$\tilde f_{2n}[k] = h_0[2n-k], \qquad \tilde f_{2n+1}[k] = h_1[2n-k], \qquad (7.63)$$
$$f_{2n}[k] = g_0[k-2n], \qquad f_{2n+1}[k] = g_1[k-2n], \qquad (7.64)$$
for all $n\in\mathbb{Z}$. Since we assume that the four filters are FIR filters, these
vectors all belong to $\ell^2(\mathbb{Z})$. The four relations above then lead to the following
properties of these vectors, where we also use the definition of the inner
product and the assumption of real filter coefficients.

$$\langle f_0, \tilde f_{2n}\rangle = \delta[n], \qquad \langle f_0, \tilde f_{2n+1}\rangle = 0,$$
$$\langle f_1, \tilde f_{2n}\rangle = 0, \qquad \langle f_1, \tilde f_{2n+1}\rangle = \delta[n],$$
for all $n\in\mathbb{Z}$. They lead to the biorthogonality condition
$$\langle f_i, \tilde f_{i'}\rangle = \delta[i-i']. \qquad (7.65)$$

This is seen as follows. There are four cases to consider. Assume for example
that both $i$ and $i'$ are even. Then we have, using a change of variables,
$$\langle f_{2m}, \tilde f_{2n}\rangle = \sum_k g_0[k-2m]h_0[2n-k]
= \sum_{k'} g_0[k']h_0[2(n-m)-k'] = \delta[n-m].$$
The remaining three cases are obtained by similar computations, see Exer. 7.6.
One may ask whether the stability condition (7.49) also follows from the
perfect reconstruction property. Unfortunately, this is not the case. This is
one of the deeper mathematical results that we cannot cover in this book. A
substantial background in Fourier analysis and functional analysis is needed
to properly understand this question. We refer to the books [4, 5] for readers
with the necessary mathematical background.
Now we will use the above results to define an important class of filters.
We say that a set of four filters with the perfect reconstruction property are
orthogonal filters, if the associated basis vectors constitute an orthonormal
basis. By the above results and definitions this means that we must have

$\tilde f_n = f_n$ for all $n\in\mathbb{Z}$, or, translating this condition back to the filters using
(7.63) and (7.64), that
$$g_0[k] = h_0[-k], \qquad g_1[k] = h_1[-k], \qquad (7.66)$$
for all $k \in \mathbb{Z}$.
Finally, let us note that if we start with a biorthogonal basis pair with the
special structure imposed in (7.63) and (7.64), or with four filters satisfying
the four equations (7.56), (7.58), (7.60), and (7.62), then we can reverse
the arguments and show that the corresponding z-transforms $H_0(z)$, $H_1(z)$,
$G_0(z)$, and $G_1(z)$ satisfy (7.31) and (7.32). The details are left as Exer. 7.7.

7.6 Summary of Results on Lifting and Filters

We now briefly summarize the above results. We have four different ap-
proaches to a building block for the DWT. They are
1. The building block is defined using lifting steps, and a final normalization
step.
2. The building block is based on a 2 x 2 matrix (7.24), whose entries are
Laurent polynomials, and on the scheme described in Fig. 7.3. The matrix
is assumed to be invertible. This is the polyphase representation.
3. The building block is based on a two channel filter bank with the per-
fect reconstruction property, as in Fig. 7.4. The perfect reconstruction
property can be specified in two different manners.
a) The conditions (7.31) and (7.32) are imposed on the z-transforms of
the filters, together with the determinant condition (7.37).
b) The four filters satisfy the four equations (7.56), (7.58), (7.60), and
(7.62) in the time domain.
4. Four filters are specified in the time domain, the vectors $f_n$, $\tilde f_n$, are
defined as in (7.63) and (7.64), and it is required that they satisfy the
biorthogonality condition.
The results in the previous sections show that all four approaches are equiv-
alent, and we have also established formulas for translating between the dif-
ferent approaches. Part of the details were postponed to Chap. 12, namely
the only not-so-easy translation: From filters to lifting steps.

7.7 Properties of Orthogonal Filters

We will now state a result on properties of orthogonal filters. Let us note
that the orthogonality condition (7.66) and the equations (7.35) and (7.36)
imply that we can only specify one filter. The remaining three filters are then
determined by these conditions, up to the constant and odd time shift in
(7.35) and (7.36). We assume that the given filter is $h_0$. To avoid the trivial
case we also assume that its length $L$ is at least 2.
Proposition 7.7.1. Let $h_0$, $h_1$, $g_0$, and $g_1$ be four FIR filters defining a
two channel filter bank with the perfect reconstruction property. Assume that
the filters are orthogonal, i.e. they satisfy (7.66), and that they have real
coefficients. Then the following results hold.
1. The length of $h_0$ is even, $L = 2K$.
2. We can specify $H_1$ by
$$H_1(z) = -z^{2K+1}H_0(-z^{-1}). \qquad (7.67)$$
If the nonzero entries in $h_0$ are $h_0[1], \ldots, h_0[2K]$, then (7.67) shows that
the nonzero entries in $h_1$ are given by
$$h_1[k] = (-1)^k h_0[2K+1-k], \quad k = 1, 2, \ldots, 2K. \qquad (7.68)$$
3. The power complementarity equation
$$|H_0(e^{j\omega})|^2 + |H_1(e^{j\omega})|^2 = 2 \qquad (7.69)$$
holds for all $\omega \in \mathbb{R}$.
4. $H_0(-1) = 0$ implies $H_0(1) = \sqrt{2}$.
5. Energy is conserved in the transition from $X(z)$ to $Y_0(z)$, $Y_1(z)$, see
Fig. 7.4. In the time domain this is the equation
$$\|y_0\|^2 + \|y_1\|^2 = \|x\|^2. \qquad (7.70)$$
We will now establish these five properties. Using the orthogonality condition
(7.66) and the result (7.56), we find, after changing the summation variable
to $-k$, the result
$$\sum_k h_0[k]h_0[2n+k] = \delta[n] \quad\text{for all } n\in\mathbb{Z}. \qquad (7.71)$$

Assume now that the length of $h_0$ is odd, $L = 2N+1$. We can assume that
the filter has the form
$$h_0 = [\ldots, 0, h_0[0], \ldots, h_0[2N], 0, \ldots]$$
with $h_0[0] \neq 0$ and $h_0[2N] \neq 0$. Using (7.71) with $n = N \neq 0$ (since we
have excluded the trivial case $L = 1$) we find $h_0[0]h_0[2N] = 0$, which is a
contradiction.
Concerning the second result, the assumption $g_0[k] = h_0[-k]$ implies
$G_0(z) = H_0(z^{-1})$. Using (7.35) for $-z$ we then find
$$H_1(z) = -\frac{c}{2}z^{2k+1}G_0(-z) = -\frac{c}{2}z^{2k+1}H_0(-z^{-1}).$$
We can choose $c = 2$ and $k = K$. Then (7.67) holds. Translating this formula
to the time domain yields (7.68).
To prove the third result we go back to (7.55) and insert $G_0(z) = H_0(z^{-1})$
to get
$$H_0(z)H_0(z^{-1}) + H_0(-z)H_0(-z^{-1}) = 2. \qquad (7.72)$$

If we now take $z = e^{j\omega}$, and use that all coefficients are real, we see that the
third result follows from this equation and (7.67).
Taking $z = 1$ in (7.72) yields the fourth result, up to a global choice of
signs. We have chosen the plus sign.
The energy conservation property in (7.70) is much more complicated to
establish. We recommend the reader to skip the remainder of this section on
a first reading.
We use the orthogonality assumption to derive the equation
$$H_0(z)H_1(z^{-1}) + H_0(-z)H_1(-z^{-1}) = 0 \qquad (7.73)$$
from (7.59). If we combine (7.67) and (7.72), we also find
$$H_1(z)H_1(z^{-1}) + H_1(-z)H_1(-z^{-1}) = 2. \qquad (7.74)$$

Taking $z = e^{j\omega}$ and using that the complex conjugate then is $z^{-1}$, we get
from (7.72), (7.73), and (7.74) that for each $\omega\in\mathbb{R}$ the matrix
$$U(e^{j\omega}) = \frac{1}{\sqrt{2}}\begin{bmatrix} H_0(e^{j\omega}) & H_0(e^{j(\omega+\pi)}) \\ H_1(e^{j\omega}) & H_1(e^{j(\omega+\pi)}) \end{bmatrix}$$
is a unitary matrix, since the three equations show that for a fixed $\omega\in\mathbb{R}$
the two rows have norm 1 (we include the $1/\sqrt{2}$ factor for this purpose) and
are orthogonal, as vectors in $\mathbb{C}^2$.
The next step is to note that this matrix allows us to write the equations
(7.28) and (7.29) as follows, again with $z = e^{j\omega}$,
$$\begin{bmatrix} Y_0(e^{j\omega}) \\ Y_1(e^{j\omega}) \end{bmatrix}
= U(e^{j\omega/2})\begin{bmatrix} \frac{1}{\sqrt{2}}X(e^{j\omega/2}) \\ \frac{1}{\sqrt{2}}X(e^{j(\omega/2+\pi)}) \end{bmatrix}.$$
Note how we have distributed the factor $1/2$ in (7.28) and (7.29) over the two
terms. The unitarity of $U(e^{j\omega/2})$ for each $\omega$ means that this transformation
preserves the norm in $\mathbb{C}^2$. Thus we have
$$|Y_0(e^{j\omega})|^2 + |Y_1(e^{j\omega})|^2 = \frac{1}{2}|X(e^{j\omega/2})|^2 + \frac{1}{2}|X(e^{j(\omega/2+\pi)})|^2 \qquad (7.75)$$
for all $\omega\in\mathbb{R}$.
The last step consists in using Parseval's equation (7.2), then (7.75), and a
change in integration variable from w/2 to w, and finally Parseval's equation
once more.

$$\|y_0\|^2 + \|y_1\|^2 = \frac{1}{2\pi}\int_0^{2\pi}\left(|Y_0(e^{j\omega})|^2 + |Y_1(e^{j\omega})|^2\right)d\omega$$
$$= \frac{1}{4\pi}\int_0^{2\pi}\left(|X(e^{j\omega/2})|^2 + |X(e^{j(\omega/2+\pi)})|^2\right)d\omega$$
$$= \frac{1}{2\pi}\int_0^{\pi}\left(|X(e^{j\omega})|^2 + |X(e^{j(\omega+\pi)})|^2\right)d\omega$$
$$= \frac{1}{2\pi}\int_0^{2\pi}|X(e^{j\omega})|^2\,d\omega = \|x\|^2 .$$
This computation concludes the proof of the proposition.

7.8 Some Examples

Let us now apply the above results to some of the previously considered transforms.
We start as usual with the Haar transform. It is given by (3.28)-(3.31).
In the notation introduced here (compare with the example in Sect. 7.2) we
get
$$H(z) = \begin{bmatrix} \sqrt{2} & 0 \\ 0 & 1/\sqrt{2} \end{bmatrix}
\begin{bmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}.$$
Multiplying these matrices we get
$$H(z) = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}.$$
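Since the lifting polynomials for the Haar transform are constants, the multiplication can be checked with ordinary numeric matrices in MATLAB (a minimal sketch, using our own variable names):

P   = [1 0; -1 1];                 % prediction step, T(z) = 1
U   = [1 1/2; 0 1];                % update step, S(z) = 1/2
Nrm = [sqrt(2) 0; 0 1/sqrt(2)];    % normalization
H   = Nrm * U * P                  % gives [1 1; -1 1]/sqrt(2), as claimed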
Using (7.38) and (7.39) we get
$$H_0(z) = \frac{1}{\sqrt{2}}(1+z), \qquad H_1(z) = \frac{1}{\sqrt{2}}(-1+z),$$
and then in the frequency variable (on the unit circle in the complex plane)
$$|H_0(e^{j\omega})| = \sqrt{2}\,|\cos(\omega/2)|, \qquad |H_1(e^{j\omega})| = \sqrt{2}\,|\sin(\omega/2)|.$$
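A short MATLAB fragment (our own sketch) evaluates these two magnitude responses numerically and reproduces the curves in Fig. 7.5:

w = linspace(0, pi, 512);
magH0 = abs((1 + exp(1j*w))/sqrt(2));    % equals sqrt(2)*abs(cos(w/2))
magH1 = abs((-1 + exp(1j*w))/sqrt(2));   % equals sqrt(2)*abs(sin(w/2))
plot(w, magH0, w, magH1)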
The graphs of these two functions have been plotted in Fig. 7.5. We see that
$H_0$ is a low pass filter, but not very sharp. Note that we have chosen a linear
scale on the vertical axis. Often a logarithmic scale is chosen in this type

Fig. 7.5. Plot of $|H_0(e^{j\omega})|$ and $|H_1(e^{j\omega})|$ for the Haar transform

of plots. We will now make the same computations for the Daubechies 4
transform. It is given by lifting steps in (3.18)-(3.22). Written in matrix
notation we have
$$H(z) = \begin{bmatrix} \frac{\sqrt{3}-1}{\sqrt{2}} & 0 \\ 0 & \frac{\sqrt{3}+1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} 1 & -z \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -\frac{\sqrt{3}}{4} - \frac{\sqrt{3}-2}{4}z^{-1} & 1 \end{bmatrix}
\begin{bmatrix} 1 & \sqrt{3} \\ 0 & 1 \end{bmatrix}.$$

Multiplying we find
$$H(z) = \frac{1}{4\sqrt{2}}\begin{bmatrix} 1+\sqrt{3} + (3-\sqrt{3})z & 3+\sqrt{3} + (1-\sqrt{3})z \\ -(3+\sqrt{3}) - (1-\sqrt{3})z^{-1} & 1+\sqrt{3} + (3-\sqrt{3})z^{-1} \end{bmatrix}.$$
Using the equations (7.38) and (7.39) we get the Daubechies 4 filters
$$H_0(z) = h[0] + h[1]z + h[2]z^2 + h[3]z^3,$$
$$H_1(z) = -h[3]z^{-2} + h[2]z^{-1} - h[1] + h[0]z,$$
where
$$h[0] = \frac{1+\sqrt{3}}{4\sqrt{2}}, \quad h[1] = \frac{3+\sqrt{3}}{4\sqrt{2}}, \quad
h[2] = \frac{3-\sqrt{3}}{4\sqrt{2}}, \quad h[3] = \frac{1-\sqrt{3}}{4\sqrt{2}}. \qquad (7.76)$$

Note that while obtaining the filter taps from the lifting steps is easy, since
it is just a matter of matrix multiplications, it is by no means trivial to go
the other way. This is described in detail in Chap. 12.
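It is, however, easy to check numerically that the taps obtained above are correct. The following MATLAB sketch (our own code) verifies that the coefficients in (7.76) satisfy the orthogonality relation (7.71):

h = [1+sqrt(3), 3+sqrt(3), 3-sqrt(3), 1-sqrt(3)] / (4*sqrt(2));  % (7.76)
r = conv(h, fliplr(h));   % autocorrelation of h0, lags -3,...,3
r(2:2:end)                % even lags -2, 0, 2: approximately [0 1 0]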
The absolute values of $H_0(e^{j\omega})$ and $H_1(e^{j\omega})$ have been plotted in Fig. 7.6.
We note that these filters give a sharper cutoff than the Haar filters.

Fig. 7.6. Plot of $|H_0(e^{j\omega})|$ and $|H_1(e^{j\omega})|$ for the Daubechies 4 transform

Finally we repeat these computations for the CDF(2,2) transform. It is given
in the lifting form by equations (3.13) and (3.14) and the normalization by
(3.40) and (3.41). We then get
$$H(z) = \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \frac{1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} 1 & \frac{1}{4}(1+z^{-1}) \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -\frac{1}{2}(1+z) & 1 \end{bmatrix}.$$


Multiplying these terms yields
$$H(z) = \begin{bmatrix} \frac{3}{2\sqrt{2}} - \frac{1}{4\sqrt{2}}z - \frac{1}{4\sqrt{2}}z^{-1} & \frac{1}{2\sqrt{2}} + \frac{1}{2\sqrt{2}}z^{-1} \\ -\frac{1}{2\sqrt{2}} - \frac{1}{2\sqrt{2}}z & \frac{1}{\sqrt{2}} \end{bmatrix}.$$
As above we compute
$$H_0(z) = -\frac{1}{4\sqrt{2}}z^2 + \frac{1}{2\sqrt{2}}z + \frac{3}{2\sqrt{2}} + \frac{1}{2\sqrt{2}}z^{-1} - \frac{1}{4\sqrt{2}}z^{-2},$$
$$H_1(z) = -\frac{1}{2\sqrt{2}}z^2 + \frac{1}{\sqrt{2}}z - \frac{1}{2\sqrt{2}}.$$

Fig. 7.7. Plot of $|H_0(e^{j\omega})|$ and $|H_1(e^{j\omega})|$ for the CDF(2,2) transform

The graphs are plotted in Fig. 7.7. The unusual frequency response is due to
the fact that these filters are not orthogonal.
Let us finally comment on available filters. There is a family of filters
constructed by I. Daubechies, which we refer to as the Daubechies filters and
also the Daubechies transform, when we think of the associated DWT. The
Haar filters and the Daubechies 4 transform discussed above are the first two
members of the family. The filters are often indexed by their length, which
is an even integer. All filters in the family are orthogonal. As the length
increases they (slowly) converge to ideal low pass and high pass filters.
Another family consists of the so-called symlets. They were also first con-
structed by I. Daubechies. These filters are orthogonal and are also indexed
by their length. They are less asymmetrical than the first family, with regard
to phase response.
A third family of orthogonal filters with nice properties is the Coiflets.
They are described in [5], where one also finds tables of filter coefficients.
We have already encountered some families of biorthogonal filters. They
come from the CDF-families described in Sect. 3.6, and in the notation
CDF(N,M) the integers $N$ and $M$ denote the multiplicity of the zero at
$z = -1$ for the z-transforms $H_0(z)$ and $G_0(z)$, respectively. We say that
$H_0(z)$ has a zero of multiplicity $N$ at $z = -1$, if there exists a Laurent
polynomial $\widetilde{H}_0(z)$ with $\widetilde{H}_0(-1) \neq 0$, such that $H_0(z) = (z+1)^N \widetilde{H}_0(z)$.
The various toolboxes have functions for generating filter coefficients. We
will discuss those in Uvi_Wave in some detail later.

Exercises

7.1 Carry out explicitly the change of summation variable, which proves
(7.9) in the time domain.
7.2 Verify that the update step in the CDF(2,2) transform example in
Sect. 7.2 is given by $S(z) = \frac{1}{4}(1 + z^{-1})$.
7.3 Let $h$ be a filter. Show that filtering by $h$ preserves the finite energy
property, i.e. $\|h * x\| \leq c\|x\|$ for all $x \in \ell^2(\mathbb{Z})$, by using the z-transform and
Parseval's equation.
7.4 Verify that in the polyphase representation from Sect. 7.2 the invertibil-
ity of the matrix (7.24) is equivalent with the perfect reconstruction property
in the two channel filter bank case.
7.5 Verify that the canonical basis defined in (7.48) satisfies all the requirements
for an orthonormal basis for $\ell^2(\mathbb{Z})$.
7.6 Carry out the remaining three cases in the verification of (7.65).
7.7 Let four filters $h_0$, $h_1$, $g_0$, and $g_1$ satisfy the equations (7.56), (7.58),
(7.60), and (7.62). Show that their z-transforms satisfy (7.31) and (7.32).
7.8 Go through the details in establishing the formulas (7.51), (7.52), and
(7.53). Try also to obtain these formulas from the polyphase formulation of
the filter bank.
7.9 Verify that (7.67) leads to (7.68).
7.10 Show that (7.53) is valid if and only if
$$\sum_n \left(g_0[k-2n]h_0[2n-m] + g_1[k-2n]h_1[2n-m]\right) = \delta[m-k],$$
and then show that this is true for all wavelet filters.
7.11 Carry out computations similar to those in Sect. 7.8 for the CDF(3,1)
transform defined in Sect. 3.6.
8. Wavelet Packets

In the first chapters we have introduced the lifting technique, which is a


method for defining and implementing the discrete wavelet transform. The
definitions were given in Chap. 3, and simple examples were given in Chap. 4.
In particular, we saw applications of the wavelet analysis to noise reduction.
The applications were based on the one scale building block, applied several
times. In this chapter we want to extend the use of these building blocks to
define many new transforms, all called wavelet packets. In many applications
these new transforms are the basis for the successful use of wavelets. Some
examples are given in Chap. 13.

8.1 From Wavelets to Wavelet Packets


In this chapter we regard the one scale DWT as a given building block and
we extend the concept of a wavelet analysis. Thus we regard the one scale
DWT as a black box, which can act on any signal of even length. The analysis
transform is now denoted by $T_a$ and the synthesis transform by $T_s$.
We recall that the direct transform is capable of transforming any signal
of even length into two signals, and the inverse transform reverses this de-
composition, i.e. we are requiring the perfect reconstruction property of our
building block.
Let the signal be represented by a box, with a length proportional to the
length of the signal. A transformation followed by an inverse transformation
will then be represented as in Fig. 8.1.

Fig. 8.1. Block diagram for building block and its inverse, with signal elements
shown


Three consecutive transforms will look like Fig. 8.2(a). The dotted boxes show
the signal parts transferred without transformation from one application of
the building block to the next. Other examples are in Fig. 3.7, and in Table 2.1
and 3.1, which show the same with numbers and symbols. Note that the
orientation and location of the transformed components are different here
and in the examples mentioned.
Any collection of consecutive transforms of a given signal is called a de-
composition of this signal. We use the following terminology. The original
signal is said to be at the first level. Applying the transform once gets us to
the second level. Thus in the wavelet transform case the k scale transform
leads to a decomposition with k + 1 levels.
Note that the original signal always is the top level in a decomposition.
Depending on how we draw the diagrams, the top level is on the left, or on
the top of the diagram. Note that the placement of the s and d parts of each
transform step agrees with the diagram in Fig. 3.7.

Fig. 8.2. The wavelet (a) and the full wavelet packet (b) decompositions of a signal

When looking at Fig. 8.2(a), one starts to wonder, why some signal parts
are transformed and other parts are not. One can apply the transform to all
parts, or, as we shall see soon, to selected parts. This idea is called wavelet
packets, and a full wavelet packet decomposition over four levels is shown in
Fig. 8.2(b). Note that in this case we have applied the transform $T_a$ 7 times.
The standard abbreviation WP will be used in the rest of the text. Each
signal part box in Fig. 8.2 is called an element. Note that this word might
also be used, when it is clear from the context, to denote a single number or
coefficient in a signal.

In Chap. 4 we saw the wavelet decomposition applied to problems of noise


reduction, and in the separation of slow and fast variations in a signal. But,
by increasing the number of possible decompositions, we may be able to do
better, in particular with signals, where the usual wavelet decomposition does
not perform well.
Let us go back to the very first example in Sect. 2.1. We saw that a
signal containing a total of 15 digits could be transformed into one containing
13 digits, and then in the next two steps to signals with 13 and 12 digits,
respectively. We can get from one of these four representations to any other,
by applying the direct or inverse transforms the right number of times. They
are equivalent representations of the original signal. So if we want to reduce
memory usage, we choose the one requiring the least amount of memory
space. Now if we use the wavelet packet decompositions, then we have 26
different, but equivalent, representations to choose from. Thus the chances of
getting an efficient representation will be greater.
Though the number 26 seems rather unmotivated, it has been carefully
deduced. But before elaborating on this we will see how a representation
can be extracted from the WP decomposition. As an example we use the
same signal as in the first chapters. The full WP decomposition is given in
Table 8.1. A representation is a choice of elements, sequentially concatenated,

Table 8.1. Full wavelet packet decomposition of the signal


56 40  8 24 48 48 40 16
48 16 48 28 |  8 -8  0 12
32 38 | 16 10 |  0  6 |  8 -6
35 | -3 | 13 |  3 |  3 | -3 |  1 |  7

such that
1. the selected elements cover the original signal, and
2. there is no overlap between the selected elements.
The first condition ensures sufficient information for reconstruction, while the
second one ensures that no unnecessary information is chosen. Both condi-
tions are needed. Since any representation is equivalent with a change of basis
(here we view the original signal as given in the canonical basis), a choice of
representation corresponds to a choice of basis (this is elaborated in Chap. 5).
We will, in agreement with most of the wavelet literature, often use the for-
mulation 'choice of basis.' But one should remember that a representation is
different from a basis.
An example of a representation (notice how the two conditions are ful-
filled) of the signal decomposed above is
| 48 16 48 28 | 3 | -3 | 8 -6 |.

The original signal is reconstructed by first using $T_s$ on
| 3 | -3 |  which becomes  | 0 6 |.
Then $T_s$ is used again on
| 0 6 | 8 -6 |  which becomes  | 8 -8 0 12 |.
Finally the original signal is recreated by one more $T_s$. Graphically it looks
like Fig. 8.3, where the $T_a$- and $T_s$-boxes have been left out.

Fig. 8.3. Decomposition of the signal, and reconstruction from one particular representation.
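Sect. 11.7 shows how to implement a WP decomposition in MATLAB. As a minimal illustration, the following sketch (our own code, using the mean and difference building block s = (a+b)/2, d = (a-b)/2 from the first chapters) reproduces the full decomposition in Table 8.1:

x  = [56 40 8 24 48 48 40 16];
J  = 4;                                   % number of levels
wp = cell(J,1);  wp{1} = x;
for j = 2:J
    prev  = wp{j-1};  wp{j} = zeros(size(prev));
    width = numel(x) / 2^(j-2);           % element width on level j-1
    for k = 0:2^(j-2)-1                   % transform each element separately
        e = prev(k*width+1 : (k+1)*width);
        s = (e(1:2:end) + e(2:2:end)) / 2;   % averages
        d = (e(1:2:end) - e(2:2:end)) / 2;   % differences
        wp{j}(k*width+1 : (k+1)*width) = [s d];
    end
end
wp{4}    % 35 -3 13 3 3 -3 1 7, the bottom row of Table 8.1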

To further exemplify the decomposition process, six other representations are


shown in Fig. 8.4. In Sect. 11.7 it is demonstrated how a WP decomposition
can be implemented in MATLAB.

8.2 Choice of Basis


With the extension of wavelets to wavelet packets the number of different
representations increases significantly. We claimed for example that the full
decomposition of the signal given in the first section gave a total of 26 possible
representations. As we shall see below, this number grows very rapidly, when
more levels are added to the decomposition.
We now assume that we have decided on a criterion for choosing the best
basis, a so-called cost function. It could be the number of digits used in the
representation of the signal. With 26 representations we can inspect each one
to find the best representation, but this becomes an overwhelming task with
one billion representations (which is possible with 7 levels instead of 4). It
is therefore imperative to find a method to assist us in choosing the best
basis. The method should preferably be both fast and exhaustive, such the

Fig. 8.4. Six different representations of the signal.

chosen basis is the best one, according to our criterion. If more than one basis
satisfies our criterion for best basis, the method should find one of them. We
present such an algorithm below, in Sect. 8.2.2.

8.2.1 Number of Bases

It is not very difficult to set up an equation for the number of bases in a
decomposition with a given number of levels. We start with a decomposition
containing $j+1$ levels. The number of bases in this decomposition is denoted
by $A_j$. This makes the previous claim equivalent to $A_3 = 26$. Now, a decomposition
with $j+2$ levels has $A_{j+1}$ bases, and this number can be related
to $A_j$ by the following observation. The larger decomposition can be viewed
as consisting of three parts. Two of them are smaller decompositions with
$j+1$ levels each. The third part is the original signal. See Fig. 8.5. Every
time we choose a basis in the left of the two small decompositions, there are
$A_j$ choices in the right, since we can combine left and right choices freely. A
total of $A_j^2$ different bases can be chosen this way. For any choice of elements
in the two smaller decompositions, the original signal cannot be chosen, since
no overlap is allowed. If we do choose the top element, however, this also
Fig. 8.5. Counting the number of decompositions

counts as a representation. Thus a decomposition with $j+2$ levels satisfies
$A_{j+1} = A_j^2 + 1$. Starting with $j = 0$, we find this to be a decomposition with
one level, so $A_0 = 1$. Then $A_1 = 1^2 + 1 = 2$, and $A_2 = 2^2 + 1 = 5$, and then
$A_3 = 5^2 + 1 = 26$, the previous claim.
Continuing a little further, we are in for a surprise, see Table 8.2.

Table 8.2. Growth in the number of decompositions with level

Number of levels   Minimum signal length   Number of bases
1                  1                       1
2                  2                       2
3                  4                       5
4                  8                       26
5                  16                      677
6                  32                      458330
7                  64                      210066388901
8                  128                     44127887745906175987802
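The recursion is easy to evaluate. The following MATLAB sketch (our own code) reproduces the last column of Table 8.2 up to 7 levels; the 8 level value exceeds exact double precision arithmetic and would require exact integer arithmetic:

A = 1;                          % one level: the signal itself
for levels = 2:7
    A = A^2 + 1;                % two subtrees combined freely, plus the top
    fprintf('%d levels: %.0f bases\n', levels, A);
end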

The number of bases grows extremely fast, approximately doubling the num-
ber of digits needed to represent it, for each extra level. It is worth noting
that the number of bases does not depend directly on the length of the signal.
A signal of length 128 transformed 3 times, to make a decomposition with
4 levels, has only 26 different representations. The number of levels, we can
consider, is limited by the length of the signal, since the decomposition must
terminate, when an element contains just one number. The minimum length
required for a given level is shown in the second column of Table 8.2, in the
first few cases. In general, decomposition into $J$ levels requires a signal of
length at least $2^{J-1}$.
We can find an upper and a lower bound for $A_j$. Take two other sequences,
$$B_{j+1} = B_j^2, \quad B_1 = 2, \qquad\text{and}\qquad C_{j+1} = 2C_j^2, \quad C_1 = 2.$$
Clearly $B_j \leq A_j \leq C_j$ for $j > 0$. Both $B_j$ and $C_j$ are easily found. For $B_j$
we have
$$B_j = 2^{2^{j-1}}.$$
An analogous computation shows that $C_j = 2^{2^j - 1}$. Hence we have the bounds
$$2^{2^{j-1}} \leq A_j \leq 2^{2^j - 1}.$$
For a decomposition with 10 levels, i.e. $j = 9$, we have a lower bound of
$2^{2^8} \approx 10^{77}$ and an upper bound of $2^{2^9 - 1} \approx 10^{154}$. These numbers are very
large.

8.2.2 Best Basis

The concept of the best basis depends on the application we have in mind.
To use the concept in an algorithm, we introduce a cost function. A cost
function measures in some terms the cost of a given representation, with the
idea that the best basis is the one having the smallest cost.
To be usable in an algorithm, the cost function must have some specific
properties. We denote a cost function by the symbol K here. The cost function
is defined on finite vectors of arbitrary length. The value of the cost function
is a real number. Given two vectors of finite length, a and b, we denote
their concatenation by [a b]. This vector simply consists of the elements in a
followed by the elements in b. We require the following two properties.
1. The cost function is additive in the sense that K([a b]) = K(a) + K(b)
for all finite length vectors a and b.
2. K(0) = 0, where 0 denotes the zero vector.
As an example, we take the cost function, which counts the number of nonzero
entries in a vector. For example,
$$K([1\ 5\ 0\ {-3}\ 4\ 0\ 0\ {-6}]) = 5.$$
The additivity is illustrated with this example:
$$K([1\ 5\ 0\ {-3}\ 4\ 0\ 0\ {-6}]) = K([1\ 5\ 0\ {-3}]) + K([4\ 0\ 0\ {-6}]).$$


The conditions on a cost function can be relaxed in some cases, where the
structure of the signal is known, and a near-best basis is acceptable. We will
not pursue this topic in this book.
Let us now describe the algorithm, which for a given cost function finds
a best basis. The starting point is a computation of the full wavelet packet
decomposition to a prescribed level J, compatible with the length of the
signal. An example is shown in Fig. 8.2(b), with J = 4. The next step is

to calculate the cost values of all elements of the full decomposition. Note
that these two computations are performed only once. Their complexity is
proportional to the length of the given signal multiplied by the number of
levels chosen in the full decomposition.
Given this full decomposition with the cost of each element computed,
the algorithm performs a bottom-up search in this tree. It can be described
as follows.
1. Mark all elements on the bottom level $J$.
2. Let $j = J$.
3. Let $k = 0$.
4. Compare the cost value $V_1$ of element $k$ on level $j-1$ (counting from the
left on that level) to the sum $V_2$ of the cost values of the elements $2k$ and
$2k+1$ on level $j$.
a) If $V_1 \leq V_2$, all marks below element $k$ on level $j-1$ are deleted, and
element $k$ is marked.
b) If $V_1 > V_2$, the cost value $V_1$ of element $k$ is replaced with $V_2$.
5. $k = k+1$. If there are more elements on level $j-1$ (if $k < 2^{j-2}$), go to
step 4.
6. $j = j-1$. If $j > 1$, go to step 3.
7. The marked basis has the lowest possible cost value, which is the value
currently assigned to the top element.
The additivity ensures that the algorithm quickly finds a best basis, which
of course need not be unique. Note that once the first two steps (full decom-
position and computation of cost) have been performed, then the complexity
of the remaining computations only depends on the number of levels $J$ being
used. The complexity of this part is found to be $O(J\log J)$.
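A compact MATLAB sketch of the search is given below. The data layout and the names are our own (Sect. 11.8 presents the implementation used with the toolbox): cost{j}(k) holds the precomputed cost value of element k on level j, with level 1 the original signal, and the function returns logical marks for the chosen elements. Note that the elements are indexed from 1 here, so the two elements below element k are 2k-1 and 2k.

function [marked, cost] = bestbasis(cost, J)
% Bottom-up best basis search on a full decomposition with J levels.
marked = cell(J, 1);
for j = 1:J, marked{j} = false(1, 2^(j-1)); end
marked{J}(:) = true;                        % step 1: mark the bottom level
for j = J:-1:2                              % steps 2-6
    for k = 1:2^(j-2)                       % elements on level j-1
        v1 = cost{j-1}(k);
        v2 = cost{j}(2*k-1) + cost{j}(2*k);
        if v1 <= v2                         % step 4a: keep the parent
            for jj = j:J                    % delete all marks below it
                m = 2^(jj-j+1);             % number of descendants per level
                marked{jj}((k-1)*m+1 : k*m) = false;
            end
            marked{j-1}(k) = true;
        else                                % step 4b: propagate the cost
            cost{j-1}(k) = v2;
        end
    end
end
end

With the cost values from the example below (cost{1} = 8, cost{2} = [4 3], cost{3} = [2 2 1 2], and cost{4} = [1 1 1 1 1 1 0 1]) the function marks the same basis as found in the text, and cost{1} ends up as 6, the cost of the best basis.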
The algorithm is most easily understood with the help of an example.
For that purpose we reuse the decomposition given in Table 8.1. First the
cost value of each element is calculated. The cost values are represented in
the same tree structure as the full decomposition. As cost function in this
example we choose the count of numbers with absolute value > 1. Calculated
cost values and the marking of the bottom level are both shown in Fig. 8.7(1).
We start with the bottom level, since the search starts here. We then move
up each time it is possible to reduce total cost by doing so. The additivity
makes partial replacement of elements possible. The remaining crucial step is
the comparison of cost values. All comparisons are between an element and
the two elements just below it. If the sum of the cost values in the two lower
elements (V2 in the algorithm) is smaller than the cost value in the upper
element (V1 in the algorithm), this sum is inserted as a new cost value in the
upper element. This possibility is illustrated in the top row of Fig. 8.6. If,
on the other hand, the sum is larger than (or equal to) the cost value in the
upper element, this element is marked and all marks below this element are
deleted. This possibility is illustrated in the bottom row of Fig. 8.6.

Fig. 8.6. Comparison of cost values in the two cases

All elements on each level are run through, and the levels are taken from the
lowest but one and up. In Fig. 8.7 the process is shown in four steps.
Fig. 8.7. An example showing the best basis search. The cost function is the count
of numbers in each element with absolute value > 1

Notice how the cost value in the top element is the cost value of the best basis at
the end of the search. The best representation of this signal has been found
to be
| 48 16 48 28 | 0 6 | 1 | 7 |.
In the left half of the decomposition the cost value is 4 in all of the 5 possible
choices of elements. This means that with respect to cost value 5 different
best bases exist in this decomposition. The algorithm always finds one of

these. The equal sign in step 4(a) has as a consequence that the best basis
with fewest transform steps is picked. Below we will use the term 'best basis'
for the one selected by this algorithm.
If we had chosen the cost function to be the count of numbers with ab-
solute value > 3, the best basis would have been the eight elements on the
bottom level, with a total cost value of only 3.
One particular type of best basis is the best level basis, where the chosen
basis consists solely of elements from one level. The number of best level
bases is equal to the number of levels. This type of basis is often used in
time-frequency planes.
In Sect. 11.8 it is shown how to implement the best basis algorithm in
MATLAB.

8.3 Cost Functions


Any function defined on all finite length vectors, and taking real values, can
be used as a cost function, if it satisfies the two conditions on page 93. Some of
the more useful cost functions are those which measure concentration in some
sense. Typically the concentration is measured in norm or entropy. Low cost,
and consequently high concentration, means in these cases that there are few
large and many small numbers in an element. We start with a simple example
of a cost function, which also can be thought of as measuring concentration.

8.3.1 Threshold
One of the simplest cost functions is the threshold, which simply returns
the count of numbers above a specified threshold. Usually the sign of each
number is of no interest, and the count is therefore of numbers with absolute
value above the threshold. This was the cost function used in the example in
the previous section.
In the context of a cost function given by a threshold, 'large' means above
the given threshold, and 'small' below the threshold. Low cost then means
that the basis represents the signal with as few large values as possible. In
many cases the large values are the significant values. This is often the case,
when one uses the wavelet transform.
But there are certain pitfalls in this argument. The following situation
can easily arise. A signal with values in the range $-1$ to $1$ is transformed into
two signals, each with values in the range $-2$ to $2$. One more transform would
make the range $-4$ to $4$, and so on. It is apparent from the plots in Fig. 7.5,
7.6, and 7.7 that the one level wavelet transform has a gain above 1. But this
does not mean that we gain information just by transforming the signal. The
increase in the range of values is due to the normalization of the one level
DWT, which in the orthogonal case means that we have energy preserved
during each step in the decomposition.

Let us give an example. Take the vector $a = [1\ 1\ 1\ 1\ 1\ 1\ 1\ 1]$. Then $\|a\| = \sqrt{8}$.
Let $b$ denote its transform under an energy preserving transformation, such
that $\|b\| = \sqrt{8}$. Assume now that the transform has doubled the range of the
signal. This means that at least one of the entries has to be $\pm 2$. Thus one
could find for example $b = [2\ 2\ 0\ 0\ 0\ 0\ 0\ 0]$. Since energy is preserved, at most
two samples can have absolute value equal to 2. So the overall effect is that
most of the entries actually decrease in amplitude. This effect explains, at
least partially, why the wavelet transform is so useful in signal compression,
and in other applications.
The threshold is a very simple cost function. To satisfy the additivity
property the threshold has to be the same for all levels. Furthermore, an
inappropriate choice of threshold value can lead to a futile basis search. If
the threshold is chosen too high, then all cost values will be zero, and the
basis search returns the original signal. The same is the case, if the threshold
is too low.

8.3.2 $\ell^p$-Norm and Shannon's Entropy

The problem with the threshold cost function just mentioned leads one to
look for cost functions with better properties. In this section we describe
two possibilities. The first one is the so-called $\ell^p$-norm. The second one is
a modified version of Shannon's entropy. Both have turned out to be very
useful in applications. The two cost functions are defined as follows.
Definition 8.3.1. For $0 < p < \infty$ the cost function based on the $\ell^p$-norm is
given by
$$K_{\ell^p}(a) = \sum_n |a[n]|^p \qquad (8.1)$$
for all vectors $a$ of finite length.


We see that the energy in the signal is measured by the cost function $K_{\ell^2}$.
If we use this cost function together with a one scale DWT, which preserves
energy, then we find that the best basis search algorithm always returns
the original representation. But for $0 < p < 2$ this cost function, together
with an energy preserving transform, can be very useful. The reason is that
$K_{\ell^2}(a) = K_{\ell^2}(b)$ and $K_{\ell^p}(a) < K_{\ell^p}(b)$ together imply that the vector $a$ must
contain fewer large elements than $b$. See Exer. 8.4.
Definition 8.3.2. The cost function based on Shannon's entropy is defined
by
$$K_{\mathrm{Shannon}}(a) = -\sum_n a[n]^2 \log\left(a[n]^2\right) \qquad (8.2)$$
for all vectors $a$ of finite length. Here we use the convention $0\log(0) = 0$.

Let us note that this cost function is not the entropy function, but a modified
one. The original entropy function is for a signal $a$ computed as
$$-\sum_n p[n]\log(p[n]), \quad\text{where } p[n] = a[n]^2/\|a\|^2 .$$
This function fails to satisfy the additivity condition due to the division by
$\|a\|^2$. But the cost function defined here has the property that its value is
minimized, if and only if the original entropy function is minimized.
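Both cost functions, together with the threshold from the previous section, fit in one line of MATLAB each (our own sketches, not the book's implementations):

cost_thresh  = @(a, t) nnz(abs(a) > t);                       % threshold count
cost_lp      = @(a, p) sum(abs(a).^p);                        % (8.1)
cost_shannon = @(a) -sum(a(a~=0).^2 .* log(a(a~=0).^2));      % (8.2), 0*log(0) = 0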
Let us show by an example that the cost function $K_{\mathrm{Shannon}}$ measures concentration in
a signal. Let $\mathbf{1}_N$ denote the vector of length $N$, all of whose
entries equal 1. Let $s_N = (E/N)^{1/2}\mathbf{1}_N$, i.e. all entries equal $(E/N)^{1/2}$. The
energy in this signal is equal to $E$, independent of $N$, while $K_{\mathrm{Shannon}}(s_N) =
-E\log(E/N)$. For a fixed $E$ this function is essentially $\log(N)$. This shows
that the entropy increases, if we distribute the energy in the signal evenly
over an increasing number of entries. More generally, one can prove that for
signals with fixed energy, the entropy attains a maximum, when all entries
are equal, and a minimum, when all but one entry equal zero.
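Using the cost_shannon sketch from above, this behavior is easy to verify numerically:

E = 1;  N = 8;
sN = sqrt(E/N) * ones(1, N);     % unit energy spread evenly over N entries
c  = [sqrt(E), zeros(1, N-1)];   % all energy concentrated in one entry
cost_shannon(sN)                 % -E*log(E/N) = log(8), about 2.08
cost_shannon(c)                  % 0, the minimum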
In Sect. 11.9 the implementation of the different cost functions is pre-
sented, and in Chap. 13 some examples of applications are given.

Exercises
8.1 Verify that the threshold cost function satisfies the two requirements for
a cost function.
8.2 Verify that the cost function $K_{\ell^p}$ satisfies the two requirements for a cost
function.
8.3 Verify that the cost function $K_{\mathrm{Shannon}}$ satisfies the two requirements for
a cost function.
8.4 Let $a = [a[0]\ a[1]]$ and $b = [b[0]\ b[1]]$ be two nonzero vectors of length
2 with nonnegative entries. Assume that $K_{\ell^2}(a) = K_{\ell^2}(b)$, but $K_{\ell^1}(a) <
K_{\ell^1}(b)$. Assume that $b[0] = b[1]$. Show that either $a[0] < a[1]$ or $a[0] > a[1]$.

8.5 Assume that a full wavelet packet decomposition has been computed for
a given signal, using an energy preserving transform. Take as the cost function
$K_{\ell^2}$. Go through the steps in the best basis search algorithm to verify that
the algorithm selects the original signal as the best basis.
8.6 Assume that a full wavelet packet decomposition has been computed
for a given signal. Assume that a threshold $T$ is chosen, which is larger than
the largest absolute value of all elements in the decomposition. Choose the
threshold cost function with this threshold. Go through the steps in the best
basis search algorithm to verify that the algorithm selects the original signal
as the best basis.
9. The Time-Frequency Plane

Time-frequency analysis is an important tool in modern signal analysis. By


using information on the distribution of the energy in a signal with respect to
both time and frequency, one hopes to gain additional insight into the nature
of signals.
In this chapter we introduce the time-frequency plane as a tool for visual-
izing signals and their transforms. We look at the discrete wavelet transform,
wavelet packet based transforms, and also at the short time Fourier trans-
form. The connection between time and frequency is given by the Fourier
transform, which we introduced in Chap. 7. We will need further results from
Fourier analysis. They will be presented here briefly. What is needed can
be found in the standard texts on signal analysis, or in many mathematics
textbooks.

9.1 Sampling and Frequency Contents

We consider a discrete signal with finite energy, $x \in \ell^2(\mathbb{Z})$. The frequency
contents of this signal is given by the Fourier transform in the form of the
associated Fourier series
$$X(\omega) = \sum_n x[n]e^{-jn\omega}. \qquad (9.1)$$
See Chap. 7 for some results on Fourier series. The function $X(\omega)$ is periodic
with period $2\pi$, which means that $X(\omega + 2\pi k) = X(\omega)$ for all $k \in \mathbb{Z}$. Therefore
the function is completely determined by its values on an interval of length
$2\pi$. In this book we always take our signals to have real values. For a real
signal $x$ we have $\overline{X(\omega)} = X(-\omega)$, as can be seen by taking the complex
conjugate of both sides in (9.1). As a consequence, the frequency contents is
determined by the values of $X(\omega)$ on any interval of the form $[k\pi, (k+1)\pi]$,
where $k$ can be any integer. Usually one chooses the interval $[0, \pi]$.
To interpret our signals we need to fix units. The discrete signal is indexed
by the integers. If we choose a time unit T, which we will measure in seconds,
then we can interpret the signal as one being measured at times nT, n E Z.
Let us assume that there is an underlying analog, or continuous, signal, such


that the discrete signal has been obtained by sampling this continuous signal
at times $nT$. The number $1/T$ is called the sampling rate, and $f_s = 2\pi/T$ the
sampling frequency. Note that some textbooks use Fourier series based on
the functions $\exp(-j2\pi n\omega)$. In those books the sampling frequency is often
defined to be $1/T$.
If we now introduce the time unit explicitly in the Fourier series, then it
becomes

    X_T(ω) = Σ_n x[n] e^{−jnTω} .    (9.2)

The function X_T(ω) is periodic with period 2π/T. After a change of variables,
Parseval's equation (7.2) reads

    Σ_n |x[n]|² = (T/2π) ∫_{−π/T}^{π/T} |X_T(ω)|² dω .    (9.3)

For a real discrete signal the frequency contents is then determined by the
values of X_T(ω) on for example the interval [0, π/T]. This is often expressed
by saying that in a sampled signal one can only find frequencies up to half the
sampling frequency, ½ f_s, which is equal to π/T by our definition. This result
is part of Shannon's sampling theorem. See for example [5, 16, 22, 23, 28] for
a discussion of this theorem.
As mentioned above, for a real signal we can choose other intervals in
frequency, on which the values of X_T(ω) will determine the signal. Any in-
terval of the form [kπ/T, (k + 1)π/T] can be chosen. This is not a viola-
tion of the sampling theorem, but simply a consequence of periodicity, and
the assumption that the signal is real. We have illustrated the possibilities
in Fig. 9.1. The usual choice is marked by the heavy line segment. Other
possibilities are marked by the thin line segments. Note how the symmetry
|X_T(ω)| = |X_T(−ω)| is also shown in the figure.

Fig. 9.1. Choice of frequency interval for a sampled signal (the plot shows |X_T(ω)|
as a function of ω). Heavy line marks usual choice

The frequency contents of an analog signal x(t) is given by the continuous
Fourier transform, which is defined as

    x̂(ω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt .    (9.4)

The inversion formula is

    x(t) = (1/2π) ∫_{−∞}^{∞} x̂(ω) e^{jωt} dω .    (9.5)

For a real signal we have x̂(ω) = x̂(−ω)*, such that it suffices to consider pos-
itive frequencies. Any positive frequency may occur. Suppose now we sample
an analog signal at a rate 1/T. This means that we take x[n] = x(nT). Re-
call that we use square brackets for discrete variables and round brackets for
continuous variables. The connection between the frequency contents of the
sampled signal and that of the continuous one is given by the equation

    X_T(ω) = (1/T) Σ_{k∈Z} x̂(ω − 2kπ/T) .    (9.6)

This is a standard result from signal analysis, and we refer to the litera-
ture for the proof, see for example [16, 22, 23, 28]. The result (9.6) shows
that the frequencies outside the interval [−π/T, π/T] in the analog signal are
translated into this interval. This is the aliasing effect of sampling.
We see from (9.6) that the frequency contents of the sampled and the
analog real signal will agree, if the nonzero frequencies are in the interval
[−π/T, π/T]. If the nonzero frequencies of the analog signal lie in another
interval of length 2π/T, then we can assign this interval as the frequency
interval of the sampled signal.
The aliasing effect is illustrated in Fig. 9.2. It is also known from everyday
life, for example when the wheels of a car appear to turn slowly in a film,
although the car is traveling at high speed.

Fig. 9.2. A 7 Hz and a 1 Hz signal sampled 8 times per second yield the same
(sampled) signal
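This coincidence is easy to check numerically. The following MATLAB sketch samples a 7 Hz and a 1 Hz signal 8 times per second, as in Fig. 9.2; we use cosines so that the phases match exactly at the sample points, an assumption the figure does not specify.

n = 0:15;  T = 1/8;               % sampling 8 times per second
x7 = cos(2*pi*7*n*T);             % 7 Hz signal at the sample points
x1 = cos(2*pi*1*n*T);             % 1 Hz signal at the sample points
disp(max(abs(x7 - x1)))           % of the order 1e-15: the samples coincide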

9.2 Definition of the Time-Frequency Plane

We use a time-frequency plane to describe how the energy in a signal is
distributed with respect to the time and frequency variables. We start with
a discrete real signal x ∈ ℓ²(Z), with time unit T. We choose [0, π/T] as the
frequency interval. We mark the sample times on the horizontal axis and the
frequency interval on the vertical axis. The sample x[n] contributes |x[n]|²
to the energy, and we place the value in a box located as shown in Fig. 9.3.
Alternatively, we fix a grey scale and color the boxes according to the energy
contents. This gives a visual representation of the energy distribution in the
signal (see Fig. 9.9).

Fig. 9.3. Time-frequency plane for a discrete signal (time axis 0T to 4T, frequency
axis 0 to π/T)

Suppose now that we down sample the signal x by two. Then the sampling
rate is 1/2T, and we choose [0, π/2T] as the frequency interval. Since we have
fixed the units, we get the visual representation shown in Fig. 9.4.

Fig. 9.4. Time-frequency plane for a discrete signal. The signal from Fig. 9.3, down
sampled by two (frequency axis 0 to π/2T)

We will now define the time-frequency planes used to visualize the DWT.
Let us go back to the first example in Chap. 2. We had eight samples, which
we transformed three times using the Haar transform. We first use symbols,
and then the numbers from the example. The original signal is represented
by eight vertical boxes, like the four vertical boxes in Fig. 9.3. We will take
T = 1 to simplify the following figures. The first application of the transform
is in symbols given as

    s3[0], s3[1], s3[2], s3[3], s3[4], s3[5], s3[6], s3[7]
      → s2[0], s2[1], s2[2], s2[3], d2[0], d2[1], d2[2], d2[3] .
Each of the down sampled components, s2 and d2, can be visualized as in
Fig. 9.4, but not in the same figure, since they both occupy the lower half of
the time-frequency plane.
The problem is solved by looking at one step of the DWT in the frequency
representation, as described in Sect. 7.3 in the form of a two channel filter
bank. We have illustrated the process in Fig. 9.5. The original signal is shown
on the left hand side in the second row. Its Fourier transform is shown in
the top graph, together with the transfer functions of the two filters. The
bottom parts can be obtained in two different ways. In the time domain we
use convolution with the filters followed by down sampling by two. In the
frequency domain we take the product of the Fourier transform of the signal,
X(ω), and the two transfer functions H(ω) and G(ω), and then take the
inverse Fourier transform of each product, followed by down sampling by
two. All of this is shown in the right part of the figure.
Let us now return to the example with eight samples. The original signal
contains frequencies in the interval [0, π] (recall that we have chosen T = 1).
For the s2 part we can choose the interval [0, π/2] and for the d2 part the
interval [π/2, π]. If h is an ideal low pass filter and g an ideal high pass filter,
this gives the correct frequency contents of the two signals. But for real world
filters this is only approximately correct. With this choice, and we emphasize
that it is a choice, we get the visualization of the transformed signal shown in
Fig. 9.6. Usually we use a grey scale to represent the relative values instead
of the actual values.
The next step is to apply the DWT to the signal s2, to obtain s1 and
d1, each of length two. We assign the frequency interval [0, π/4] to s1, and
[π/4, π/2] to d1. Thus the two step DWT is visualized as in Fig. 9.7.
In the final step s1 is transformed to s0 and d0, each of length one. This
third step in the DWT is visualized in Fig. 9.8.
We now illustrate this decomposition with the numbers from Chap. 2. All
the information has been gathered in Fig. 9.9. The boxes have been labeled
with the coefficients, not their squares, to make it easier to recognize the
coefficients. The squares have been used in coloring the boxes.
In Chap. 8 we generalized the wavelet analysis to the wavelet packet anal-
ysis. We applied the DWT repeatedly to all elements, down to a given level
J, to get a full wavelet packet decomposition to this level, see Fig. 8.2(b) for
J = 4.

Fig. 9.5. The wavelet transform using the filters h and g in the frequency domain.
[Panels: Original signal; FT of signal and of filters; Product of FT of signal and
filters; after DWT, i.e. IFT and down sampling by 2: Low pass part of DWT and
High pass part of DWT]

Based on this decomposition, a very large number of different repre-
sentations of a signal could be obtained. We can visualize these bases using
the same idea as for the wavelet decomposition. Each time we apply the DWT
to a signal, we should assign frequency intervals as above, assigning the lower
interval to the low pass filtered part, and the upper interval to the high pass
filtered part. The process leads to partitions of the time-frequency plane, one
partition for each possible wavelet packet decomposition. Each partition is a
way of visualizing the effect (on any signal) in time and frequency of a chosen
representation.

Fig. 9.6. One step DWT applied to eight samples, with T = 1. Visualization of
energy distribution. [Cells: |s2[k]|² in the lower half, |d2[k]|² in the upper half,
k = 0, ..., 3]

Fig. 9.7. Two step DWT applied to eight samples, with T = 1. Visualization of
energy distribution. [Cells: |s1[k]|², |d1[k]|², and |d2[k]|²]

Fig. 9.8. Three step DWT applied to eight samples, with T = 1. Visualization of
energy distribution. [Cells: |s0[0]|², |d0[0]|², |d1[k]|², and |d2[k]|²]

(1) 56 40 8 24 48 48 40 16
(2) 48 16 48 28 | 8 −8 0 12
(3) 32 38 | 16 10 | 8 −8 0 12
(4) 35 | −3 | 16 10 | 8 −8 0 12

Fig. 9.9. Each level in a wavelet decomposition corresponds to a time-frequency
plane. Values from Chap. 2 are used in this example. Boxes are marked with coef-
ficients, not with their squares

For a given signal the boxes can then be colored according to
energy contents in each box. This way it becomes easy visually to identify a
representation, where the transformed signal has few large coefficients.
The partition of the time-frequency plane associated with a chosen wavelet
packet representation can be constructed as above. A different way of obtain-
ing it is to turn the decomposition 90 degrees counterclockwise. The partition
then becomes easy to identify. An example with a signal of length 32, decom-
posed over J = 4 levels, is shown in Fig. 9.10.
In Fig. 9.11 we have shown four possible representations for a signal with
8 samples, and the associated partitions of the time-frequency plane.
One more thing concerning the time-frequency planes should be men-
tioned. The linear scale for energy used up to now is often not the relevant
one in applications. One should choose an appropriate grey scale for coloring
the cells. Often one uses the logarithm of the energy. Thus the appropriate
measure is often 20 log10(|s_j[k]|) (units decibel (dB)), displayed using a linear
grey scale. In the following figures we have used a slightly modified logarithmic
scale, to prevent coefficients with values close to zero from compressing the
linear grey scale used to display these values. Thus in the sequel we use
log(1 + |s_j[k]|²), in combination with a linear grey scale, to determine our
coloring of the time-frequency planes.

Fig. 9.10. A time-frequency plane is easily constructed by turning the decompo-
sition. The length of each element shows the height of each row of cells, and the
number of coefficients in each element the number of cells in each row. The numbers
in the boxes are the lengths of the elements

Fig. 9.11. Four different choices of bases and the corresponding time-frequency
planes. Note that the figure is valid only for signals of length 8, since there are 8
cells


9.3 Wavelet Packets and Frequency Contents

We will now take a closer look at the frequency contents in the time-
frequency visualization of a wavelet packet analysis, for example those
shown in Fig. 9.11. The periodicity of X_T(ω) and the symmetry property
|X_T(ω)| = |X_T(−ω)| together imply that when we take for example the fre-
quency interval [π/T, 2π/T] to determine X_T(ω), then the frequency contents
in this interval is the mirror image of the contents in [0, π/T], see Fig. 9.1.

Fig. 9.12. Due to the down sampling all the high pass parts are mirrored. This
swaps the low and high pass parts in a subsequent transform. The result is that
the frequency order of the four signals at the bottom level is 1, 2, 4, and 3. The
numbers in parentheses show the origin of parts of the signal. The differences in
line thickness are a help to trace the signal parts. Note that the figure is based on
ideal filters
Thus we have to be careful in interpreting the frequency contents in a wavelet
packet analysis. In the first step from s_j to s_{j−1}, d_{j−1} we have assigned the
interval [0, π/2T] to s_{j−1} and the interval [π/2T, π/T] to d_{j−1}, in our con-
struction of the time-frequency plane. In the wavelet analysis we only decom-
pose s_{j−1} in the next step. Here the frequency interval is the one we expect
when applying low pass and high pass filters. In the wavelet packet analy-
sis we apply the filters also to the part d_{j−1}. When we apply filters to this
part, the frequency interval has to be [0, π/2T]. But the frequency contents in
this interval is the mirror image of the contents in [π/2T, π/T], which means
that the low frequency and high frequency parts are reversed. Two steps in
a wavelet packet decomposition are shown in Fig. 9.12.
It is important to understand the frequency ordering of the elements in a
wavelet packet analysis. A more extensive example is given in Fig. 9.13. We
have taken a 128 Hz signal, which has been sampled, such that the frequency
range is from 0 Hz to 64 Hz. Three steps of the full wavelet packet decom-
position are shown, such that the figure shows a four level decomposition.
The elements at the bottom level have a frequency ordering 0, 1, 3, 2, 6,
7, 5, 4.
Fig. 9.13. Decomposition of a signal following the same principle as in Fig. 9.12.
The numbers in the cells show the real frequency content before (in parentheses)
and after down sampling, while the numbers below the cells show from which
frequency band the signal part originates. The last two lines show the frequency
order, in decimal and binary notation, respectively
We write these numbers in binary notation, using three digits: 000,
001, 011, 010, 110, 111, 101, 100. Then we note a special property of this
sequence. Exactly one binary digit changes when we go from one number to
the next. It is a special permutation of the numbers 0 to 2^N − 1. Such a
sequence is said to be Gray code permuted. The Gray code permutation is
formally defined as follows. Given an integer n, write it in binary notation
as n_{N+1} n_N n_{N−1} ... n_2 n_1, such that each n_i is either 0 or 1. Note that we
have added a leading zero. It is convenient in the definition to follow. For
example, for n = 6 we have n_1 = 0, n_2 = 1, n_3 = 1, and n_4 = 0. The Gray
code permuted integer GC(n) is then defined via its binary representation.
The i'th binary digit is denoted by GC(n)_i, and is given by the formula

    GC(n)_i = n_i + n_{i+1} mod 2 .    (9.7)

The inverse can be found, again in binary representation, via the following
formula

    IGC(n)_i = Σ_{k≥i} n_k mod 2 .    (9.8)

The sum is actually finite, since for an integer n < 2^N we have n_k = 0 for all
k > N.
With these definitions we see that we get from the frequency ordering in
Fig. 9.13 to the natural (monotonically increasing) frequency order by using
the IGC map.
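As an aside, both maps are easy to compute with MATLAB's bit operations; the following sketch applies them to the integers 0 to 7 and recovers the ordering seen at the bottom of Fig. 9.13.

n  = 0:7;
gc = bitxor(n, bitshift(n, -1));   % GC: adds neighbouring binary digits mod 2, cf. (9.7)
disp(gc)                           % 0 1 3 2 6 7 5 4, the ordering in Fig. 9.13
m = gc;  s = gc;                   % IGC: fold in right-shifted copies, cf. (9.8)
while any(s > 0)
    s = bitshift(s, -1);
    m = bitxor(m, s);
end
disp(m)                            % 0 1 2 3 4 5 6 7, the natural frequency order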
Once we have seen how the permutation arises in our scheme for finding
the wavelet packet decomposition, we can devise the following simple change
to the scheme to ensure that the elements appear in the natural frequency
order. Above we saw that after one step of the DWT, due to the down
sampling, the frequency contents in the high pass part appeared in reverse
order (see Fig. 9.12). Thus in the next application of the DWT to this part,
the low and high frequency parts appear in reverse order, see again Fig. 9.12.
This means that to get the elements in the natural frequency order, we have
to interchange the position of the low and high frequency filtered parts in
every other application of the DWT step. This method is demonstrated in
Fig. 9.14. Look carefully at the figure, notice where the H and G filters are
applied, and compare in detail with Fig. 9.13.
Thus we have two possible ways to order the frequency contents in the
elements of a full wavelet packet decomposition of a given signal. Using the
original scheme we get the ordering as shown in Fig. 9.13. It is called filter
bank ordering. The other ordering is called natural frequency ordering. Some-
times it is important to get the frequency ordering right. In particular, if one
wants to interpret the time-frequency plane, then this is important. In other
cases, for example in applications to denoising and compression, the ordering
is of no importance.

Fig. 9.14. We get the natural frequency order by swapping every other application
of the DWT. Compare this figure with Fig. 9.13

Let us illustrate the consequences of the choice of frequency ordering in the
time-frequency plane. We take the signal which is obtained from sampling
the function sin(128πt²) in 1024 points in the time interval [0, 2]. This signal
is called a linear chirp, since the instantaneous frequency grows linearly with
time. As the DWT in this example we take the Daubechies 4 transform. Some
of the possible bases in a wavelet packet decomposition are those that we call
level bases, meaning that we choose all elements in the basis from one fixed
level. With a level basis with J = 6 we then get the two time-frequency planes
shown in Fig. 9.15. Each plane consists of 32 × 32 cells, colored with a linear
grey scale map according to the values of log(1 + |s_j[k]|²).
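For reference, the chirp can be generated as follows; whether the 1024 points include the right endpoint of [0, 2] is our own assumption, since the text does not specify it.

t = (0:1023)/512;        % 1024 sample points in [0,2), sampling rate 512 Hz
x = sin(128*pi*t.^2);    % linear chirp: instantaneous frequency 128t Hz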

9.4 More about Time-Frequency Planes

We will now discuss in some further detail the time-frequency planes. It is
clear from Fig. 9.15 that some improvements can be made. We will divide
the discussion into the following topics.
• Frequency localization
• Time localization
• Alignment
• Choice of basis
Each topic is discussed in a separate subsection.

9.4.1 Frequency Localization

Before we explain the effects in Fig. 9.15 we look at a simpler example. Let us
first explain why we choose a level basis in this figure. It is due to the uneven
112 9. The Time-Frequency Plane

Fig. 9.15. Visualization of significance of frequency ordering of the elements in a
decomposition. The left hand plot uses the filter bank ordering, whereas the right
hand plot uses the natural frequency order

With a level basis the time-
frequency plane is divided into rectangles of equal size. We illustrate this
difference with the following example. As the DWT we take the one based
on Daubechies 4. As the signal we take

    sin(ω0 t) + sin(2ω0 t) + sin(3ω0 t) ,    (9.9)

sampled at 1024 points in the time interval [0, 1]. We have taken ω0 =
405.5419 to get frequencies that cannot be localized in just one frequency
interval, in the partition into 32 intervals in our level basis case. In Fig. 9.16
the frequency plane for the wavelet transform is shown on the left. On the
right is a level basis decomposition. In both cases we have decomposed to
J = 6. For the level basis we then get 32 × 32 boxes.
It is evident from this figure that the level basis is much better at localizing
frequencies in a signal. So in the remainder of this section we use only a level
basis.
Let us look again at the right hand part of Fig. 9.15. The signal is obtained
by sampling a linear chirp. Thus we expect the frequency contents to grow
linearly with time. This is indeed the main impression one gets from the
figure. But there are also reflections of this linear dependence. The reflections
appear at frequencies f_s/4, f_s/8, ..., where f_s is the sampling frequency. This
is a visualization of the reflection in frequency contents due to down sampling,
as discussed in the previous section. It is due to our use of real filters instead
of ideal filters.

Fig. 9.16. A signal with three frequencies in the time-frequency plane. Left hand
plot shows the time-frequency plane for the wavelet decomposition, and the right
hand plot the level basis from a wavelet packet decomposition, both with J = 6

We therefore start by looking at the frequency response of some wavelet
filters. In Sect. 7.3 we showed the frequency response for the filters for the
three transforms, which we call Haar (also called Daubechies 2), in Fig. 7.5,
Daubechies 4, in Fig. 7.6, and CDF(2,2), in Fig. 7.7. Note that the scale on
the vertical axis is linear in these three figures. It is more common to use a
logarithmic scale. Let us show a number of plots using a logarithmic scale.
In Fig. 9.17 the left hand part shows the frequency response of the filters
Daubechies 2, 12, and 22. The right hand part shows the frequency response
of CDF(4,6). All figures show that the filters are far from ideal. By increasing
the length of the Daubechies filters one can get closer to ideal filters. In the
limit they become ideal.
In Fig. 9.18 we have repeated the plot from the right hand part of Fig. 9.15,
which was based on Daubechies 4, and then also plotted the same time-
frequency plane, but now computed using Daubechies 12 as the DWT. It is
evident that the sharper filter gives rise to less prominent reflections in the
time-frequency plane.
The right hand plot in Fig. 9.17 shows another problem that may occur.
The frequency response of the low pass and the high pass parts of CDF(4,6)
is not symmetric around the normalized frequency 0.5, i.e. the frequency
divided by π. In repeated applications of the DWT this asymmetry leads to
problems with the interpretation of the frequency contents. We will illustrate
this with an example. Suppose that we apply the CDF(4,6) transforms three
times to get to the fourth level in a full wavelet packet decomposition. Thus
we have eight elements on the fourth level, as shown in Fig. 8.2(b). We can
find the frequency response of the corresponding eight bandpass filters. They
are plotted in Fig. 9.19.
Fig. 9.17. Frequency response for Daubechies 2, 12 (dashed), and 22 (dash-
dotted), and for the biorthogonal CDF(4,6). [Left panel: Daubechies 2, 12, and 22;
right panel: CDF(4,6); horizontal axes show normalized frequency, vertical axes
are logarithmic]

Fig. 9.18. The left hand part is the time-frequency plane for the linear chirp from
Fig. 9.15, which is based on Daubechies 4. The same plot, based on Daubechies 12,
is shown in the right hand part

Let us also illustrate the reflection of the line in a linear chirp due to
undersampling. If we increase the frequency range beyond the maximum given
by the sampling frequency, then we get a reflection as shown in Fig. 9.20.
This figure is based on Daubechies 12 filters.

9.4.2 Time Localization

It is easy to understand the time localization properties of the DWT step,
using the filter bank approach. We only consider FIR filters.

Fig. 9.19. The eight bandpass filters corresponding to the fourth level in a decom-
position based on CDF(4,6). The top part shows the eight plots together, and the
bottom part shows the individual responses. The ideal filter response is shaded on
each of these figures

In the time domain the filter acts by convolution. Suppose that h =
[h[1], h[2], ..., h[N]] is a filter of length N. Then filtering the signal x yields

    (h ∗ x)[n] = Σ_{k=1}^{N} h[k] x[n − k] .    (9.10)

Thus the computed value at n only depends on the preceding samples
x[n−1], ..., x[n−N].
Let us illustrate the time localization properties of the wavelet decomposi-
tion and the best level decomposition, both with J = 6, i.e. five applications
of the DWT. We use Daubechies 4 again, and take the following signal of
length 1024.

           25  if n = 300 ,
    x[n] =  1  if 500 ≤ n ≤ 700 ,    (9.11)
           15  if n = 900 ,
            0  otherwise .
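A quick way to see the localization at work is to build the signal (9.11) and apply one high pass filtering step. The sketch below uses the Daubechies 4 filters; the coefficient values and their normalization are our assumption, not taken from the text.

x = zeros(1, 1024);
x(301)     = 25;                   % n = 300 (MATLAB indices are 1-based)
x(501:701) = 1;                    % 500 <= n <= 700
x(901)     = 15;                   % n = 900
h = [1+sqrt(3), 3+sqrt(3), 3-sqrt(3), 1-sqrt(3)] / (4*sqrt(2));  % Daubechies 4
g = (-1).^(0:3) .* fliplr(h);      % high pass by reflection and sign alternation
d = conv(x, g);  d = d(1:2:end);   % high pass part of one DWT step
find(abs(d) > 1e-10)               % indices cluster around the spikes and the
                                   % edges of the constant block

Since the Daubechies 4 high pass filter annihilates constant signals, the interior of the constant block gives zeros, and the nonzero coefficients cluster around the two spikes and the two edges, as in the left plot of Fig. 9.21.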
Fig. 9.20. This time-frequency plane shows the effect of undersampling a linear
chirp. Above the sampling rate the frequency contents is reflected into the lower
range. This plot uses Daubechies 12 as the DWT

The two time-frequency planes are shown in Fig. 9.21. We notice that the
wavelet transform is very good at localizing the singularities and the constant
part. The wavelet packet best level representation has much less resolution
with respect to time. The filter is so short (length 4) that the effects of the
filter length are not very strong. We see the expected broadening in the wavelet
transform of the contribution from the singularity, due to the repeated ap-
plications of the filters. You should compare this figure with Fig. 4.6.
Let us give one more illustration. This time we take the sum of the signals
used in Fig. 9.16 and Fig. 9.21. The two plots are shown in Fig. 9.22. Here we
can see that the wavelet packet best basis representation gives a reasonable
compromise between resolution in time and in frequency.

9.4.3 Alignment

In Fig. 9.23 we have plotted the impulse response (filter coefficients) of the
filters Daubechies 24 and Coiflet 24 (in the literature also called coif4), both
of length 24. The first column shows the IR of the low pass filters, and
the second column those of the high pass filters. We see that only a few
coefficients dominate. The Coiflet is much closer to being symmetric, which
is significant in applications, since it is better at preserving time localization.
Let us explain this in some detail.
It is evident from Fig. 9.23 that the large coefficients in the filters can be
located far from the middle of the filter. If we recall from Chap. 7 that with
orthogonal filters the high pass filter is obtained from the low pass filter by
reflection and alternation of the signs, see (7.68), then the center of the high
pass filter will be in the opposite end. This is also clear from Fig. 9.23.

Fig. 9.21. The left hand plot shows the time-frequency plane for the signal in
(9.11), decomposed using the wavelet transform, and the right hand plot the same
signal in a level basis decomposition, both with J = 6

Fig. 9.22. The left hand plot shows the time-frequency plane for the signal in
(9.11) plus the one in (9.9), decomposed using the wavelet transform, and the right
hand plot the same signal in a level basis decomposition, both with J = 6

Fig. 9.23. The first row shows the IR of the Daubechies 24 filters. The second row
shows the same plots for Coiflet 24. The left hand column shows the IR of the low
pass filters, the right hand one those of the high pass filters

For a filter h = [h[1], h[2], ..., h[N]] we can define its center in several
different ways. For a real number x we let ⌊x⌋ denote the largest integer less
than or equal to x.
Maxima location The center is defined to be the first occurrence of the
absolute maximum of the filter coefficients. Formally this is defined by

    C_max(h) = min{ n : |h[n]| = max{ |h[k]| : k = 1, ..., N } } .    (9.12)

Mass center The mass center is defined by

    C_mass(h) = ⌊ (Σ_{k=1}^{N} k |h[k]|) / (Σ_{k=1}^{N} |h[k]|) ⌋ .    (9.13)

Energy center The energy center is defined by

    C_energy(h) = ⌊ (Σ_{k=1}^{N} k |h[k]|²) / (Σ_{k=1}^{N} |h[k]|²) ⌋ .    (9.14)

As an example the values for the filters in Fig. 9.23 are shown in Table 9.1.
Table 9.1. Centers of Daubechies 24 and Coiflet 24

    Filter      C_max   C_mass   C_energy
    Daub24 h      21      19       20
    Daub24 g       4       5        4
    Coif24 h      16      15       15
    Coif24 g       9       9        9
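The three definitions are straightforward to implement; here is a hypothetical one-line version of each (the Daubechies 24 and Coiflet 24 coefficients are not listed here, so we use the short Haar filter as the usage example, and we assume real filter coefficients as in the rest of the book).

cmax    = @(h) find(abs(h) == max(abs(h)), 1);                    % (9.12)
cmass   = @(h) floor(sum((1:numel(h)) .* abs(h)) / sum(abs(h)));  % (9.13)
cenergy = @(h) floor(sum((1:numel(h)) .* h.^2) / sum(h.^2));      % (9.14)
% example: for the Haar low pass filter all three centers coincide
h = [1 1]/sqrt(2);
[cmax(h), cmass(h), cenergy(h)]   % all equal 1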

Suppose now that the signal x, which is being filtered by convolution with
h and g, only has a few large entries. Then these will be shifted in location by
the filtering, and the shifts will differ in the low and high pass filtered parts.
In the full wavelet packet decomposition this leads to serious misalignment
of the various parts. Thus shifts have to be introduced. As an example, we
have again used the linear chirp, and the Daubechies 12 filters. The left hand
part of Fig. 9.24 shows the time-frequency plane based on the unaligned level
decomposition, whereas the right hand part shows the effect of alignment
based on shifts computed using C_max.
Alignment based on the three methods for computing centers given above
is implemented in Uvi_ Wave. As can be guessed from Fig. 9.24, we have
used alignment in the other time-frequency plane plots given in this chapter.
We have used Cmax to compute the alignments. The various possibilities are
selected using the function wtmethod in UvL Wave.

Fig. 9.24. The left hand part shows the time-frequency plane for the linear chirp
in Fig. 9.18 without alignment corrections. The right hand part is repeated from
this figure

9.4.4 Choice of Basis

It is evident from both Fig. 9.21 and Fig. 9.22 that the choice of basis de-
termines what kind of representation one gets in the time-frequency plane.
There are basically two possibilities. One can decide from the beginning that
one is to use a particular type of bases, for example a level basis, and then
plot time-frequency planes for a signal based on this choice. As the above
examples show this can be a good choice. In other cases the signal may be
completely unknown, and then the best choice may be to use a particular
cost function and find the best basis relative to this cost function. The time-
frequency plane is then based on this particular basis. As an example we have
taken the signal used in Fig. 9.22 and found the best basis using Shannon
entropy as the cost function. The resulting time-frequency plane is shown in
Fig. 9.25. One should compare the three time-frequency planes in Fig. 9.22
and Fig. 9.25. It is not evident which is the best representation to determine
the time-frequency contents of this very simple signal.
The simple examples given here show that to investigate the time-
frequency contents of a given signal may require the plot of many time-
frequency planes. In the best basis algorithm one may have to try several
different cost functions, or for example different values of the parameter p in
the ℓ^p-norm cost function.

Fig. 9.25. The time-frequency plane for the signal from Fig. 9.22, in the best basis
determined using Shannon entropy as the cost function

9.5 More Fourier Analysis. The Spectrogram

We now present a different way to visualize the distribution of the energy with
respect to time and frequency. It is based on the short time Fourier transform,
and in practical implementations, on the discrete Fourier transform. The
resulting visualization is based on the spectrogram. To define it we need
some preparation.
Given a real signal x ∈ ℓ²(Z), and a sampling rate 1/T, we have visual-
ized the energy distribution as in Fig. 9.3. Here we have maximal resolution
with respect to time, since we take each sample individually. But we have no
frequency information beyond the size of the frequency interval, which is de-
termined by the sampling rate. On the other hand, if we use all samples, then
we can compute X_T(ω), which gives detailed information on the distribution
of the energy with respect to frequency. One can interpret (T/2π)|X_T(ω)|² as
the energy density, as can be seen from Parseval's equation (9.3). The energy
in a given frequency interval is obtained by integrating this density over that
interval.
We would like to make a compromise between the two approaches. This
is done in the short time Fourier transform. One chooses a window vector
w = {w[n]}_{n∈Z}, which is a sequence with the property that 0 ≤ w[n] ≤ 1
for all n ∈ Z. Usually one chooses a window with only a finite number of
nonzero entries. In Fig. 9.26 we have shown four typical choices, each with
16 nonzero entries.

Fig. 9.26. Four window vectors of length 16. Top row shows rectangular and
triangular windows. Bottom row shows the Hanning window on the left and a
Gaussian window on the right

Once a window vector is chosen, a short time Fourier transform of a signal x
is computed as

    X_STFT(k, ω) = Σ_{n∈Z} w[n − k] x[n] e^{−jnTω} .    (9.15)

The window is moved to position k and then one computes the Fourier se-
ries of the sequence w[n − k]x[n], which is localized to this window. This is
repeated for values of k suitably spaced. Suppose that the length N of the
window is even. Then one usually chooses k = mN/2, m ∈ Z. For N odd one
can take k = m(N − 1)/2. Thus one slides the window over the signal and
looks at the frequency contents in each window.
The function |X_STFT(k, ω)|² is called a spectrogram. It shows the energy
density (or power) distribution in the signal, based on the choice of window
vector.
Take as the window vector the constant vector

    w[n] = 1 for all n ∈ Z ,

then for all k we have X_STFT(k, ω) = X_T(ω), the usual Fourier series. On
the other hand, if one chooses the shortest possible rectangular window,

    w[n] = 1 for n = 0 ,   w[n] = 0 for n ≠ 0 ,

then one finds X_STFT(k, ω) = x[k] e^{−jkTω}, such that the time-frequency plane
in Fig. 9.3 gives a visualization of the spectrogram for this choice of window.
Concerning the window vectors in Fig. 9.26, then the rectangular one of
length N is given by w[k] = 1 for k = 1, ..., N, and w[k] = 0 otherwise. The
triangular window is defined for N odd by

    w[n] = 2n/(N + 1) ,            1 ≤ n ≤ (N + 1)/2 ,
    w[n] = 2(N − n + 1)/(N + 1) ,  (N + 1)/2 ≤ n ≤ N ,

and for N even by

    w[n] = (2n − 1)/N ,        1 ≤ n ≤ N/2 ,
    w[n] = 2(N − n + 1)/N ,    N/2 < n ≤ N .

All values w[n] not defined by these equations are zero. The Hanning (or
Hann) window of length N is defined by

    w[n] = sin²(π(n − 1)/N) ,   n = 1, ..., N .


This window is often used for the short time Fourier transform. The last
window in Fig. 9.26 is a Gaussian window, defined by
9.5 More Fourier Analysis. The Spectrogram 123

w[n] = exp( -a(2n - (N + 1))2), n = 1, ... ,N ,

for a positive parameter a. In our example a = 0.03. Note that the four
window vectors can be obtained by sampling a box function, a hat function,
sin2(1Tt), and exp( -at2 ), respectively.
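A minimal sketch generating the four windows of Fig. 9.26 from these formulas (with N = 16, the even-N triangular formula as given above, and a = 0.03 for the Gaussian window):

N = 16;  n = 1:N;
rectw  = ones(1, N);                           % rectangular
triw   = zeros(1, N);                          % triangular, N even case
triw(1:N/2)   = (2*(1:N/2) - 1) / N;
triw(N/2+1:N) = 2*(N - (N/2+1:N) + 1) / N;
hannw  = sin(pi*(n - 1)/N).^2;                 % Hanning
gaussw = exp(-0.03*(2*n - (N + 1)).^2);        % Gaussian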
The above results can be applied to a finite signal, but for such signals
one usually chooses a different approach, based on the discrete Fourier trans-
form, here abbreviated as DFT. Since the Fourier expansion involves complex
numbers, it is natural to start with complex signals, although we very soon
restrict ourselves to real ones. A finite signal of length N will be indexed
by n = 0, ..., N − 1. All finite signals of length N constitute the vector
space ℂ^N, of dimension N. We define

    e_k[n] = e^{j2πkn/N} ,   n, k = 0, ..., N − 1 .    (9.16)

The vectors e_k are orthogonal with respect to the usual inner product on ℂ^N,
see (7.44). Thus these vectors form a basis for the space of finite signals ℂ^N.
The DFT is then expansion of signals with respect to this basis. We use the
notation x̂ for the DFT of x. The coefficients are given by

    x̂[k] = Σ_{n=0}^{N−1} x[n] e^{−j2πkn/N} .    (9.17)

The inversion formula is

    x[n] = (1/N) Σ_{k=0}^{N−1} x̂[k] e^{j2πkn/N} .    (9.18)

Parseval's equation is for the DFT the equation

    ‖x‖² = (1/N) Σ_{k=0}^{N−1} |x̂[k]|² .    (9.19)

Let us note that the computations in (9.17) and (9.18) have fast implemen-
tations, known as the fast Fourier transform.
Comparing the definition of the DFT (9.17) with the definition of the
Fourier series (9.1), we see that

    x̂[k] = X(2πk/N) .

Thus the DFT can be viewed as a sampled version of the Fourier series, with
the sample points 0, 2π/N, ..., 2π(N − 1)/N.
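In MATLAB the DFT (9.17) is available as fft, and the identities above are easy to verify numerically; a minimal sketch:

x = randn(1, 8);  N = numel(x);
X = fft(x);                                % the coefficients (9.17)
disp(sum(abs(x).^2) - sum(abs(X).^2)/N)    % ~0: Parseval's equation (9.19)
disp(max(abs(conj(X(2:N)) - X(N:-1:2))))   % ~0: the symmetry (9.20) below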
For a real x of length N we have

    x̂[k]* = Σ_{n=0}^{N−1} x[n] e^{j2πnk/N} = Σ_{n=0}^{N−1} x[n] e^{−j2πn(N−k)/N} = x̂[N − k] ,    (9.20)

where we introduce the convention that x̂[N] = x̂[0]. This is consistent with
(9.17). This formula allows us to define x̂[k] for any integer. It is then periodic
with period N.
We see from (9.20) that we need only half of the DFT coefficients to re-
construct x, when the signal is real.
Now the spectrogram used in signal processing, and implemented in the
signal processing toolbox for MATLAB as specgram, is based on the DFT. A
window vector w is chosen, and then one computes |X_STFT(k, 2πn/N)|²/2π
for values of k determined by the length of the window vector and for n =
0, 1, ..., N − 1. The spectrogram is visualized as a time-frequency plane,
where the cells are chosen as follows. Along the time axis the number of cells
is determined by the length of the signal N, the length of the window vector,
and the amount of overlap one wishes to use. The number of cells in the
direction of the frequency axis is determined by the length of the window
(the default in specgram). Assume the length of the window is L, and as
usual that the signal is real. If L is even, then there will be (L/2) + 1 cells
on the frequency axis, and if L is odd, there will be (L + 1)/2 cells. If the
sampling rate is known, then it is used to determine the units on the frequency
axis.
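Without the toolbox, a bare-bones spectrogram in the spirit of (9.15) can be computed directly with fft. The window, hop size, and test signal below are our own choices, not taken from the text.

x = randn(1, 1024);                        % any real test signal
w = sin(pi*(0:63)/64).^2;                  % Hanning window of length 64
hop = 32;                                  % k = mN/2 for an even window length
pos = 0:hop:numel(x) - numel(w);           % window positions
S = zeros(numel(w), numel(pos));
for i = 1:numel(pos)
    seg = x(pos(i) + (1:numel(w))) .* w;   % w[n-k] x[n], localized to the window
    S(:, i) = fft(seg(:));                 % one column of the sampled X_STFT(k, .)
end
P = abs(S).^2;                             % spectrogram values, one cell per entry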
Let us give two spectrograms of the signal used in Fig. 9.22, and also
in Fig. 9.25. They are obtained using the specgram function from the signal
processing toolbox for MATLAB. In the plot on the left hand side of Fig. 9.27
a Hanning window of length 256 is used. In the right hand plot the window
length is 64. In both plots we have used a color map which emphasizes the
large values. Larger values are darker.
The trade-off between time and frequency localization is clearly evident in
these two figures. One should also compare this figure with Fig. 9.22 and with
Fig. 9.25. Together they exemplify both the possibilities and the complexity
in the use of time-frequency planes to analyze a signal.

9.5.1 An Example of Fourier and Wavelet Time-Frequency Analysis

To wrap up this chapter we will give a more complex example of an applica-
tion of the best basis algorithm. We have taken a signal consisting of several
different signals added together, and made two wavelet and two Fourier time-
frequency analyses of the signal.
In Fig. 9.28 the signal is shown in the four different time-frequency planes.
The first two are the traditional spectrograms based on the STFT. Here we
must choose between long and short oscillations. With a short window we see
the short oscillations clearly, but long oscillations become inaccurate with
respect to frequency, while the long window tends to smear the short oscilla-
tions and enhance the long oscillations. Using a wavelet time-frequency plane
with cells of equal dimensions (level basis) does improve the time-frequency
plane, primarily due to the very short filter, that is, length 12 compared to
the shortest window of length 64 in the STFT.

Fig. 9.27. The left hand part shows the time-frequency plane for the signal from
Fig. 9.22 obtained as a spectrogram with a Hanning window of length 256. In the
right hand part the window length is 64. The signal has 1024 samples

But when using the best basis
algorithm to pick a basis, in this case based on Shannon entropy, the time-
frequency plane is much improved. The cells of different sizes make it possible
to target the long, slow oscillations in the lower half and the fast, short
oscillations in the upper half.

Exercises
After you have read Chap. 13, you should return to these exercises.
9.1 Start by running the basis script in Uvi_Wave. Then use the Uvi_Wave
functions tfplot and tree to display the basis tree graph and the time-
frequency plane tilings for a number of different bases.
9.2 Get the function for coloring the time-frequency tilings from the Web
site of this book (see Chap. 1) and reproduce the figures in this chapter.
9.3 Do further computer experiments with the time-frequency plane and
synthetic signals.
9.4 If you have access to the signal processing toolbox, do some experiments
with the specgram function to understand this type of time-frequency plane.
Start with simple signals with known frequency contents, and vary the win-
dow type and length.
Fig. 9.28. Four different time-frequency planes of the same signal. [Panels: Spec-
trogram, 1024 point FFT, window 64, overlap 16; Spectrogram, 1024 point FFT,
window 512, overlap 400; Scalogram with level basis, Symlet 12; Scalogram with
best basis, Symlet 12, and the entropy cost function.] The signal is a composite
test signal consisting of one slow and fourteen very fast chirps, a fixed frequency
lasting the first half of the signal, and a periodic sine burst. The window used in
the Fourier analyses is a Hanning window
10. Finite Signals

In the previous chapters we have only briefly, and in a casual way, consid-
ered the problems arising from having a finite signal. In the case of the Haar
transform there are no problems, since it transforms a signal of even length
to two parts, each of half the original length. In the case of infinite length
signals there are obviously no problems either. But in other cases we may
need for instance sample s[−1], and our given signal starts with sample s[0].
We will consider solutions to this problem, which we call the boundary prob-
lem. Theoretical aspects are considered in this chapter. It is important to
understand that there is no universal solution to the boundary problem. The
preferred solution depends on the kind of application one has in mind. The
implementations are discussed in Chap. 11. The reader mainly interested in
implementations and applications can skip ahead to that chapter.
Note that in this chapter we use a number of results from linear algebra.
Standard texts contain the results needed. Note also that in this chapter we
use both row vectors and column vectors, and that the distinction between
the two is important. The default is column vectors, so we will repeatedly
state when a vector is a row vector to avoid confusion. Some of the results in
this chapter are established only in an example, and the interested reader is
referred to the literature for the general case.
There is a change in notation in this chapter. Up to now we have used the
notation h0, h1 for the analysis filter pair in the filter bank version of the
DWT. From this chapter onwards we change to another common notation,
namely h, g (except for Chap. 12 which is closely connected to Chap. 7).
This is done partly to simplify the matrix notation below, partly because
the literature on the boundary problem typically uses this notation. We also
recall that we only consider filters with real coefficients.

10.1 The Extent of the Boundary Problem

To examine the extent of the problem with finite signals we use the lifting
steps for the Haar transform and Daubechies 4. We recall the definition of
the Haar transform


    d(1)[n] = S[2n + 1] − S[2n] ,    (10.1)

    s(1)[n] = S[2n] + ½ d(1)[n] ,    (10.2)

and of the Daubechies 4 transform

    s(1)[n] = S[2n] + √3 S[2n + 1] ,    (10.3)

    d(1)[n] = S[2n + 1] − ¼√3 s(1)[n] − ¼(√3 − 2) s(1)[n − 1] ,    (10.4)

    s(2)[n] = s(1)[n] − d(1)[n + 1] .    (10.5)

Compared to previous equations we have omitted the index j, and the original
signal is now denoted by S. When calculating s(1)[n] and d(1)[n] for the Haar
transform we need the samples S[2n] and S[2n + 1] for each value of n, while
for s(1)[n] and d(1)[n] in the case of Daubechies 4 we need S[2n − 2], S[2n − 1],
S[2n], and S[2n + 1]. The latter is seen by inserting (10.3) into (10.4). For
a signal of length 8 the parameter n assumes the values 0, 1, 2, and 3. To
perform the Haar transform samples S[0] through S[7] are needed, while the
Daubechies 4 transform requires samples S[−2] through S[7], i.e. the eight
known samples and two unknown samples. Longer transforms may need even
more unknown samples. This is the boundary problem associated with the
wavelet transform.
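The Haar case is easy to try out; here is a minimal sketch of the steps (10.1)-(10.2) and their inversion, applied to the length 8 signal from Chap. 2. Note that no samples outside the signal are needed, which is why the Haar transform has no boundary problem.

S  = [56 40 8 24 48 48 40 16];        % the length 8 example signal
d1 = S(2:2:end) - S(1:2:end);         % (10.1): d(1)[n] = S[2n+1] - S[2n]
s1 = S(1:2:end) + d1/2;               % (10.2): s(1)[n] = S[2n] + d(1)[n]/2
Se = zeros(1, 8);                     % invert by running the steps backwards
Se(1:2:end) = s1 - d1/2;              % recover the even samples
Se(2:2:end) = Se(1:2:end) + d1;       % recover the odd samples
disp(max(abs(Se - S)))                % 0: perfect reconstruction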
There exist a number of different solutions to this problem. Common to
those we consider is the preservation of the perfect reconstruction property of
the wavelet transform. We will explore the three most often used ones, which
are boundary filters, periodization, and mirroring. Moreover, we will briefly
discuss a more subtle method based on preservation of vanishing moments.

10.1.1 Zero Padding, the Simple Solution

We start with a simple and obvious solution to the problem, which turns
out to be rather unattractive. Given a finite signal, we add zeroes before and
after the given coefficients to get a signal of infinite length. This is called zero
padding. In practice this means that when the computation of a coefficient
in the transform requires a sample beyond the range of the given samples in
the finite signal, we use the value zero.
If we take a signal with 8 samples, and apply zero padding, then we see
that in the Haar transform case we can get up to 4 nonzero entries in s(1) and
in d(1). Going through the steps in the Daubechies 4 transform we see that
in s(1) the entries with indices 0, 1, 2, 3 can be nonzero, whereas in d(1) the
entries with indices 0, 1, 2, 3, 4 can be nonzero, and in s(2) those with indices
−1, 0, 1, 2, 3 can be nonzero. Thus in the two components in the transform
we may end up with a total of 10 nonzero samples.
This is perhaps unexpected, since up to now we have ignored this phe-
nomenon. Previously we have stated that the transform of a signal of even
length leads to two components, each of half the length of the input signal.
This is correct here, since we have added zeroes, so both the original signal
and the two transformed parts have infinite length. For finite signals the state-
ment is only correct, when one uses the Haar transform, or when one applies
the right boundary correction. Thus when we use zero padding, the number
of nonzero entries will in general increase each time we apply the DWT step.
It is important to note that all 10 coefficients above are needed to re-
construct the original signal, so we cannot just leave out two of them, if the
perfect reconstruction property is to be preserved. In general the number of
extra coefficients is proportional to the filter length. For orthogonal trans-
forms (such as those in the Daubechies family) the number of extra signal
coefficients is exactly L - 2, with L being the filter length. See p. 135 for the
proof.
When we use zero padding, the growth in the number of nonzero entries is
unavoidable. It is not a problem in the theory, but certainly in applications.
Suppose we have a signal of length N and a filter of length L, and suppose
we want to compute the DWT over k scales, where k is compatible with the
length of the signal, i.e. N ≥ 2^k. Each application of the DWT adds L − 2
new nonzero coefficients, in general. Thus the final length of the transformed
signal can be up to N + k(L − 2).
The result of using zero padding is illustrated in Fig. 10.1. As the
filter taps slide across the signal a number of low and high pass transform
coefficients are produced, a pair for each position of the filter. Since there are
(N + L)/2 − 1 different positions, the total number of transform coefficients
is twice this number, that is N + L − 2.
If one considers wavelet packet decompositions, then the problem is much
worse. Suppose one computes the full wavelet packet decomposition down to
a level J, i.e. we apply the DWT building block J − 1 times, each time to all
elements in the previous level. Starting with a signal of length N and a filter
of length L, then at the level J the total length of the transformed signal can
be up to N + (2^{J−1} − 1)(L − 2). This exponential growth in J makes zero
padding an unattractive solution to the boundary problem.
Thus it is preferable to have available boundary correction methods, such
that application of the corrected DWT to a signal leads to two components,
each of half the length of the original signal. Furthermore we would like
to preserve the perfect reconstruction property. We present four different
methods below. The first three methods use a number of results from lin-
ear algebra. The fourth method requires extensive knowledge of the classical
wavelet theory and some harmonic analysis. It will only be presented briefly
and incompletely.
The reader interested in implementation can go directly to Chap. 11.
The methods are presented here using the filter bank formulation of the
DWT step. Another solution to the boundary problem based directly on the
lifting technique is given in Sect. 11.4.2 with CDF(4,6) as an example. The

Fig. 10.1. The result of zero padding when transforming a finite signal. The grey
boxes illustrate the positions of the filter taps as the filtering occurs. Each position
gives a low pass and a high pass coefficient. The number of positions determines the
number of transform coefficients. In this figure we have omitted most of the 'interior'
filters to simplify it


10.2 DWT in Matrix Form


In the previous chapters we have seen the DWT as a transform realized in
a sequence of lifting steps. We have also described how the DWT can be
performed as a low and high pass filtering, followed by down sampling by
2. Now we turn our attention to the third possibility, which was presented
in Chap. 5 using the Haar transform as an example. The transform can
be carried out by multiplying the signal with an appropriate matrix. The
reconstruction can also be done by a single multiplication. We assume that
the input signal, denoted by x, is of even length.
We recall from (7.51) that the low pass filtered and down sampled signal
is given as

    (Hx)[n] = Σ_k h[2n − k] x[k] .    (10.6)

This convolution is interpreted as an inner product between the vector

    [ ··· 0 h[L−1] ··· h[1] h[0] 0 ··· ]

and the vector x, or as the matrix product of the reversed filter row vector
and the signal column vector. The high pass part Gx is found analogously,
see (7.52). The symbols H and G emphasize that we consider the transitions
from x to Hx and Gx as linear maps.
Thus we decompose x into Hx and Gx, and we have to decide how to
combine these two components into a single vector, to get the matrix form
of the transform. There are two obvious possibilities. One is to take all the
components in Hx, followed by all components in Gx. This is not an easy
solution to use, when one considers infinite signals. The other possibility is
to interlace the components in a column vector as

    y = [ ··· (Hx)[−1] (Gx)[−1] (Hx)[0] (Gx)[0] (Hx)[1] (Gx)[1] ··· ]^T .
Since the four vectors in an orthogonal filter set have equal even length (in
contrast to most biorthogonal filter sets), it is easier to describe the matrix
form of the DWT for orthogonal filters. Later on it is fairly easy to extend
the matrix form to biorthogonal filters.
It follows from (10.6) that the matrix of the direct transform has the
following structure. The rows of the matrix consist of alternating, reversed
low and high pass IRs; each low pass IR is shifted two places in relation to
the preceding high pass IR, while the following high pass IR is not shifted.
The low pass filter is now denoted by h and the high pass filter by g, in
contrast to Chap. 7, where we used the notation h0 and h1, respectively.
If the length of the filter is 6, then the matrix becomes

           ⋱
           h[5] h[4] h[3] h[2] h[1] h[0]  0    0    0    0
           g[5] g[4] g[3] g[2] g[1] g[0]  0    0    0    0
    T_a =   0    0   h[5] h[4] h[3] h[2] h[1] h[0]  0    0
            0    0   g[5] g[4] g[3] g[2] g[1] g[0]  0    0    (10.7)
            0    0    0    0   h[5] h[4] h[3] h[2] h[1] h[0]
            0    0    0    0   g[5] g[4] g[3] g[2] g[1] g[0]
                                                         ⋱

Given an infinite signal x as a column vector the wavelet transform can be
calculated simply by y = T_a x. Obviously we want to be able to reconstruct
the original signal in the same manner, so we need another matrix T_s such that

    x = T_s y .    (10.8)

By multiplying T_s and y we get x. For finite matrices the equation (10.8)
implies that T_s = T_a^{−1}, and for infinite matrices we impose this condition.
Fortunately, it is easy to show that T_a^{−1} = T_a^T for orthogonal filters (see
Exer. 10.2). Recall that a real matrix with this property is called orthogonal.
Now we have

    T_a x = [ ··· (Hx)[0] (Gx)[0] (Hx)[1] (Gx)[1] (Hx)[2] (Gx)[2] ··· ]^T ,

so in order to reconstruct the original signal the matrix T_s is applied to a
mix of low and high pass coefficients.
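A small numerical experiment makes the orthogonality concrete. The sketch below builds the rows of T_a that act on a zero padded signal of length 8 (unshifted zero padding, as in (10.10) below), using the Daubechies 4 filters with coefficients in the standard normalization (our assumption), and checks that the resulting columns are orthonormal.

h = [1+sqrt(3), 3+sqrt(3), 3-sqrt(3), 1-sqrt(3)] / (4*sqrt(2)); % Daubechies 4 low pass
L = numel(h);  N = 8;
g = (-1).^(0:L-1) .* fliplr(h);        % high pass by reflection and sign alternation
pos = 0:(N+L-2)/2;                     % filter positions overlapping the signal
Ta = zeros(2*numel(pos), N);
for i = 1:numel(pos)
    for k = 0:N-1
        m = 2*pos(i) - k;              % index into the filter, cf. (10.6)
        if m >= 0 && m <= L-1
            Ta(2*i-1, k+1) = h(m+1);   % low pass row
            Ta(2*i,   k+1) = g(m+1);   % high pass row
        end
    end
end
disp(max(max(abs(Ta'*Ta - eye(N)))))   % ~1e-15: the columns are orthonormal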
The major difference in the case of biorthogonal filters is that T_a is not
orthogonal, and hence T_s cannot be found simply by transposing the direct
transform matrix. To understand how T_s is constructed in this case, we first
examine T_s in the orthogonal case. It is easy to show that

                  ⋱  h[0] g[0]  0    0    0    0
                  ⋱  h[1] g[1]  0    0    0    0
                  ⋱  h[2] g[2] h[0] g[0]  0    0
                  ⋱  h[3] g[3] h[1] g[1]  0    0
    T_s = T_a^T =    h[4] g[4] h[2] g[2] h[0] g[0]    (10.9)
                     h[5] g[5] h[3] g[3] h[1] g[1]
                      0    0   h[4] g[4] h[2] g[2]
                      0    0   h[5] g[5] h[3] g[3]
                      0    0    0    0   h[4] g[4]
                      0    0    0    0   h[5] g[5]  ⋱

for a length 6 orthogonal filter. Compared to Chap. 7 we have changed the
notation, such that the synthesis filter pair is now denoted by h̃, g̃. In Chap. 7
the pair was denoted by g0, g1. The verification of the structure shown in
(10.9) is left as an exercise.
In the same way we can write T_s for biorthogonal filters, except with the
obvious difference that we do not have the close connection between anal-
ysis and synthesis that characterized the orthogonal filters. We will instead
give an example showing how to determine T_s in the biorthogonal case. The
biorthogonal filter pair CDF(2,4) is given by

    h = (√2/128) [3 −6 −16 38 90 38 −16 −6 3] ,

    g = −(√2/4) [1 −2 1] ,
and from (7.35) and (7.36) it follows that

    h̃ = −(√2/4) [1 2 1] ,

    g̃ = (√2/128) [3 6 −16 −38 90 −38 −16 6 3] .

The analysis matrix becomes

          ⋱  h[6] h[5] h[4] h[3] h[2] h[1] h[0]  0    0    0    0    0
          ⋯   0    0   g[2] g[1] g[0]  0    0    0    0    0    0    0
             h[8] h[7] h[6] h[5] h[4] h[3] h[2] h[1] h[0]  0    0    0
    T_a =     0    0    0    0   g[2] g[1] g[0]  0    0    0    0    0
              0    0   h[8] h[7] h[6] h[5] h[4] h[3] h[2] h[1] h[0]  0
              0    0    0    0    0    0   g[2] g[1] g[0]  0    0    0
              0    0    0    0   h[8] h[7] h[6] h[5] h[4] h[3] h[2] h[1] ⋱
              0    0    0    0    0    0    0    0   g[2] g[1] g[0]  0   ⋯

and, just as it was the case for orthogonal filters, the synthesis matrix consists
of the synthesis IRs in forward order in the columns of T_s, such that

                        ⋱ h̃[0] g̃[2]  0   g̃[0]  0    0    0    0    0    0
                        ⋱ h̃[1] g̃[3]  0   g̃[1]  0    0    0    0    0    0
                        ⋱ h̃[2] g̃[4] h̃[0] g̃[2]  0   g̃[0]  0    0    0    0
                           0   g̃[5] h̃[1] g̃[3]  0   g̃[1]  0    0    0    0
                           0   g̃[6] h̃[2] g̃[4] h̃[0] g̃[2]  0   g̃[0]  0    0
    T_s = T_a^{−1} =       0   g̃[7]  0   g̃[5] h̃[1] g̃[3]  0   g̃[1]  0    0
                           0   g̃[8]  0   g̃[6] h̃[2] g̃[4] h̃[0] g̃[2]  0   g̃[0]
                           0    0    0   g̃[7]  0   g̃[5] h̃[1] g̃[3]  0   g̃[1] ⋱
                           0    0    0   g̃[8]  0   g̃[6] h̃[2] g̃[4] h̃[0] g̃[2] ⋱
                           0    0    0    0    0   g̃[7]  0   g̃[5] h̃[1] g̃[3] ⋱
                           0    0    0    0    0   g̃[8]  0   g̃[6] h̃[2] g̃[4] ⋱

Note that the alignment of h̃ and g̃ must match the alignment of h and g
in T_a. We have now constructed two matrices, which perform the orthogonal
and biorthogonal wavelet transforms, when multiplied with the signal.
We have introduced the matrices of the direct and inverse transforms in
order to explain how we construct boundary corrections. Computationally
both filtering and lifting are much more efficient transform implementations.

10.3 Gram-Schmidt Boundary Filters


The idea behind boundary filters is to replace the filters (or lifting steps) in
each end of the signal with some new filter coefficients designed to preserve
both the length of the signal and the perfect reconstruction property. This
idea is depicted in Fig. 10.2. We start by looking more carefully at the problem

Original signal (length N)


I
2 taps
+----+

length L

Transformed signal (length N)

Fig. 10.2. The idea behind all types of boundary filter is to replace the filters
reaching beyond the signal (see Fig. 10.1) with new, shorter filters (light grey).
By having the right number of boundary filters it is possible to get exactly the
same number of transform coefficients as signal samples while preserving certain
properties of the wavelet transform

with zero padding. Suppose we have a finite signal x of length N. We first


perform the zero padding, creating the new signal s of infinite length, by
defining

{~[nl
if n ~ -1,
sIn) = if n = 0,1, ... , N - 1 , (10.10)
if n '? N .
10.3 Gram-Schmidt Boundary Filters 135

°
Suppose that the filter has length L, with the nonzero coefficients having
indices between and L - 1. To avoid special cases we also assume that N
is substantially larger than L, and that both Land N are even. We then
examine the formula (see (7.51))
N-l
(Hs)[n] =L h[2n - k]s[k] =L h[2n - k]x[k]
kEZ k=O

for each possible value of n. If n < 0, the sum is always zero. The first
nonzero term can occur when n = 0, and we have (Hs)[O] = h[O]x[O]. The
last nonzero term occurs for n = (N +L-2)/2, and it is (Hs)[(N +L-2)/2] =
h[L-1]x[N -1]. The same computation is valid for the Gs vector. Thus in the
transformed signal the total number of nonzero terms can be up to N + L - 2.
This computation also shows that in the index range L/2 < n < N -(L/2)
all filter coefficients are multiplied with x-entries. At the start and the end
only some filter coefficients are needed, the others being multiplied by zero
from the zero padding of the signal s. This leads to the introduction of the
boundary filters. We modify the filters during the L/2 evaluations at both
the beginning and the end of the signal, taking into account only those filter
coefficients that are actually needed. Thus to adjust the h filter a total of L
new filters will be needed. The same number of modifications will be needed
for the high pass filter. It turns out that we can manage with fewer modified
filters, if we shift the location of the finite signal one unit.
Let us repeat the computation above with the following modification of
the zero padding. We define

o if n:::; -2,
SShift[n]= x[n+1] ifn=-1,0,1, ... ,N-2, (10.11)
{
° ifn~N-1.

With this modification we find that the first nonzero term in H s can be

(HSshift)[O] = h[1]x[0] + h[0]x[1] ,


and the last nonzero term can be

(HSShift)[(N + L)/2 - 2] = h[L - 1]x[N - 2] + h[L - 2]x[N - 1] ,


due to the assumption that L is even. With this shift we need a total of L - 2
corrections at each end. We will use this shifted placement of the nonzero
coefficients in the next subsection.

10.3.1 The DWT Matrix Applied to Finite Signals

Instead of using zero padding we could truncate the matrices T a and T s, by


removing the parts multiplying the zero padded parts of the signal. Although
136 10. Finite Signals

this gives finite matrices it does not solve the problem that the transformed
signal can have more nonzero entries than the original signal. The next step is
therefore to alter the truncated matrices to get orthogonal matrices. We treat
only orthogonal filters, since the biorthogonal case is rather complicated.
Let us start with the example from the previous section. For a filter of
length 6 and a signal of length 8 the transformed signal can have 12 non-
vanishing elements, as was shown above. Let us remove the part of the matrix
that multiplies zeroes in Sshift. The reduced matrix is denoted by T~, and it
is given as
h[1] h[O] 0 0 0 0 0 0 y[O]
9[1] 9[0] 0 0 0 0 0 0 y[1]
h[3] h[2] h[1] h[O] 0 0 0 0 x[O] y[2]
9[3] 9[2] 9[1] 9[0] 0 0 0 0 x[1] y[3]
h[5] h[4] h[3] h[2] h[1] h[O] 0 0 x[2] y[4]
9[5] 9[4] 9[3] 9[2] 9[1] 9[0] 0 0 x [3] y[5]
T~x= (10.12)
0 o h[5] h[4] h[3] h[2] h[1] h[O] x [4] y[6]
0 o 9[5] 9[4] 9[3] 9[2] 9[1] 9[0] x[5] y[7]
0 0 0 o h[5] h[4] h[3] h[2] x[6] y[8]
0 0 0 o 9[5] 9[4] 9[3] 9[2] x[7] y[9]
0 0 0 0 0 o h[5] h[4] y[lO]
0 0 0 0 0 o 9[5] 9[4] y[l1]
It is evident from the two computations above with the original and the
shifted signal that the truncation of the T a matrix is not unique. As de-
scribed above, we have chosen to align the first non-vanishing element in x
with h[1] and 9[1]. This makes T~ "more symmetric" than if we had chosen
h[O] and 9[0]. Moreover, choosing the symmetric truncation guarantees linear
independence of the rows, see [11]' a property which we will need later. By
truncating T a to make the 12 x 8-matrix T~ we have not erased any infor-
mation in the transformed signal. Hence it is possible, by reducing T s to an
8 x 12 matrix, to reconstruct the original signal (see Exer. 10.6).
Now we want to change the matrix T~ such that y has the same number
of coefficients as x. When looking at the matrix equation (10.12) the first idea
might be to further reduce the size of T~, this time making an 8 x 8 matrix,
by removing the two upper and lower most rows. The resulting matrix is
denoted by T~. At least this will ensure a transformed signal with only 8
coefficients. By removing the two first and two last columns in T~ we get an
8 x 8 synthesis matrix. The question is now whether we can reconstruct x
from y or not. As before, if we can prove T~T~ = I, perfect reconstruction
is guaranteed. Although it is easily shown that we cannot obtain this (see
Exer. 10.7), the truncation procedure is still useful. For it turns out that the
matrices T~ and T~ have a very nice property which, assisted by a slight
adjustment of the matrices, will lead to perfect reconstruction. Moreover this
adjustment also ensures energy preservation, which is one of the properties
of orthogonal filters that we want to preserve in the modified matrix.
10.3 Gram-Schmidt Boundary Filters 137

We start by examining the truncated matrix T~, which we now denote by


M, rewritten to consist of 8 row vectors
h[3] h[2] h[l] h[O] 0 00 0 mo
g[3] g[2] g[l] g[O] 0 00 0 ml
h[5] h[4] h[3] h[2] h[l] h[O] 0 0 m2
g[5] g[4] g[3] g[2] g[l] g[O] 0 0
Tila = 0 o h[5] h[4] h[3] h[2] h[l] h[O]
=M=
ms
(10.13)
II4
0 o g[5] g[4] g[3] g[2] g[l] g[O] ms
0 0 0 o h[5] h[4] h[3] h[2] IIl6
0 0 0 o g[5] g[4] g[3] g[2] m7
where m n is the n'th row vector (note that the mk vectors throughout this
section are row vectors). As a consequence of (7.62), (7.66), and (7.71) most
of the vectors in M are mutually orthogonal. Moreover, the eight vectors
are linearly independent (see Exer. 10.4 for this particular matrix, and the
paper [11] for a general proof). This means that all the rows can be made mu-
tually orthogonal by the Gram-Schmidt orthogonalization procedure, which
in turn means that we can transform M to get an orthogonal matrix. With
an orthogonal matrix we can find the inverse as the transpose. Thus we have
also found the synthesis matrix.
Let us recall the Gram-Schmidt orthogonalization procedure. We first
recall that the inner product of two row vectors U and v can be written as
the matrix product uv T. Given a set of mutually orthogonal row vectors
uo, ... , UN, and a row vector v we get a vector orthogonal to the Un by
taking
N T
V
I
= V - f:'o IIu l1
~UnV

n
2 Un .
(10.14)

If v is in the subspace spanned by the U vectors, v' will be the zero vector.
It is easy to verify that v'is orthogonal to all the vectors Un, n = 0, ... ,N.
Thus the set of vectors no, UI, ... ,UN,V ' consists of N + 2 mutually orthog-
onal vectors. In this manner any set of linearly independent vectors can be
transformed to a set of mutually orthogonal vectors, which span the same
subspace as the original set. This is the Gram-Schmidt orthogonalization
procedure. It desired, the new vectors can be normalized to have norm one,
to get an orthonormal set.
We want all the rows in M to be mutually orthogonal. Since m2 through ms
already are orthogonal (they have not been truncated), we need only orthog-
onalize mo, ml, m6, and m7 with respect to the remaining vectors. We start
by orthogonalizing mo with respect to m2 through ms,

(10.15)
138 10. Finite Signals

followed by orthogonalization of ml with respect to ~ and m2 through m5.

(10.16)

We continue with m7 and m6. Note that they are orthogonal to mo and mb
since the nonzero entries do not overlap. Thus if we compute

(10.17)

(10.18)

then these vectors are also orthogonal to ~ and m~. Actually the number
of computations can be reduced, see Exer. 10.8.
We now replace the first two rows in M with ~ and m~, respectively,
and similarly with the last two rows. The rows in the new matrix are or-
thogonal. We then normalize them to get a new orthogonal matrix. Since
m2 through m5 already have norm 1, we need only normalize the four new
vectors.
m~
/I
n = 0,1,6,7.
mn = Ilm~II'
The result is that we have transformed the matrix M to the orthogonal matrix

M'=

The new synthesis matrix is obtained as the transposed matrix

Note that the changes in the analysis matrix are only performed at the first
two and last two rows. The two new top rows are called the left boundary
filters and those at the bottom the right boundary filters.
If we need to transform a longer finite signal of even length, we can just
add the necessary pairs of hand g in the middle, since these rows are or-
thogonal to the four new vectors at the top and bottom. Let us verify this
10.3 Gram-Schmidt Boundary Filters 139

claim. Let us look at for example m~ from (10.16). The vectors roo and ml
are orthogonal to the new rows in the middle, since they have no nonzero
entries overlapping with the entries in the new middle rows, see (10.13). The
remaining vectors in the sums defining m~ are combinations of the vectors
m2, ... ,m5, which have not been truncated, and therefore are orthogonal to
the new middle rows. The orthogonality of the three remaining vectors to the
new middle rows is obtained by similar arguments.

10.3.2 The General Case

The derivation of boundary filters for a length 6 IR makes it easy to gener-


alize the method. For any wavelet filter it is always possible to truncate the
corresponding analysis matrix T a , such that the result is an N x N matrix
M (with N even) with all but the first and last L/2 - 1 rows containing
whole IRs, and such that the upper and lower truncated rows have an equal
number non-vanishing entries. If L = 4K + 2, KEN, the first row in M
will be (a part of) the low pass IR h, and if L = 4K the first row will be (a
part of) the high pass IR g, see Exer. 10.5 It can be shown (see [10]) that
this symmetric truncation always produces a full rank matrix (the rows are
linearly independent). As described above this guarantees that we can apply
the Gram-Schmidt orthogonalization procedure to get a new orthogonal ma-
trix. The truncation of the infinite transform matrix with a filter of length L
is thus of the form

roo
} L/2 - I left truncated IRs ,
mL/2-2
mL/2-1
M= } N - L + 2 whnle IRs , (10.19)
mN-L/2+2
mN-L/2+1

} L/2 - I right t"meated IRs .


mN-l

Then all the truncated rows are orthogonalized by the Gram-Schmidt proce-
dure (10.14). It is easy to show that (see Exer. 10.8) we need only orthog-
onalize roo through mL/2-2 with respect to themselves (and not to all the
IRs). So the left boundary filters mi, are defined as

k = 0,1, ... ,L/2 - 2,

and
140 10. Finite Signals

1 mk
k = 0, ... ,L/2 - 2 .
mk = IImkl1 2 '

In the same way the vectors mN-L/2+2 through mN-1 are converted into
L/2 -1 right boundary filters, which we denote by mo
through m~/2_1. The
Gram-Schmidt orthogonalization of the right boundary filters starts with
mN-I. The new orthogonal matrix then becomes

~
} L/2 - 1 left boundaxy filte" ,
1
m L / 2- 2
mL/2-1
M'= } N - L + 2 whole filte" , (10.20)
mN-L/2+2
mo
} L/2 -1 dght bonndaxy filt"" ,
r
m L / 2- 2

The length of the boundary filter m~/2_2 constructed this way is L - 2, and
the length of mi, is decreasing with k. The right boundary filters exhibit the
same structure.
The boundary filters belonging to the inverse transform are easily found,
since the synthesis matrix is the transpose of analysis matrix. The imple-
mentation of the construction and the use of the boundary filters are both
demonstrated in Chap. 11.

lOA Periodization

The simple solution to the boundary problem was zero padding. Another
possibility is to choose samples from the signal to use for the missing samples.
One way of doing this is to periodize the finite signal. Suppose the original
finite signal is the column vector x, of length N. Then the periodized signal
is given as

x
xP= x = [... x[N - 2] x[N -1] x[O] ... x[N -1] x[O] x[l] ...]T.
X

This signal is periodic with period N, since xP[k+N] = xP[k] for all integers k.
It is important to note that the signal x P has infinite energy. But we can still
IDA Periodization 141

transform it with T a, since we use filters of finite length, such that each row in
T a only has a finite number of nonzero entries. Let yP = T aXP, or explicitly

x[N-2] y[N-2]
h[5] h[4] h[3] h[2] h[l] h[O] 0 0 0 0 x[N-l] y[N-l]
g[5] g[4] g[3] g[2] g[l] g[O] 0 0 0 0 x[O] y[O]
0 o h[5] h[4] h[3] h[2] h[l] h[O] 0 0
0 o g[5] g[4] g[3] g[2] g[l] g[O] 0 0 =
0 0 0 o h[5] h[4] h[3] h[2] h[l] h[O] x[N-l] y[N-l]
0 0 0 o g[5] g[4] g[3] g[2] g[l] g[O] x [0] y[O]
x[l] y[l]

(10.21)

The transformed signal is also periodic with period N. We leave the easy
verification as Exer. 10.9. We select N consecutive entries in yP to represent
it. The choice of these entries is not unique, but below we see that a particular
choice is preferable, to match up with the given signal x.
We have transformed a finite signal x into another finite signal y of equal
length. The same procedure can be used to inversely transform y into x using
the infinite T s (see Exer. 10.9). Thus periodization is a way of transforming a
finite signal while preserving the length of it. In implementations we need to
use samples from x instead of the zero samples used in zero padding. We only
need enough samples to cover the extent of the filters, which is at most L - 2.
But we would like to avoid extending the signal at all, since this requires
extra time and memory in an implementation. Fortunately it is very easy to
alter the transform matrix to accommodate this desire. This means that we
can transform x directly into y.
We start by reducing the infinite transform matrix such that it fits the
signal. If the signal has length N, we reduce the matrix to an N x N matrix,
just at we did in the previous section on boundary filters. Although we do
not need a symmetric structure of the matrix this time, we choose symmetry
anyway in order to obtain a transformed signal of the same form as the
original one.
Let us use the same example as above. For a signal of length 10 and filter
of length 6 the reduced matrix is
142 10. Finite Signals

h[3] h[2]
h[l] h[O] 0 0 0 0 0 0
g[3] g[2]
g[l] g[O] 0 0 0 0 0 0
h[5] h[4]
h[3] h[2] h[l] h[O] 0 0 0 0
g[5] g[4]
g[3] g[2] g[l] g[O] 0 0 0 0
o 0 h[5] h[4] h[3] h[2] h[l] h[O] 0 0
o 0 g[5] g[4] g[3] g[2] g[l] g[O] 0 0
o 0 0 0 h[5] h[4] h[3] h[2] h[l] h[O]
o 0 0 0 g[5] g[4] g[3] g[2] g[l] g[O]
o 0 0 0 0 0 h[5] h[4] h[3] h[2]
o 0 0 0 0 0 g[5] g[4] g[3] g[2]
The periodization is accomplished by inserting all the deleted filter coeffi-
cients in appropriate places in the matrix. This changes the matrix to
h[3] h[2]
h[l] h[O] 0 0 0 0 h[5] h[4]
g[3] g[2]
g[l] g[O] 0 0 0 0 g[5] g[4]
h[5] h[4]
h[3] h[2] h[l] h[O] 0 0 0 0
g[5] g[4]
g[3] g[2] g[l] g[O] 0 0 0 0
TPa =
o 0 h[5] h[4] h[3] h[2] h[l] h[O] 0 0
(10.22)
o 0 g[5] g[4] g[3] g[2] g[l] g[O] 0 0
o 0 0 0 h[5] h[4] h[3] h[2] h[l] h[O]
o 0 0 0 g[5] g[4] g[3] g[2] g[l] g[O]
h[l] h[O] 0 0 0 0 h[5] h[4] h[3] h[2]
g[l] g[O] 0 0 0 0 g[5] g[4] g[3] g[2]
It can be shown (see Exer. 10.9) that y = T~x is the same signal as found
in (10.21). Now T~ is orthogonal, so the inverse transform is given by (T~) T.
The same principle can be applied to biorthogonal filters. A length 12
signal and a biorthogonal filter set with analysis low pass filter of length 9
and high pass filter of length 3 would give rise to the matrix
h[5] h[4] h[3] h[2] h[l] h[O] 0 0 0 h[8] h[7] h[6]
o g[2] g[l] g[O] 0 0 0 0 0 0 0 0
h[7] h[6] h[5] h[4] h[3]
h[2] h[l] h[O] 0 0 0 h[8]
o 0 0 g[2] g[l] g[O] 0 0 0 0 0 0
o h[8] h[7] h[6] h[5]h[4] h[3] h[2] h[l] h[O] 0 0
TPa =
o 0 0 0 0 g[2] g[l] g[O] 0 0 0 0
(10.23)
o 0 0 h[8] h[7] h[6] h[5] h[4] h[3] h[2] h[l] h[O]
o 0 0 0 0 0 0 g[2] g[l] g[O] 0 0
h[l] h[O] 0 0 0 h[8] h[7] h[6] h[5] h[4] h[3] h[2]
o 0 0 0 0 0 0 0 0 g[2] g[l] g[O]
h[3] h[2] h[l] h[O] 0 0 0 h[8] h[7] h[6] h[5] h[4]
g[l] g[O] 0 0 0 0 0 0 0 0 0 g[2]
The corresponding synthesis matrix cannot be found simply by transpos-
ing the analysis matrix, since it is not orthogonal. It is easily constructed,
however.
10.4 Periodization 143

h[O] g[2] 0 g[O] 0 0 0 g[8] 0 g[6] h[2] g[4]


h[l] g[3] 0 g[l] 0 0 0 0 0 g[7] 0 g[5]
h[2] g[4] h[O] g[2] 0 g[0] 0 0 0 g[8] 0 g[6]
o g[5] h[l] g[3] 0 g[l] 0 0 0 0 0 g[7]
o g[6] h[2] g[4] h[O] g[2] 0 g[O] 0 0 0 g[8]
TPs -- o g[7] 0 g[5] h[l] g[3] 0 g[l] 0 0 0 0
(10.24)
o g[8] 0 g[6] h[2] g[4] h[O] g[2] 0 g[O] 0 0
o 0 0 g[7] 0 g[5] h[l] g[3] 0 g[l] 0 0
o 0 0 g[8] 0 g[6] h[2] g[4] h[O] g[2] 0 g[O]
o 0 0 0 0 g[7] 0 g[5] h[l] g[3] 0 g[1]
o g[O] 0 0 0 g[8] 0 g[6] h[2] g[4] h[O] g[2]
o g[l] 0 0 0 0 0 g[7] 0 g[5] h[l] g[3]

As before, perfect reconstruction is guaranteed, since TrT~ = I. This can be


made plausible by calculating the matrix product of the first row in Tr and
the first column in T~, which gives

h[O]h[5] + h[2]h[3] + g[4]g[1] . (10.25)

To further process this formula, we need the relations between g, 9 and h, h.


They are given by (7.35) and (7.36) on p. 71. First we need to determine k
and c. If we let the z-transform of hand 9 in (10.23) be given by
8 2
H(z) = L h[n]z-n and G(z) =L g[n]z-n ,
n=Q n=Q

and calculate the determinant (7.34), we find that (remember that a different
notation for filters is used in Chap. 7)

H(z)G( -z) - G(z)H(-z) = -2z- 5 ,


and hence k = 2 and c = -1. From (7.35) it then follows that
H(z) = -z 5G(-z) . (10.26)

The odd shift in index due to the power Z5 is a consequence of the perfect
reconstruction requirement, as explained in Chap. 7. The immediate result
of (10.26) is

n n

and if we assume that g[n] =I 0 for n = 0,1,2 (as we implicitly did in (10.23)),
then we find that
h[-5]Z5 + h[-4]z4 + h[-3]z3 = _g[0]Z5 + g[1]z4 - g[2]z3 , (10.27)
144 10. Finite Signals

which seems to match poorly with our choice of index of h[n] in (10.24).
The reason is that while the index n = -5, -4, -3 of h[n] is the correct
one in the sense that it matches in the z-transform, we usually choose a
more convenient indexing in implementations (like n = 0,1,2). To complete
the calculation started in (10.25), we need to stick to the correct indexing,
however. We therefore substitute g[l] = h[-4]. Doing the same calculation
for g[n], we find the we can substitute g[4] = h[4] (this is left as an exercise).
Now (10.25), with the correct indexing of h, becomes

h[-5]h[5] + h[-3]h[3] + h[-4]h[4]


5
= L h[-k]h[k] = L h[-k]h[k] = 1. (10.28)
k=3 k

The second equality is valid since h[n] = 0 except for n = -5, -4, -3, and
the third follows immediately from (7.56).

10.4.1 Mirroring

Finally, let us briefly describe a variant of periodization. One can take


a finite signal [x [0] ... x[N - 1]], and first mirror it to get the signal
[x[O] ... x[N - 1] x[N - 1] ... , x[O]] of length 2N. Then one can apply
periodization to this signal. The above procedure can then be used to get a
truncated transformation matrix, of size 2N x 2N. It is in general not possible
to get a truncated matrix of size N x N, which is orthogonal.
Let us briefly discuss the difference between periodization and mirroring.
In Fig. 10.3 we have shown a continuous signal on the interval from 0 to T,
which in the top part has been periodized with period T, and in the bottom
part has been mirrored, to give a signal periodic with period 2T.
Sampling these two signals leads to discrete signals that have been peri-
odized or mirrored from the samples located between 0 and T. Two prob-
lems are evident. The periodization can lead to jump discontinuities at the
points of continuation, whereas mirroring leads to discontinuities in the first
derivative, unless this derivative is zero at the continuation points. These
singularities then show up in the wavelet analysis of the discrete signals as
large coefficients at some scale. They are artifacts produced by our boundary
correction method. Mirroring is often used in connection with images, as in
the separable transforms discussed in Chap. 6, since the eye is very sensitive
to asymmetry.

10.5 Moment Preserving Boundary Filters


The two methods for handling the boundary problem presented so far have
focused on maintaining the orthogonality of the transform. Orthogonality is
10.5 Moment Preserving Boundary Filters 145

o T

o T
Fig. 10.3. The top part shows periodic continuation of a signal, and the bottom
part mirroring

important, since it is equivalent with energy preservation. But there are other
properties beyond energy that it can be useful to preserve under transforma-
tion. One of them is related to moments of a sequence.
At the end of Sect. 3.3 we introduced the term moment of a sequence.
We derived
(10.29)
n n

which shows that the transform discussed in Sect. 3.3 (the CDF(2,2) trans-
form) preserves the first moment of a sequence. Generally, we say that

(10.30)

is the k'th moment of the sequence s. Since (10.29) was derived without
taking undefined samples into account, it applies to infinite sequences only.
A finite sequence can be made infinite by zero padding, but this method
causes the sequence Sj-l to be more than half the length of sequence Sj' If
we want (10.29) to be valid for finite sequence of the same length, a better
method is needed. The next two subsections discuss two important questions
regarding the preservation of moments, namely why and how.

10.5.1 Why Moment Preserving Transforms?

To answer this question we will start by making some observations on the


high pass IR g of a wavelet transform. For all such IRs it is true that

L nkg[n] = 0, k = 0, ... ,M - 1, (10.31)


n
146 10. Finite Signals

for some M 2: 1. The M depends on the filter. For Daubechies 4 this property
holds for M = 2, while for CDF(2,2) we have M = 2, and for CDF(4,6) we
have M = 4. A sequence satisfying (10.31) for some M is said to have M
vanishing moments.
Assume that the filter g has M vanishing moments. Take a polynomial
M-l
j
p(t) =L Pjt
j=O

of degree at most M - 1. We then take a signal obtained by sampling this


polynomial at the integers, Le. we take s[n] = p(n). Now we filter this signal
with g. In the following computation we first change the summation variable,
then insert the polynomial expression for the signal, expand (n - k)j using
the binomial formula, and finally change the order of summation.

(g * s)[n] = L g[n - k]s[n]


k

=L g[k]s[n - k]
k
M-l
= Lg[k] L pj(n - k)j

t (~)(_l)mkmnj-m
k j=O

= Lg[k] tlPj

to (~)
k 3=0 m=O

= ~'P; (_l)mn;-m pmg[kl

=0. (10.32)

Note that we work with a filter g of finite length, so all sum above are finite.
Thus filtering with g maps a signal obtained from sampling a polynomial
of degree at most M - 1 to zero. Note also that we do not have to sam-
ple the polynomial at the integers. It is enough that the sample points are
equidistant.
This property of the high pass filter g has an interesting consequence,
when we do one step in a DWT. Since we have perfect reconstruction, the
polynomial samples get mapped into the low pass part. This is consistent
with the intuitive notion that polynomials of low degree do not oscillate
much, meaning that they contain no high frequencies. The computation in
(10.32) shows that with the particular filters used here the high pass part is
actually zero, and not just close to zero, which is typical for non-ideal filters
(at this point one should recall the filters used in a wavelet decomposition
are not ideal).
10.5 Moment Preserving Boundary Filters 147

Due to these properties it would be interesting to have a boundary correction


method which preserved vanishing moments of finite signals of a given length.
Such a method was found by A. Cohen, 1. Daubechies, and P. Vial [3]. Their
solution is presented briefly in the next section.

10.5.2 How to Make Moment Preserving Transforms

The idea for preserving the number of vanishing moments, and hence be able
to reproduce polynomials completely in the low pass part, is simple, although
the computations are non-trivial. We will only show the basic steps, and leave
out most of the computations.
We start our considerations by redoing the computation in (10.32), this
time for a general filter h of length L, and the same signal s. This time we
choose the opposite order in the application of the binomial formula. In the
second equality we use (7.9). Otherwise the computations are identical. We
get
n
(h * s)[n] = L h[n - k]s[n]
k=n-L+1
L-1
=L h[k]s[n - k]
k=O
L-1 M-1
=L h[k] L pj(n - k)j

't,' t. (~)
k=O j=O

~ ~ h[k] P;
m
n (_k);-m
M-1
= L qm nm , (10.33)
m=O

where we have not written out the complicated expressions for the coeffi-
cients qm, since they are not needed. We see from this computation that
convolution with any filter h of finite length takes a sampled polynomial of
degree at most M - 1 into another sampled polynomial, again of degree at
most M -1. If we have a finite number of nonzero samples, then the resulting
convolution will have more samples, as explained above.
We would like to be able to invert this computation, in the sense that
we start with the signal x of length N, obtained by sampling an arbitrary
polynomial of degree at most M - 1,
M-1
x[n] =L qm nm , n = 0, ... ,N -1,
m=O
148 10. Finite Signals

and would like to find another polynomial p of degree at most M - 1, and a


signal s of the same length N, obtained by sampling p, such that x = h * s.
To do this for all polynomials of degree at most M - 1 is equivalent to the
boundary filter constructions already done in Sect. 10.3.
We want corrections to the filters used at the start and end of the signal
in order to preserve vanishing moments for signals of a fixed finite length. It
is done as follows. The first (leftmost) boundary filter on the left and the last
(rightmost) boundary filter on the right is chosen such that they preserve
vanishing of the moment of order m = 0 in the high pass part. The next pair
is chosen such that moments or order m = 1 vanish. We continue until we
reach the value of M for the transform under consideration.
It is by no means trivial to construct the boundary filters and to prove that
the described procedure does produce a moment preserving transform, and it
takes further computations to make these new boundary filters both orthog-
onal and of decreasing length (as we did with the Gram-Schmidt boundary
filters, see the description at the end of Sect. 10.3.2).

10.5.3 Use of the Boundary Filters

Unfortunately, the efforts so far are not enough to construct a transform


applicable to finite signals, such that it preserves vanishing moments. Thus
the description above was incomplete. It must remain so, since this is a very
technical question, beyond the scope of this book. Briefly, what remains to
be done is an extra step, which consists in pre-conditioning the signal prior
to transformation by multiplying the first M and the last M samples by an
M x M matrix. After transformation we multiply the result by the inverse
of this matrix, at the beginning and end of the signal.
The software available electronically, see Chap. 14, contains functions im-
plementing this procedure. See the documentation provided there for further
information and explanation.

Exercises
10.1 Start by reviewing results from linear algebra on orthogonal matrices
and the Gram-Schmidt orthogonalization procedure.
10.2 Determine which results in Chap. 7 are needed to show that the syn-
thesis matrix T s in (10.9) is the same as the transpose of the analysis matrix
Ta in (10.7), or equivalently that TJTa = I.
10.3 Show that an orthogonal matrix preserves energy when multiplied with
a signal. In other words, IITxl1 2 = IIxl1 2 whenever T is orthogonal. Remember
that TT = T- 1 .
Exercises 149

lOA Show that the first two rows of T~ in (10.13) are linearly independent.
Hint: Let rl and r2 be the two rows, substitute the g's in r2 with h's, and
show that there does not exist a such that arl = r2.
10.5 Verify the following statement from Sect. 10.3.2: If (L - 2)/2 is even,
then the top row in the symmetrically truncated T a contains coefficients from
the filter h and the bottom row coefficients from the filter g. If (L - 2)/2 is
odd, the positions of the filter coefficients are reversed.
10.6 The purpose of this exercise is to construct the truncated synthesis
matrix T~ to match the truncated analysis matrix T~. Keep in mind the
difference in notation between this chapter and Chap. 7.
1. Write the T~ by reducing the matrix in (10.9) in such a way that its
structure matches that of T~ in (10.12).
2. Let h be an orthogonal low pass analysis filter, and define
5
H(z) =L h[n]z-n.
n=O

Determine the corresponding high pass analysis filter G using (7.67).


3. By using (7.34) and (7.72) determine k and c, and find by (7.35) and
(7.36) the synthesis filters iI and G.
4. To verify that T~ it is indeed the inverse of T~, it is necessary to verify
that T~T~ = I. Calculate two of the many inner products, starting with
the inner product of

[h[4] g[4] h[2] g[2] h[O] g[O]] and [h[l] g[l] h[3] g[3] h[5] g[5]] .

Use the correctly indexed filters, which are the four filters H, G, iI, and
G found above.
5. Determine also the inner product of

[h[5] g[5] h[3] g[3] h[l] g[l]] and [h[l] g[l] h[3] g[3] h[5] g[5]] .
6. Describe why these calculations make it plausible that T~ is the matrix
which reconstructs a signal transformed with T~. Remember that the
condition for perfect reconstruction is T~ T~ = I.
10.7 Show that the matrix T~ in (10.13) is not orthogonal.
10.8 Most of the orthogonalization calculations in (10.15) - (10.18) are re-
dundant. Verify the following statements.
1. (10.15) is unnecessary.
2. (10.16) can be reduced to

I moml
m 1 = ml - Ilmol1 2 mo . (10.34)
150 10. Finite Signals

3. This result also applies to (10.17) and (10.18).


4. For any transform matrix (for any orthogonal filter of length L) we need
only orthogonalize the L/2 - 2 upper and lower most rows.
5. The low pass left boundary filters constructed this way has staggered
length, Le. no two left boundary filters has the same number of non-
vanishing filter taps.
6. The difference in number of non-vanishing filter taps of two consecutive
low pass left boundary filters is 2.
7. The previous two statements hold for both left and right low and high
pass boundary filters.
10.9 In (10.21) it is claimed that the result of transforming a N periodic
signal x P with T a yields another N periodic signal yp.
1. Show that this is true, and that it is possible to reconstruct x using only
T s and y.
2. Show that y = T~x is the same signal as
[y[O] y[l] ... y[N - 1]] T ,
the period we selected in yP = T aXP •
3. Show that T~ in (10.22) is orthogonal.
Hint: This is equivalent to showing that all the rows of T~ are mutually
orthogonal and have norm 1.
10.10 Explain why it is correct to substitute 9[4] = h[4] (which eventually
gave the 1 in (10.28)), despite the fact that k = 2 and c = -1 in (7.36).
A. Jensen et al., Ripples in Mathematics
© Springer-Verlag Berlin Heidelberg 2001
152 11. Implementation

testing programs consisting of several lines of MATLAB code. Adjustments


can be made to the file, and all commands are easily re-executed, by typing
the file name at the prompt again. Once everything works as it should, and
you intend to reuse the code, you should turn it into a function.
MATLAB also offers the possibility to construct a graphical user interface
to scripts and functions. This topic is not discussed here.
Let us look at Function 1.1 again. You should save it as dwt.m (note that
this is just a simple example - the function does not implement a complete
DWT). Concerning the name, then dwt might already be in use. This depends
on your personal set-up of MATLAB. It is simple to check, if the name
is already used. Give the command dwt at the prompt. If you receive an
unknown function error message, then the name is not in use (obviously you
should do this before naming the file).
In the first few examples the initial signal is contained in the vector
Signal. Later we abbreviate the name to S.

Function 1.1 Example of a Function in MATLAB


function R = dwt(Signal)
N = length(Signal); Yo Finds the length of the signal.

s = zeros(l,N/2); Yo Predefines a vector of zeroes,


d = s; Yo and a copy of it.

Yo Here the signal is processed


Yo as in the following examples. See below.
Yo The result is placed in s and d.

R = [s d]; Yo Concatenates s and d.

It is important to remember that indexing in MATLAB starts with 1, and


not O. This is in contrast to the theory, where the starting index usually is O.
We will point out the changes necessary in the following examples.

11.2 Implementing the Haar Transform Through Lifting

We start by implementing the very first example from Chap. 2 as a MATLAB


function. This example is given on p. 7, and for the reader's convenience we
repeat the decomposition here in Table 11.1.
Although the function giving this decomposition is quite short, it is better
to separate it in two functions. The first function calculates one step in the de-
composition, and the second function then builds the wavelet decomposition
using the one step function.
11.2 Implementing the Haar Transform Through Lifting 153

Table 11.1. The first decomposition


56 40 8 24 48 48 40 16
48 16 48 28 8 -8 0 12
32 38 16 10 8 -8 0 12
35 -3 16 10 8 -8 0 12

Let us assume that the signal is in Signal, and we want the means in the
vector s and the differences in the vector d. The Haar transform equations
are (see (3.1) and (3.2))

a+b
8=-2-'
d =a - 8.

Since MATLAB starts indexing at 1 we get


5(1) = 1/2*(Signal(1) + Signal(2));
d(l) = Signal(l) - s(l);
The next pair yields
5(2) = 1/2*(Signal(3) + Signal(4));
d(2) = Signal(3) - 5(2);
and after two further computations s is the vector given by the first four
numbers in the second row in the decomposition, and d is the last four num-
bers in the second row. This is generalized to a signal of length N, where we
assume N is even.
for n=1:N/2
sen) = 1/2*(Signal(2*n-l) + Signal(2*n));
den) = Signal(2*n-l) - sen);
end
The function, shown in Function 2.1, could be named dwthaar (if this name
is not already in use). It takes a vector (of even length) as input argument,
and returns another vector of the same length, with the means in the first
half and the differences in the second half. This is exactly what we need to
construct the decomposition. Save this function in a file named dwthaar. m,
and at the MATLAB prompt type
[s,d] = dwthaar([56 40 8 24 48 48 40 16])
This should give the elements of the second row in the decomposition.
154 11. Implementation

Function 2.1 The Haar Transform

function [s,d] = dwthaar(Signal)

Yo Determine the length of the signal.


N = length(Signal);

YoAllocate space in memory.


s = zeros(l, N/2);
d = s;
Yo The actual transform.
for n=1:N/2
sen) 1/2* (Signal (2*n-l) + Signal(2*n»;
den) = Signal(2*n-l) - sen);
end

Note that it is often a good idea to allocate the memory needed for the
output vectors. This can be done by typing s = zeros (1, Len) , where Len is
the desired length. Then MATLAB allocates all the memory at once instead
of as it is needed. With short signals the time saved is minimal, but with a
million samples a lot of time can be saved this way.
We now have a function, which can turn the first row in Table 11.1 into
the second row, and the first four entries in the second row into the first four
entries in the third row, and so on. The next step is to make a function,
which uses dwthaar an appropriate number of times, to produce the entire
decomposition. As input to this function we have again the signal, while the
output is a matrix containing the decomposition, equivalent to Table 11.1.
First the matrix T is allocated, then the signal is inserted as the first row.
The for loop uses dwthaar to calculate the three remaining rows.
T = zeros(4,S);
T(l,:) = Signal;
for j=1:3
Length = 2-(4-j);
T(j+l, l:Length) = dwthaar( T(j, l:Length) );
T(j+1, Length+l:S) = T(j, Length+l:S);
end
For each level the length of the elements, and hence the signal to be trans-
formed, is determined. Since this length is halved for each increment of j
(first 8, then 4, then 2), it is given as 2-(4-j) = 24 - j • Then the first part
of the row is transformed, and the remaining part is copied to the next row.
This piece of code can easily be extended to handle any signal of length 2N
for N E N. This is done with Function 2.2.
11.3 Implementing the DWT Through Lifting 155

Function 2.2 Wavelet Decomposition Using the Haar Transform


function T = w_decomp(Signal)

N = size(Signal.2);
J = log2(N);

if rem(J .1)
error('Signal must be of length 2-N.');
end

T = zeros(J. N);
T(1.:) = Signal;

for j=1:J
Length = 2-(J+1-j);
T(j+1. 1:Length) = dwthaar( T(j. 1:Length) ):
T(j+1. Length+1:N) = T(j. Length+l:N):
end

The variable J is the number of rows needed in the matrix. Since rem is
the remainder after integer division, rem (J , 1) is the fractional part of the
variable. If it is not equal to zero, the signal is not of length 2N , and an error
message is displayed (and the program halts automatically).
Note that the representation in a table like Table 11.1 is highly inefficient
and redundant due to the repetition of the computed differences. But it is
convenient to start with it, in order to compare with the tables in Chap. 2.

11.3 Implementing the DWT Through Lifting


The implementation of the Haar transform was not difficult. This is the only
transform, however, which is that easy to implement. This becomes clear in
the following where we turn the attention to the Daubechies 4 transform. The
problems we will encounter here apply to all wavelet transforms, when imple-
mented as lifting steps. We will therefore examine in detail how to implement
Daubechies 4, and in a later section we show briefly how to implement the
transform CDF(4,6), which is rather complicated.
There are basically two different ways to implement lifting steps:
1. Each lifting step is applied to all signal samples (only possible when the
entire signal is known).
2. All lifting steps are applied to each signal sample (always possible).
To see what this means, let us review the Daubechies 4 equations:
156 11. Implementation

s(l)[n] = S[2n] + V3S[2n + 1] , (11.1)


d(l)[n] = S[2n + 1] - tV3s(l)[n] - t(V3 - 2)s(1)[n - 1] , (11.2)
s(2)[n] = s(1)[n] - d(l)[n + 1] , (11.3)

s[n] = J3-1()
,j2 s 2 [n] , (11.4)

d[n] = v'~ 1 d(l) [n] . (11.5)

With the first method all the signal samples are 'sent through' equation
(11.1). Then all the odd signal samples and all the S(l) samples are sent
through equation (11.2), and so on. We will see this in more detail later.
With the second method the first equation is applied to first two signal
samples, and the resulting s(1) is put into the next equation along with one
odd signal sample and another s(1). Following this pattern we get one sand
one d from (11.4) and (11.5). This is repeated with the third and fourth signal
samples giving two more sand d values.
The first method is much easier to implement than the second one, espe-
cially in MATLAB. However, in a real application the second method might
be the only option, if for example the transform has to take place while the
signal is 'coming in.' For this reason we refer to the second method as the
real time method, although it does not necessarily take place in real time.
We start with the first method.

11.3.1 Implementing Daubechies 4


First we want to apply equation (11.1) to the entire signal, Le. we want to
calculate s(l)[n] for all values of n. For a signal of length N the calculation
can be implemented in a for loop
for n=1:N/2
sl(n) = S(2*n-l) + sqrt(3)*S(2*n);
end
Remember that MATLAB starts indexing at 1. Such a for loop will work,
and it is easy to implement in other programs such as Maple, S-plus, or in
the C language. But MATLAB offers a more compact and significantly faster
solution. We can interpret (11.1) as a vector equation, where n goes from 1
to N /2. The equation is

:~:;i~l ] = [ ~[~l ]+ V3 [~[~l] .


[s(1)[N/2] S[N'-I] S[N]
Note that the indices correspond to MATLAB indexing. The vector equation
becomes
11.3 Implementing the DWT Through Lifting 157

s1 = S(1:2:N-1) + sqrt(3)*S(2:2:N);
in MATLAB code. This type of vector calculation is exactly what MATLAB
was designed to handle, and this single line executes much faster than the for
loop. The next equation is (11.2), and it contains a problem, since the value
of s(l)[n - 1] is not defined for n = O. There are several different solutions
to this problem (see Chap. 10). We choose periodization, since it gives a
unitary transform and is easily implemented. In the periodized signal all
undefined samples are taken from the other end of the signal, Le. its periodic
continuation. This means that we define s(1)[-I] == S(l) [N/2]. Let us review
equation (11.2) in vector form when we periodize.

d(1)[I]]
d(l) [2] [8[2]]
8[4] v'3 [ s(1)[I]
s(1)[2] ] v'3 - 2 [ s(1)[N/2]
s(1)[I] ]

[ d(1)[~/2] = 8[~] - 4" S(1)[~/2] - -4- S(1)[N~2 _ 1] .

In MATLAB code this becomes


d1 = S(2:2:N) - sqrt(3)/4*s1 - (sqrt(3)-2)/4*[s1(N/2) s1(1:N/2-1)];
Again this vector implementation executes much faster than a for loop.
Note how elegantly the periodization is performed. This would be more cum-
bersome with a for loop. The change of a vector from s1 to [s1 (N/2)
s1(1:N/2-1)] will in the following be referred to as a cyclic permutation
of the vector s 1.
It is now easy to complete the transform, and the entire transform is

Function 3.1 Matlab Optimized Daubechies 4 Transform


s1 = S(1:2:N-1) + sqrt(3)*S(2:2:N);
d1 = S(2:2:N) - sqrt(3)/4*s1 - (sqrt(3)-2)/4*[s1(N/2) s1(1:N/2-1)];
s2 = s1 - [d1(2:N/2) d1(1)];
s = (sqrt(3)-1)/sqrt(2) * s2;
d = (sqrt(3)+1)/sqrt(2) * d1;

This method for implementing lifting steps is actually quite easy, and since
the entire signal is usually known in MATLAB, it is definitely the preferred
one.
If implementing in another environment, for example in C, the vector
operations might not be available, and the for loop suggested in the begin-
ning of this section becomes necessary. The following function shows how to
implement Function 3.1 in C.
158 11. Implementation

Function 3.2 Daubechies 4 Transform in C

for (n = 0; n < N/2; n++) s[n] = S[2*n] + sqrt(3) * S[2*n+1];

d[O] = S[1] - sqrt(3)/4 * 5[0] - (sqrt(3)-2)/4 * s[N/2-1];


for (n = 1; n < N/2; n++)
d[n] = S[2*n+1] - sqrt(3)/4 * s[n] - (sqrt(3)-2)/4 * s[n-1];

for (n = 0; n < N/2-1; n++) s[n] = s[n] - d[n+1];


s[N/2-1] = s[N/2-1] - d[O];

for (n = 0; n < N/2; n++) s[n] (sqrt(3)-1) / sqrt(2) * s[n];

for (n = 0; n < N/2; n++) d[n] (sqrt(3)+1) / sqrt(2) * d[n];

Note that the periodization leads to two extra lines of code, one prior to the
first loop, and one posterior to the second loop. The indexing now starts at O.
Consequently, the C implementation is closer to the Daubechies 4 equations
(11.1) through (11.5) than to the MATLAB code.
There are a few things that can be improved in the above MATLAB and
C implementations. We will demonstrate this in the following example.

11.3.2 Implementing CDF(4,6)

We now want to implement the CDF(4,6) transform, which is given by the


following equations.

s(1)[n] = S[2n] - 41 (S[2n - 1] + S[2n + 1]) , (11.6)

d(1)[n] = S[2n + 1] - (s(1)[n] + s(1)[n + 1]) , (11.7)

s(2)[n] = s(1)[n] - _1_ (-35d(1)[n - 3] + 265d(1)[n - 2] - 998d(1)[n - 1]


4096
- 998d(1) [n] + 265d(1) [n + 1] - 35d(1) [n + 2]) , (11.8)
4
s[n] = J2s(2)[n] , (11.9)

d[n] = J2 d(l)[n] . (11.10)


4
This time the vector equations are omitted, and we give the MATLAB code
directly. But first there are two things to notice.
Firstly, if we examine the MATLAB code from the Daubechies 4 imple-
mentation in Function 3.1, we see that there is really no need to use the
three different variables 51, 52, and 5. The first two variables might as well
be changed to 5, since there is no need to save 51 and 52.
11.3 Implementing the DWT Through Lifting 159

Secondly, we will need a total of 7 different cyclically permuted vectors (with


Daubechies 4 we needed 2). Thus it is preferable to implement cyclic permu-
tation of a vector as a function. An example of such a function is

Function 3.3 Cyclic Permutation of a Vector

%
function P = cpv(S, k)

if k > 0
P = [S(k+1:end) S(1:k)];
else if k < 0
P = [S(end+k+1:end) S(1:end+k)];
end

With this function we could write the second and third lines of the Daube-
chies 4 implementation in Function 3.1 as
d1 = S(2:2:N) - sqrt(3)/4*s1 - (sqrt(3)-2)/4*cpv(s1,-1);
s2 = s1 - cpv(d1,1);
With these two things in mind, we can now write a compact implementation
of CDF(4,6).

Function 3.4 Matlab Optimized CDF(4,6) Transform


s = S(1:2:N-1)
- 1/4*( cpv(S(2:2:N),-1) + S(2:2:N) );
d = S(2:2:N)
- s - cpv(s,1);
s = s - 1/4096*( -3S*cpv(d,-3) +26S*cpv(d,-2) -998*cpv(d,-1)
-998*d +26S*cpv(d,1) -3S*cpv(d,2) );
s = 4/sqrt(2) * s;
d = sqrt(2)/4 * d;

The three dots in the third line just tell MATLAB that the command con-
tinues on the next line. Typing the entire command on one line would work
just as well (but this page is not wide enough for that!).
The C implementation of CDF(4,6) is shown in the next function. Notice
that no less than five entries in s must be calculated outside one of the for
loops (three before and two after).

Function 3.5 CDF(4,6) Transform in C

N = N/2;

s[O] = S[O] - (S[2*N-1] + S[1])/4;


for (n = 1; n < N; n++) s[n] = S[2*n] - (S[2*n-1] + S[2*n+1])/4;

for (n = 0; n < N-1; n++) d[n] = S[2*n+1] - (s[n] + s[n+1]);


d[N-1] = S[2*N-1] - (s[N-1] + s[O]);
160 11. Implementation

5[0] += (35*d[N-3]-265*d[N-2]+998*d[N-1]+998*d[0]-265*d[1]
+35*d[2])/4096:
5[1] += (35*d[N-2]-265*d[N-1]+998*d[0]+998*d[1]-265*d[2]
+35*d[3])/4096;
5[2] += (35*d[N-1]-265*d[0]+998*d[1]+998*d[2]-265*d[3]
+35*d[4])/4096:

for (n = 3; n < N-2: n++)


5[n] += (35*d[n-3]-265*d[n-2]+998*d[n-1]+998*d[n]-265*d[n+1]
+35*d[n+2])/4096;
5[N-2] += (35*d[N-5]-265*d[N-4]+998*d[N-3]+998*d[N-2]-265*d[N-1]
+35*d[0])/4096:
5[N-1] += (35*d[N-4]-265*d[N-3]+998*d[N-2]+998*d[N-1]-265*d[0]
+35*d[1])/4096:

K = 4/5qrt(2):
for (n = 0; n < N; n++) {
5[n] *= K:
d[n] /= K;
}

Due to the limited width of the page some of the lines have been split. Ob-
viously, this is not necessary in an implementation.

11.3.3 The Inverse Daubechies 4 Transform

Daubechies 4Inverting the wavelet transform, Le. implementing the inverse


transform, is just as easy as the direct transform. We show only the inverse of
Daubechies 4. Inverting CDF(4,6) is left as an exercise (and an easy one, too,
see Exer. 11.3). As always when inverting a lifting transform the equations
come in reverse order, and the variable to the left of the equal sign appears
on the right side, and vice versa.

Function 3.6 Inverse Daubechies 4 Transform

d1 = d / «5qrt(3)+1)/5qrt(2»:
52 = 5 / «sqrt(3)-1)/sqrt(2»;
51 = 52 + cpv(d1,1):
S(2:2:N) = d1 + sqrt(3)/4*51 + (sqrt(3)-2)/4*cpv(s1,-1):
S(1:2:N-1) = 51 - sqrt(3)*S(2:2:N);

11.4 The Real Time Method

When implementing according to the real time method we encounter a num-


ber of problems that do not exist for the first method. This is due to the fact
11.4 The Real Time Method 161

that all transforms have references back and/or forth in time, for example
the second and third equations in Daubechies 4 read
1 1
d(1)[n] = S[2n + 1] - 4V3s(1)[n] - 4(V3 - 2)s(1)[n - 1] , (11.11)
s(2)[n] = s(l)[n] - d(l)[n + 1] , (11.12)

where s(1)[n -1] refers to a previously computed value, and d(l)[n + 1] refers
to a not yet computed value. Both types of references pose a problem, as will
be clear in the next section. The advantage of the real time implementation
is that it can produce two output coefficients (an s and a d value) each time
two input samples are ready. Hence this method is suitable for a real time
transform, which is a transformation of the signal as it becomes available.

11.4.1 Implementing Daubechies 4

We start by writing the equations in MATLAB code, as shown in Func-


tion 4.1. Note that, in contrast to previous functions, the scaling factor has
its own variable K. The assignment of Kis shown in this function, but omitted
in subsequent functions to reduce the number of code lines.

Function 4.1 The Raw Daubechies 4 Equations


K = (sqrt(3)-1)/sqrt(2);
for n=1:N/2
s1(n) = S(2*n-1) + sqrt(3)*S(2*n);
d1(n) = S(2*n) - sqrt(3)/4*s1(n) - (sqrt(3)-2)/4*s1(n-1);
s2(n) = s1(n) - d1(n+1);
s(n) = s2(n) * K;
d(n) = d1(n) / K;
end

Again we see the problem mentioned above. The most obvious one is d1 (n+1)
in the third line of the loop, since this value has not yet been computed. A
less obvious problem is sl (n-1) in the second line of the loop. This is a
previous calculated value, and as such is does not pose a problem. But in the
very first run through the loop (for n = 1) we will need s1(O). Requesting
this in MATLAB causes an error! The problem is easily solved by doing an
initial computation before starting the loop.
Since the value d1(n+1) is needed in the third line, we could calculate
that value in the previous line instead of d1 (n). This means changing the
second line to
d1(n+1) = S(2*(n+1» - sqrt(3)/4*s1(n+1) - (sqrt(3)-2)/4*s1(n);
Now we no longer need s1(n-1). Instead we need sl (n+1), which can be
calculated in the first line by changing it to
162 11. Implementation

sl(n+l) = S(2*(n+l)-1) + sqrt(3)*S(2*(n+l»;


The change in the two lines means that the loop never calculates 8 1(1) and
d1 (1), so we need to do this by hand. Note also that the loop must stop at
n = N /2 - 1 instead of n = N /2, since otherwise the first two lines need
undefined signal samples. This in turn means that the last three lines of the
loop is not calculated for n = N /2, and this computation therefore also has
to be done by hand.

Function 4.2 The Real Time Daubechies 4 'fransform


sl(l) = S(l) + sqrt(3)*S(2);
sl(N/2) = S(N-l) + sqrt(3)*S(N);
dl(l) = S(2) - sqrt(3)/4*sl(1) - (sqrt(3)-2)/4*sl(N/2);

for n=1:N/2-1
sl(n+l) = S(2*(n+l)-1) + sqrt(3)*S(2*(n+l»;
dl(n+l) = S(2*(n+l» - sqrt(3)/4*sl(n+l) - (sqrt(3)-2)/4*sl(n);
s2(n) = sl(n) - dl(n+l);
sen) = s2(n) * K;
den) = dl(n) / K;
end

s2(N/2) = sl(N/2) - dl(l);


s(N/2) = s2(N/2) * K;
d(N/2) = dl(N/2) / K;

Notice how periodization is used when calculating d1 (1) and 82 (N/2). In the
case of d1(1) it causes a problem, since it seems that we need S(N-1) and
S (N), and they are not necessarily available. One solution would be to use
another boundary correction method, but this would require somewhat more
work to implement. Another solution is to shift the signal by two samples, as
demonstrated in Function 4.3.

Function 4.3 The Real Time Daubechies 4 Shifted 'fransform


sl(l) = S(3) + sqrt(3)*S(4);
sl(N/2) = S(1) + sqrt(3)*S(2);
d1(1) = S(4) - sqrt(3)/4*s1(1) - (sqrt(3)-2)/4*s1(N/2);

for n=1:N/2-2
sl(n+1) = S(2*(n+l)-1+2) + sqrt(3)*S(2*(n+l)+2);
d1(n+l) = S(2*(n+1)+2) - sqrt(3)/4*sl(n+1) - (sqrt(3)-2)/4*sl(n);
s2(n) = sl(n) - d1(n+l);
sen) = s2(n) * K;
den) = d1(n) / K;
end

d1(N/2) = S(2) - sqrt(3)/4*s1(N/2) - (sqrt(3)-2)/4*s1(N/2-1);


s2(N/2-1) = s1(N/2-1) - d1(N/2);
11.4 The Real Time Method 163

5 (N/2-1) s2(N/2-1) * K;
d(N/2-1) d1(N/2-1) / K;

s2(N/2) = sl(N/2) - dl(l);


s(N/2) = s2(N/2) * K;
d(N/2) = dl(N/2) / K;

Notice how one more loop has to be extracted to accommodate for this
change.
Now, once the first four signal samples are available, Function 4.3 will
produce the first two transform coefficients s(l) and d(l), and for each
subsequent two signal samples available, another two transform coefficients
can be calculated.
There is one major problem with the implementation in Function 4.3. It
consumes a lot of memory, and more complex transforms will consume even
more memory. The memory consumption is 5 times N /2, disregarding the
memory needed for the signal itself. But it does not have to be that way. In
reality, only two entries of each of the vectors sl, s2, and dl are used in a
loop. After that they are not used anymore. The simple solution is to change
all sl and s2 to s, and all dl to d.

Function 4.4 The Memory Optimized Real Time Daubechies 4 Transform


5(1) S(3) + sqrt(3)*S(4);
d(l) = S(4) - sqrt(3)/4*s(1) - (sqrt(3)-2)/4* (S(l) + sqrt(3)*S(2»;

for n=1:N/2-2
s(n+l) = S(2*(n+l)-1+2) + sqrt(3)*S(2*(n+l)+2);
d(n+l) = S(2*(n+l)+2) - sqrt(3)/4*s(n+l) - (sqrt(3)-2)/4*s(n);
sen) = sen) - d(n+l);
sen) = sen) * K;
den) = den) / K;
end

s(N/2) = S(l) + sqrt(3)*S(2);


d(N/2) = S(2) - sqrt(3)/4*s(N/2) - (sqrt(3)-2)/4*s(N/2-1);
s(N/2-1) = s(N/2-1) - d(N/2);
s(N/2-1) = s(N/2-1) * K;
d(N/2-1) = d(N/2-1) / K;

s(N/2) = s(N/2) - d(l);


s(N/2) = s(N/2) * K;
d(N/2) = d(N/2) / K;

In this case it does not cause any problems - the function still performs a
Daubechies 4 transform. But it is not always possible just to drop the original
indexing, as we shall see in the next section.
The inverse transform is, as always, easy to implement. It is shown in the
following function.
164 11. Implementation

Function 4.5 The Optimized Real Time Inverse Daubechies 4 Transform


d(N/2) = d(N/2) * K;
s(N/2) = s(N/2) / K;
s(N/2) = s(N/2) + d(l);

d(N/2-1) = d(N/2-1) * K;
s(N/2-1) = s(N/2-1) / K;
s(N/2-1) = s(N/2-1) + d(N/2);
S(2) = d(N/2) + sqrt(3)/4*s(N/2) + (sqrt(3)-2)/4*s(N/2-1);
S(l) = s(N/2) - sqrt(3)*S(2);

for n=N/2-2:-1:1
den) = den) * K;
sen) = sen) / K;
sen) = sen) + d(n+l);
S(2*(n+l)+2) = d(n+l) + sqrt(3)/4*s(n+l) + (sqrt(3)-2)/4*s(n);
S(2*(n+l)-1+2) = s(n+l) - sqrt(3)*S(2*(n+l)+2);
end

S(4) = d(l) + sqrt(3)/4*s(1) + (sqrt(3)-2)/4* (S(l) + sqrt(3)*S(2»;


S(3) = s(l) - sqrt(3)*S(4);

However, this function requires the signals sand d to be available 'backwards,'


since the loop starts at N /2. FUrthermore, it needs the value d (1) in the
third line. It is of course possible to implement an inverse transform which
transform from the beginning of the signal (instead of the end), and which
requires only available samples. We will not show such a transform, but leave
it as an exercise.

11.4.2 Implementing CDF(4,6)

As before, we start with the raw equations for CDF(4,6). Note that the scaling
factor K = 4/ sqrt (2) is omitted.

Function 4.6 The Raw CDF(4,6) Equations


for n = 1:N/2
sl(n) = S(2*n-l) - 1/4*(S(2*n-2) + S(2*n»;
dl(n) = S(2*n) - sl(n) - sl(n+l);
s2(n) = sl(n) - 1/4096*( -35*dl(n-3) +265*dl(n-2) -998*dl(n-l) ...
-998*dl(n) +265*dl(n+l) -35*dl(n+2) );
sen) = s2(n) * K;
den) = dl(n) / K;
end

Obviously we need d1 (n-3) through d1 (n+2). We therefore change the sec-


ond line in the loop to
dl(n+2) = S(2*(n+2» - sl(n+2) - sl(n+3);
11.4 The Real Time Method 165

which this in turn leads us to change the first line to


s1(n+3) = S(2*(n+3)-1) - 1/4*(S(2*(n+3)-2) + S(2*(n+3)));
With these changes the loop counter must start with n = 4 (to avoid an error
in the third line), and end with N/2 - 3 (to avoid an error in the first line).
The loop now looks like

Function 4.7 The Modified CDF(4,6) Equations


for n = 4:N/2-3
s1(n+3) = S(2*(n+3)-1) - 1/4*(S(2*(n+3)-2) + S(2*(n+3)));
d1(n+2) = S(2*(n+2)) - s1(n+2) - s1(n+3);
s2(n) = s1(n) - 1/4096*( -35*d1(n-3) +265*d1(n-2) -998*d1(n-1)
-998*d1(n) +265*d1(n+1) -35*d1(n+2) );
sen) = s2(n) * K;
den) = d1(n) 1 K;
end

As before we are interested in optimizing memory usage. This time we have


to be careful, though. The underlined dl (n-l) refers to the value dl (n+2)
calculated in the second line three loops ago, and not the value d (n) calcu-
lated in the fifth line in the previous loop. This is obvious since d and dl
are two different variables. But if we just change the dl to d in the second
and third line the dl (n-l) (which then becomes d(n-l)) actually will refer
to d(n) in the fifth line. The result is not a MATLAB error, but simply a
transform different from the one we want to implement.
This faulty reference can be avoided by delaying the scaling in the fifth
(and fourth) line. Since the oldest reference to d is d(n-3), we need to delay
the scaling with three samples. The last two lines of the loop then read
s(n-3) = s2(n-3) * K;
d(n-3) = d1(n-3) 1 K;
Notice that n-3 is precisely the lowest admissible value, since n starts counting
at 4. This is not coincidental, since the value 4 was determined by the index
of the oldest d, with index n-3.
We are now getting closer to a sound transform, but it still lacks the first
three and last three runs through the loop. As before we need to do these by
hand, and as before we need to decide what method we would like to use to
handle the boundary problem. Above we saw how the periodization worked,
and that this method requires samples from the other end of the signal (hence
the name 'periodization'). There exists yet another method, which is not only
quite elegant, but also has no requirement for samples from the other end of
the signal.
The idea is very simple. Whenever we need to apply a step from the lifting
building block (prediction or update step), which requires undefined samples,
we choose a step from another lifting building block that does not require
these undefined samples. If for example we want to apply
s[n] = s[n] - 1/4096 (-35d[n-3] + 265d[n-2] - 998d[n-1]
                      - 998d[n] + 265d[n+1] - 35d[n+2])


for n = 3, we will need the undefined sample d[0]. Note that in the CDF
equations here we have left out the enumeration, since this is the form they
will have in the MATLAB implementation.
If we had chosen to periodize, we would use d[N/2], and if we had chosen to
zero pad, we would define d[0] = 0. But now we take another lifting building
block, for example CDF(4,4), and use the second prediction step from this
transform,

s[n] = s[n] - 1/128 (5d[n-2] - 29d[n-1] - 29d[n] + 5d[n+1]) .

We will apply this boundary correction method to our current transform, and
we will use CDF(1,1), CDF(4,2), and CDF(4,4), so we start by stating these
(except for CDF(1,1), which is actually the Haar transform).

          s[n] = S[2n] - 1/4 (S[2n-1] + S[2n+1])                     (11.13)

          d[n] = S[2n+1] - (s[n] + s[n+1])                           (11.14)

CDF(4,2)  s[n] = s[n] - 1/16 (-3d[n-1] - 3d[n])                      (11.15)

CDF(4,4)  s[n] = s[n] - 1/128 (5d[n-2] - 29d[n-1]
                               - 29d[n] + 5d[n+1])                   (11.16)

CDF(4,6)  s[n] = s[n] - 1/4096 (-35d[n-3] + 265d[n-2] - 998d[n-1]
                                - 998d[n] + 265d[n+1] - 35d[n+2])    (11.17)

          d[n] = (√2/4) d[n]                                         (11.18)

          s[n] = (4/√2) s[n]                                         (11.19)

The advantage of using transforms from the same family (in this case the
CDF(4,x) family) is that the first two lifting steps and the scaling steps are
the same.
First we examine the loop in Function 4.7 for n = 3. The first two lines
cause no problems, but the third would require d(3-3), which is an undefined
sample. Using CDF(4,4) instead, as suggested above, the first three lines read

s(3+3) = S(2*(3+3)-1) - 1/4*( S(2*(3+3)-2) + S(2*(3+3)) );
d(3+2) = S(2*(3+2)) - s(3+2) - s(3+3);
s(3) = s(3) - 1/128*( 5*d(3-2) -29*d(3-1) -29*d(3) +5*d(3+1) );

The smallest index of d is now 1 (instead of 0). For n = 2 and n = 1 we
do the same thing, except we use CDF(4,2) and CDF(1,1), respectively. We still
need to calculate s(1) through s(3), and d(1), d(2). Of these only s(1)
poses a problem, and once again we substitute another lifting step, namely
the one from CDF(1,1). The transform now looks like

Function 4.8 The Modified CDF(4,6) Transform


s(1) = S(1) - S(2);                                          % CDF(1,1)
s(2) = S(2*2-1) - 1/4*(S(2*2-2) + S(2*2));                   % CDF(4,x)
d(1) = S(2*2-2) - s(1) - s(1+1);                             % CDF(4,x)
s(3) = S(2*3-1) - 1/4*(S(2*3-2) + S(2*3));                   % CDF(4,x)
d(2) = S(2*3-2) - s(2) - s(2+1);                             % CDF(4,x)
s(1+3) = S(2*(1+3)-1) - 1/4*( S(2*(1+3)-2) + S(2*(1+3)) );   % CDF(4,x)
d(1+2) = S(2*(1+2)) - s(1+2) - s(1+3);                       % CDF(4,x)
s(1) = s(1) + 1/2*d(1);                                      % CDF(1,1)
s(2+3) = S(2*(2+3)-1) - 1/4*( S(2*(2+3)-2) + S(2*(2+3)) );   % CDF(4,x)
d(2+2) = S(2*(2+2)) - s(2+2) - s(2+3);                       % CDF(4,x)
s(2) = s(2) - 1/16*( -3*d(1) -3*d(2) );                      % CDF(4,2)
s(3+3) = S(2*(3+3)-1) - 1/4*( S(2*(3+3)-2) + S(2*(3+3)) );   % CDF(4,x)
d(3+2) = S(2*(3+2)) - s(3+2) - s(3+3);                       % CDF(4,x)
s(3) = s(3) - 1/128*( 5*d(1) -29*d(2) -29*d(3) +5*d(4) );    % CDF(4,4)

for n = 4:N/2-3
s(n+3) = S(2*(n+3)-1) - 1/4*( S(2*(n+3)-2) + S(2*(n+3)) );
d(n+2) = S(2*(n+2)) - s(n+2) - s(n+3);
s(n) = s(n) - 1/4096*( -35*d(n-3) +265*d(n-2) -998*d(n-1) ...
-998*d(n) +265*d(n+1) -35*d(n+2) );                          % CDF(4,6)
s(n-3) = s(n-3) * K;
d(n-3) = d(n-3) / K;
end

When the same considerations are applied to the end of the signal (the last
three runs through the loop), we get the final function.

Function 4.9 The Real Time CDF(4,6) Transform


s(1) = S(1) - S(2);                                          % CDF(1,1)
for n = 1:5
s(n+1) = S(2*(n+1)-1) - 1/4*(S(2*(n+1)-2) + S(2*(n+1)));     % CDF(4,x)
d(n) = S(2*n) - s(n) - s(n+1);                               % CDF(4,x)
end

s(1) = s(1) + 1/2*d(1);                                      % CDF(1,1)
s(2) = s(2) - 1/16*( -3*d(1) -3*d(2) );                      % CDF(4,2)
s(3) = s(3) - 1/128*( 5*d(1) -29*d(2) -29*d(3) +5*d(4) );    % CDF(4,4)

for n = 4:N/2-3
s(n+3) = S(2*(n+3)-1) - 1/4*( S(2*(n+3)-2) + S(2*(n+3)) );
d(n+2) = S(2*(n+2)) - s(n+2) - s(n+3);
s(n) = s(n) - 1/4096*( -35*d(n-3) +265*d(n-2) -998*d(n-1) ...
-998*d(n) +265*d(n+1) -35*d(n+2) );
s(n-3) = s(n-3) * K;
d(n-3) = d(n-3) / K;
end

d(N/2) = S(N) - s(N/2);                                      % CDF(1,1)
s(N/2-2) = s(N/2-2) - 1/4096*( -35*d(N/2-5) +265*d(N/2-4) ...
-998*d(N/2-3) -998*d(N/2-2) +265*d(N/2-1) -35*d(N/2) );      % CDF(4,6)
s(N/2-1) = s(N/2-1) - 1/128*( 5*d(N/2-3) -29*d(N/2-2) ...
-29*d(N/2-1) +5*d(N/2) );                                    % CDF(4,4)
s(N/2) = s(N/2) - 1/16*( -3*d(N/2-1) -3*d(N/2) );            % CDF(4,2)

for k=5:-1:0
s(N/2-k) = s(N/2-k) * K;
d(N/2-k) = d(N/2-k) / K;
end

Some of the lines have been rearranged in comparison to Function 4.8, in
order to reduce the number of code lines. The values s(1) through s(6) and
d(1) through d(5) might as well be calculated in advance, which is easier,
since then we can use a for loop.
Since the real time method transforms sample by sample, and hence is
expressed in terms of a for loop (even in MATLAB code), it is easy to convert
Function 4.9 to C. It is simply a matter of changing the syntax and remembering
to start indexing at 0.

Function 4.10 The Real Time CDF(4,6) Transform in C


s[0] = S[0] - S[1];
for (n = 0; n < 5; n++) {
  s[n+1] = S[2*(n+1)] - (S[2*(n+1)-1] + S[2*(n+1)+1])/4;
  d[n] = S[2*n+1] - s[n] - s[n+1];
}
s[0] += d[0]/2;
s[1] -= (-3*d[0] - 3*d[1])/16;
s[2] -= (5*d[0] - 29*d[1] - 29*d[2] + 5*d[3])/128;

for (n = 3; n < N/2-3; n++) {
  s[n+3] = S[2*(n+3)] - (S[2*(n+3)-1] + S[2*(n+3)+1])/4;
  d[n+2] = S[2*(n+2)+1] - s[n+2] - s[n+3];
  s[n] += (35*d[n-3] - 265*d[n-2] + 998*d[n-1] + 998*d[n] - 265*d[n+1]
           + 35*d[n+2])/4096;
  s[n-3] = s[n-3] * K;
  d[n-3] = d[n-3] / K;
}

N = N/2;
d[N-1] = S[2*N-1] - s[N-1];
s[N-3] += (35*d[N-6] - 265*d[N-5] + 998*d[N-4] + 998*d[N-3] - 265*d[N-2]
           + 35*d[N-1])/4096;
s[N-2] -= (5*d[N-4] - 29*d[N-3] - 29*d[N-2] + 5*d[N-1])/128;
s[N-1] -= (-3*d[N-2] - 3*d[N-1])/16;

for (n = 6; n > 0; n--) {
  s[N-n] *= K;
  d[N-n] /= K;
}

The inverse of the transform is once again easy to implement, in MATLAB
as well as in C. Here it is shown in MATLAB code.

Function 4.11 The Real Time Inverse CDF(4,6) Transform


for k=0:5
d(N/2-k) = d(N/2-k) * K;
s(N/2-k) = s(N/2-k) / K;
end

s(N/2) = s(N/2) + 1/16*( -3*d(N/2-1) -3*d(N/2) );
s(N/2-1) = s(N/2-1) + 1/128*( 5*d(N/2-3) -29*d(N/2-2) ...
-29*d(N/2-1) +5*d(N/2) );
s(N/2-2) = s(N/2-2) + 1/4096*( -35*d(N/2-5) +265*d(N/2-4) ...
-998*d(N/2-3) -998*d(N/2-2) +265*d(N/2-1) -35*d(N/2) );
S(N) = d(N/2) + s(N/2);

for n = N/2-3:-1:4
d(n-3) = d(n-3) * K;
s(n-3) = s(n-3) / K;
s(n) = s(n) + 1/4096*( -35*d(n-3) +265*d(n-2) -998*d(n-1) ...
-998*d(n) +265*d(n+1) -35*d(n+2) );
S(2*(n+2)) = d(n+2) + s(n+2) + s(n+3);
S(2*(n+3)-1) = s(n+3) + 1/4*(S(2*(n+3)-2) + S(2*(n+3)));
end

s(3) = s(3) + 1/128*( 5*d(1) -29*d(2) -29*d(3) +5*d(4) );
s(2) = s(2) + 1/16*( -3*d(1) -3*d(2) );
s(1) = s(1) - 1/2*d(1);

for n = 5:-1:1
S(2*n) = d(n) + s(n) + s(n+1);
S(2*(n+1)-1) = s(n+1) + 1/4*(S(2*(n+1)-2) + S(2*(n+1)));
end
S(1) = s(1) + S(2);
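
A forward/inverse pair like Functions 4.9 and 4.11 is easily validated with a
round-trip test. The sketch below assumes the two functions have been wrapped
as rtcdf46 and irtcdf46 (hypothetical names), taking the signal, the scaling
factor, and the s and d vectors as arguments.

% Round-trip test of Functions 4.9 and 4.11 (a sketch; rtcdf46 and
% irtcdf46 are hypothetical wrappers of the two functions).
N = 64;                    % even length, large enough for all the loops
K = 4/sqrt(2);             % scaling factor for the CDF(4,x) family
S = randn(1,N);            % random test signal
[s,d] = rtcdf46(S,K);      % direct transform (Function 4.9)
S2 = irtcdf46(s,d,K);      % inverse transform (Function 4.11)
max(abs(S - S2))           % should be of the order of eps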

We finish the implementation of the wavelet transform through lifting by


showing an optimized version of the real time CDF(4,6) implementation.
Here we take advantage of the fast vector operations available in MATLAB.

Function 4.12 The Optimized Real Time CDF(4,6) Transform.

N = length(S)/2;
cdf2 = 1/16 * [-3 -3];
cdf4 = 1/128 * [5 -29 -29 5];
cdf6 = 1/4096 * [-35 265 -998 -998 265 -35];

s(1) = S(1) - S(2);                              % CDF(1,1)
s(2:6) = S(3:2:11) - (S(2:2:10) + S(4:2:12))/4;  % CDF(4,x)
d(1:5) = S(2:2:10) - s(1:5) - s(2:6);            % CDF(4,x)
s(1) = s(1) + d(1)/2;                            % CDF(1,1)
s(2) = s(2) - cdf2 * d(1:2)';                    % CDF(4,2)
s(3) = s(3) - cdf4 * d(1:4)';                    % CDF(4,4)
for n = 4:N-3
s(n+3) = S(2*n+5) - (S(2*n+4) + S(2*n+6))/4;
d(n+2) = S(2*n+4) - s(n+2) - s(n+3);
s(n) = s(n) - cdf6 * d(n-3:n+2)';
s(n-3) = s(n-3) * K;
d(n-3) = d(n-3) / K;
end

d(N) = S(2*N) - s(N);                            % CDF(1,1)
s(N-2) = s(N-2) - cdf6 * d(N-5:N)';              % CDF(4,6)
s(N-1) = s(N-1) - cdf4 * d(N-3:N)';              % CDF(4,4)
s(N) = s(N) - cdf2 * d(N-1:N)';                  % CDF(4,2)
s(N-5:N) = s(N-5:N) * K;
d(N-5:N) = d(N-5:N) / K;

11.4.3 The Real Time DWT Step-by-Step

In the two previous sections we have shown how to implement the Daubechies 4
and the CDF(4,6) transforms. In both cases the function implementing
the transform consists of a core, in the form of a for loop, which performs
the main part of the transformation, and some extra code, which handles
the boundaries of the signal. While there are many choices for a boundary
handling method (two have been explored in the previous sections), the core
of the transform always has the same structure.
Based on the two direct transforms, Daubechies 4, implemented in Function 4.4,
and CDF(4,6), implemented in Function 4.9, we give an algorithm
for a real time lifting implementation of any transform; a small worked
example for CDF(2,2) follows the list of steps.

1. Write the raw equations in a for loop, which runs from 1 to N/2 (see
Function 4.1).
2. Remove any suffixes by changing s1, s2, and so on to s, and likewise
with d.
3. Start with the last equation (except for the two scale equations) and find
the largest index, and then change the previous equation accordingly. For
example with CDF(4,6)

d(n) = S(2*n) - s(n) - s(n+1);
s(n) = s(n) - 1/4096*( -35*d(n-3) +265*d(n-2) -998*d(n-1) ...
-998*d(n) +265*d(n+1) -35*d(n+2) );

is changed to

d(n+2) = S(2*(n+2)) - s(n+2) - s(n+3);
s(n) = s(n) - 1/4096*( -35*d(n-3) +265*d(n-2) -998*d(n-1) ...
-998*d(n) +265*d(n+1) -35*d(n+2) );

If the largest index is less than n, the previous equation should not be
changed.
4. Do this for all the equations, ending with the first equation.
5. Find the smallest and largest index in all of the equations, and change
the for loop accordingly. For example in CDF(4,6) the smallest index is
d(n-3) and the largest s(n+3). The loop is then changed to (see Function 4.7)

for n=4:N/2-3

6. Change the two scaling equations such that they match the smallest
index. In CDF(4,6) this is n-3, and the scaling equations are changed to

s(n-3) = s2(n-3) * K;
d(n-3) = d1(n-3) / K;

7. Finally, apply some boundary handling method to the remaining indices.
For CDF(4,6) this would be n = 1, 2, 3 and n = N/2-2, N/2-1, N/2.
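
As a small illustration, here is the outcome of applying steps 1 to 6 to the
much shorter CDF(2,2) transform, whose raw equations in the indexing used in
this chapter are d1(n) = S(2*n) - 1/2*(S(2*n-1) + S(2*n+1)) and
s1(n) = S(2*n-1) + 1/4*(d1(n-1) + d1(n)), with scaling factor K = sqrt(2).
This is a sketch only; it does not run until step 7 supplies s(1), d(1),
s(N/2), and d(N/2).

K = sqrt(2);               % scaling factor for CDF(2,2), see Chap. 3
N = length(S);             % S is the signal, of even length
for n = 2:N/2-1            % step 5: d(n-1) gives the start, S(2*n+1) the end
d(n) = S(2*n) - 1/2*( S(2*n-1) + S(2*n+1) );
s(n) = S(2*n-1) + 1/4*( d(n-1) + d(n) );
s(n-1) = s(n-1) * K;       % step 6: scaling delayed by one sample
d(n-1) = d(n-1) / K;
end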

11.5 Filter Bank Implementation


The traditional implementation of the DWT is as a filter bank. The filters
were presented in Chap. 7, but without stating the implementation formula.
The main difference between the lifting implementation and the filter bank
implementation is the trade-off between generality and speed. The lifting ap-
proach requires a new implementation for each transform (we implemented
Daubechies 4 and CDF(4,6) in the previous section, and they were quite dif-
ferent), whereas the filter bank approach has a fixed formula, independent of
the transform. A disadvantage is that filtering, as a rule of thumb, requires
twice as many calculations as lifting. In some applications the generality is
more important than speed, and we therefore also briefly present the
implementation of filtering.
Implementation of the wavelet transform as a filter bank is discussed
in many places in the wavelet literature. We have briefly discussed this topic
a number of times in the previous chapters, and we will not go into further
detail here. Instead we will show one implementation of a filter bank. Readers
interested in more detailed information are referred to Wickerhauser [30] and
the available C code (see Chap. 14).

11.5.1 An Easy MATLAB Filter Bank DWT

The following MATLAB function is the main part of the Uvi_Wave function
wt.m, which performs the wavelet transform. It takes the signal, a filter pair,
and the number of scales as input, and returns the transformed signal. The
function uses the MATLAB function conv (abbreviation of convolution, which
is filtering) to perform the low/high pass filtering, and then subsequently
decimates the signal (down samples by two). This is a very easy solution (and
actually corresponds to the usual presentation of filter banks, see for example
Fig. 7.4), but it is also highly inefficient, since calculating a lot of samples, just
to use half of them, is definitely not the way to do a good implementation.
Note that there is no error checking in the function shown here (there is in the
complete wt.m from Uvi_Wave, though), so for instance a too large value of k
(more scales than the length of the signal permits) or passing S as a column
vector will not generate a proper error. The structure of the output vector R
is described in help wt. The variables dlp and dhp are used to control the
alignment of the output signal. This is described in more detail in Sect. 9.4.3
on p. 116.

Function 5.1 Filter implementation of DWT (Uvi_ Wave)


function R = fil_cv(S,h,g,k)
% Copyright (C) 1994, 1995, 1996, by Universidad de Vigo

llp = length(h);        % Length of the low pass filter.
lhp = length(g);        % Length of the high pass filter.

L = max([lhp,llp]);     % Number of samples for the wraparound.

% Start the algorithm
R = [];                 % The output signal is reset.

for i = 1:k             % For every scale (iteration)...

lx = length(S);

if rem(lx,2) ~= 0       % Check that the number of samples
S = [S,0];              % will be even (because of decimation).
lx = lx + 1;
end

Sp = S;                 % Build wraparound. The input signal
pl = length(Sp);        % can be smaller than L, so it may
while L > pl            % be necessary to repeat it several
Sp = [Sp,S];            % times.
pl = length(Sp);
end

S = [Sp(pl-L+1:pl),S,Sp(1:L)];   % Add the wraparound.

s = conv(S,h);          % Then do low pass filtering
d = conv(S,g);          % and high pass filtering.

s = s((1+L):2:(L+lx));  % Decimate the outputs
d = d((1+L):2:(L+lx));  % and leave out wraparound

R = [d,R];              % Put the resulting wavelet step
                        % on its place in the wavelet vector,
S = s;                  % and set the next iteration.
end

R = [S,R];              % Wavelet vector (1 row vector)

The word 'wraparound' is equivalent to what we in this book prefer to denote
'periodization.' This principle is described in Sect. 10.4.
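
A typical call of the function looks as follows (a sketch, assuming the
Uvi_Wave filter generator daub, which is also used in Function 6.2 below, is
on the path).

% Three-scale DWT of a random signal with Daubechies 4 filters.
[h,g] = daub(4);           % analysis filter pair
S = randn(1,128);          % row vector, as required by fil_cv
R = fil_cv(S,h,g,3);       % transformed signal, structure as in 'help wt'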

11.5.2 A Fast C Filter Bank DWT

The filter bank implementation is well suited for the C language. It is more
efficient to use pointers than indices in C, and the structure of the filter bank
transform makes it easy to do just that. The following function demonstrates
how pointers are used to perform the transform. In this case we have chosen
to use boundary filters instead of periodization. The method of boundary
filters is presented in Sect. 10.3. The construction of these boundary filters is
shown later in this chapter, in Sect. 11.6 below.
To interpret this function a certain familiarity with the C language is
required, since we make no attempt to explain how it works. The reason for
showing this function is that an efficient C implementation typically
transforms 1,000 to 10,000 times faster than the various Uvi_Wave transform
implementations.
In this function N is the length of the signal, HLen the length of the
ordinary filters (only orthogonal filters can be used in this function). HA,
GA, and LBM are pointers to the ordinary and boundary filters, respectively.
Finally, EMN is the number of boundary filters at each end. In contrast to the
previously presented transform implementations in this chapter, this function
includes Gray code permutation (see Sect. 9.3). This piece of code is from
dwte.c, which is available electronically, see Chap. 14.

Function 5.2 Filter implementation with boundary correction - in C

double *SigPtr, *SigPtr1, *SigPtr2, *Hptr, *Gptr, *BM;
int GCP, EndSig, m, n;

if (fmod(HLen,4)) GCP = 0; else GCP = 1;

SigPtr1 = &RetSig[GCP*N/2];
SigPtr2 = &RetSig[(1-GCP)*N/2];

/* LEFT EDGE CORRECTION (REALLY A MUL OF MATRIX AND VECTOR). */
BM = LBM;
for (n = 0; n < EMN-1; n += 2)
{
  SigPtr = Signal;
  *SigPtr1 = *BM++ * *SigPtr++;
  for (m = 1; m < EMM-1; m++) *SigPtr1 += *BM++ * *SigPtr++;
  *SigPtr1++ += *BM++ * *SigPtr++;

  SigPtr = Signal;
  *SigPtr2 = *BM++ * *SigPtr++;
  for (m = 1; m < EMM-1; m++) *SigPtr2 += *BM++ * *SigPtr++;
  *SigPtr2++ += *BM++ * *SigPtr++;
}
if (!fmod(HLen,4))
{
  SigPtr = Signal;
  *SigPtr1 = *BM++ * *SigPtr++;
  for (m = 1; m < EMM-1; m++) *SigPtr1 += *BM++ * *SigPtr++;
  *SigPtr1++ += *BM++ * *SigPtr++;

  SigPtr = SigPtr1;
  SigPtr1 = SigPtr2;
  SigPtr2 = SigPtr;
}
/* THE ORDINARY WAVELET TRANSFORM (ON THE MIDDLE OF THE SIGNAL). */
for (n = 0; n < N/2-EMN; n++)
{
  SigPtr = &Signal[2*n];
  Hptr = HA;
  Gptr = GA;

  *SigPtr1 = *Hptr++ * *SigPtr;
  *SigPtr2 = *Gptr++ * *SigPtr++;
  for (m = 1; m < HLen-1; m++)
  {
    *SigPtr1 += *Hptr++ * *SigPtr;
    *SigPtr2 += *Gptr++ * *SigPtr++;
  }
  *SigPtr1++ += *Hptr++ * *SigPtr;
  *SigPtr2++ += *Gptr++ * *SigPtr++;
}

/* RIGHT EDGE CORRECTION (REALLY A MUL OF MATRIX AND VECTOR). */
EndSig = N-EMM;
BM = RBM;
for (n = 0; n < EMN-1; n += 2)
{
  SigPtr = &Signal[EndSig];
  *SigPtr1 = *BM++ * *SigPtr++;
  for (m = 1; m < EMM-1; m++) *SigPtr1 += *BM++ * *SigPtr++;
  *SigPtr1++ += *BM++ * *SigPtr++;

  SigPtr = &Signal[EndSig];
  *SigPtr2 = *BM++ * *SigPtr++;
  for (m = 1; m < EMM-1; m++) *SigPtr2 += *BM++ * *SigPtr++;
  *SigPtr2++ += *BM++ * *SigPtr++;
}
if (!fmod(HLen,4))
{
  SigPtr = &Signal[EndSig];
  *SigPtr1 = *BM++ * *SigPtr++;
  for (m = 1; m < EMM; m++) *SigPtr1 += *BM++ * *SigPtr++;
}

The disadvantage of using pointers instead of indices is that the code becomes
difficult to read. This can be counteracted by inserting comments in the
code, but it would make the function twice as long. We have chosen to omit
comments in this function, and simply let it illustrate what a complete and
optimized implementation of a filter bank DWT looks like in C. The original
function dwte has many comments inserted.

11.6 Construction of Boundary Filters

There are many types of boundary filters, and we have in Chap. 10 presented
two types, namely the ones we called Gram-Schmidt boundary filters, and
those that preserve vanishing moments. Both apply, as presented, only to
orthogonal filters. The first method is quite easy to implement, while the
second method is more complicated, and it requires a substantial amount of
computation.
Because of the complicated procedure needed in the vanishing moments
case, we will omit this part (note that a MATLAB file is electronically avail-
able), and limit the implementation to Gram-Schmidt boundary filters.

11.6.1 Gram-Schmidt Boundary Filters

The theory behind this method is presented in Sect. 10.3, so in this section
we focus on the implementation issues only. We will implement the method
according to the way the M and M' matrices in (10.19) and (10.20) were
constructed. The only difference is that we will omit the ordinary filters in
the middle of the matrices, since they serve no purpose in this context.
We start the construction with a matrix containing all possible even
truncations of the given filters.

[ h[1]    h[0]      0     0    ...        0     0   ]
[ g[1]    g[0]      0     0    ...        0     0   ]
[   .       .                             .     .   ]
[ h[L-5]  h[L-6]   ...  h[1]  h[0]        0     0   ]     (11.20)
[ g[L-5]  g[L-6]   ...  g[1]  g[0]        0     0   ]
[ h[L-3]  h[L-4]   ...  h[3]  h[2]      h[1]  h[0]  ]
[ g[L-3]  g[L-4]   ...  g[3]  g[2]      g[1]  g[0]  ]
The number of left boundary filters is L/2 - 1, so we reduce the matrix to
the bottom L/2 - 1 rows (which is exactly half of the rows). Note that the
first row in the reduced matrix is (part of) the low pass filter, if L = 4K + 2,
but (part of) the high pass filter, if L = 4K. Consequently, the last row is
always a high pass filter. The next step is to Gram-Schmidt orthogonalize
the rows, starting with the first row. Finally, the rows are normalized. The
same procedure is used to construct the right boundary filters.

Function 6.1 Construction of Left and Right Boundary Filters


function [LBM, RBM] = boundary(H);

H = H(:)';                          % Ensures a row vector
L = length(H);
G = fliplr(H).*((-1).^[0:L-1]);     % Construct high pass

% Construct matrices from H and G.
for k = 2:2:L-2
LBM(k-1,1:k) = H(L-k+1:L);          % Construct left boundary matrix
LBM(k ,1:k) = G(L-k+1:L);
RBM(k-1,L-k-1:L-2) = G(1:k);        % Construct right boundary matrix
RBM(k ,L-k-1:L-2) = H(1:k);         % which is upside down
end

LBM = LBM(L/2:L-2,:);               % Truncate to last half of rows
RBM = RBM(L/2:L-2,:);

% Do Gram-Schmidt on rows of LBM.
for k = 1:L/2-1
v = LBM(k,:) - (LBM(1:k-1,:) * LBM(k,:)')' * LBM(1:k-1,:);
LBM(k,:) = v/norm(v);
end

% Do Gram-Schmidt on rows of RBM.
for k = 1:L/2-1
v = RBM(k,:) - (RBM(1:k-1,:) * RBM(k,:)')' * RBM(1:k-1,:);
RBM(k,:) = v/norm(v);
end

RBM = flipud(RBM);                  % Flip right matrix upside down

The first for loop constructs the left matrix, as shown in (11.20), and the
right matrix, followed by a truncation to the last half of the rows. Note that
RBM is upside down. Then the Gram-Schmidt procedure is applied. Here we
take advantage of MATLAB's ability to handle matrices and vectors to do
the sum required in the procedure (see (10.14)). Of course, the sum can
also be implemented as a for loop, which would be the only feasible way
in most programming environments. The normalization is done after each
orthogonalization.
This function only calculates the analysis boundary filters, but since they
are constructed to give an orthogonal transform, the synthesis boundary
filters are found simply by transposing the analysis boundary filters.
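
Since the Gram-Schmidt procedure leaves LBM and RBM with orthonormal
rows, the construction can be checked numerically (a sketch, again assuming
the daub filter generator).

% Both products below should be close to the identity matrix.
[h,g] = daub(8);                       % Daubechies 8 filters
[LBM,RBM] = boundary(h);
norm(LBM*LBM' - eye(size(LBM,1)))      % ~0 up to rounding errors
norm(RBM*RBM' - eye(size(RBM,1)))      % ~0 up to rounding errors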
It is relatively simple to use the Gram-Schmidt boundary filters. The
matrices constructed with Function 6.1 are multiplied with the ends of the
signal, and the interior filters are applied as usual. To determine exactly
where the boundary and interior filters are applied, we can make use of the
transform in matrix form, as discussed in Chap. 10.
It is a bit more tricky to do the inverse transformation of the signal.
First we note that the matrix form in for example (10.12) results in a mixing
of the low and high pass transform coefficients, as described in Sect. 10.2.
We therefore have two options. Either we separate the low and high pass
parts prior to applying the inverse transform (the inverse transform then has
the structure known from the first several chapters), or we apply a slightly
altered transform, which fits the mixing of low and high pass coefficients. In
the former case we use the ordinary synthesis filters, and since the synthesis
boundary filters are given as the transpose of the analysis boundary matrices
LBM and RBM, they have to be separated, too. In the latter case the ordinary
synthesis filters do not apply immediately (the boundary filters do, though).
We will focus on the latter case, leaving the former as an exercise.
To see what the inverse transform looks like for a transformed signal with
mixed low and high pass coefficients, we first turn to (10.9). We see that the
synthesis filters are columns of the transform matrix, but when that matrix
is applied to a signal, the inner products, the filtering, happens row-wise.
Examining the two full rows, however, we see that the low and high pass
filter coefficients are mixed, which corresponds nicely to the fact that the
signal is also a mix of the low and high pass parts. Therefore, if we use
the two full rows in (10.9) as filters, we get a transform which incorporates

both up sampling and addition in the filters. At least the addition is usually
a separate action in a filter bank version of the inverse transform (see for
instance Fig. 7.2 and Fig. 7.4).
The matrix of the inverse transform is given as the transpose of the direct
transform matrix. The synthesis matrix is shown in (10.9) before truncation.
The analysis filters occur vertically in the matrix, but we show them
horizontally below. For instance, if the analysis filters are given by

h = [h[1] h[2] h[3] h[4] h[5] h[6]] ,
g = [g[1] g[2] g[3] g[4] g[5] g[6]] ,

then the new, horizontal filters are

hr = [h[5] g[5] h[3] g[3] h[1] g[1]] ,
gr = [h[6] g[6] h[4] g[4] h[2] g[2]] .

Note also how these new filters are used both as whole filters and as truncated
filters.
This alternative implementation of the synthesis matrix is also illustrated
in Fig. 11.1. The analysis matrix is given as a number of interior (whole) filters
and the left and right boundary filters. Here the boundary filters are shown
as two matrices. The inverse transform matrix is the transpose of the analysis
transform matrix, and when we consider the transposed matrix as consisting
of filters row-wise, the structure of the synthesis matrix is as shown in the left
matrix in Fig. 11.1. The boundary filters are still given as two submatrices,
which are the transpose of the original boundary filter matrices. This figure
is also useful in understanding the implementation of the direct and inverse
transform in the filter bank version. Function 6.2 demonstrates the use
of the boundary filters to transform and inversely transform a signal.

Function 6.2 Application of Gram-Schmidt Boundary Filters


function S2 = ApplyBoundary(S)
L = 10;                       % The length of the filter
[h,g] = daub(L);              % Get Daubechies L filter
[LBM,RBM] = boundary(h);      % Construct GS boundary filters

% Construction of alternative synthesis filters
hr(1:2:L-1) = h(L-1:-2:1);
hr(2:2:L) = g(L-1:-2:1);
gr(1:2:L-1) = h(L:-2:2);
gr(2:2:L) = g(L:-2:2);

N = length(S);
% Initialize transform signals
T = zeros(1,N);

Fig. 11.1. The direct (left) transform matrix is constructed by the left and right
boundary matrices and a number of ordinary, whole filters. The inverse (right)
transform matrix is the transpose of the direct transform matrix, and the structure
shown here is a result of interpreting the matrix as consisting of filters row-wise

T2 = T;
S2 = T2;
% Direct transform with GS boundary filters
T(1:L/2-1) = LBM * S(1:L-2)';              % Apply left matrix
T(N-L/2+2:N) = RBM * S(N-L+3:N)';          % Apply right matrix

for k = 1:2:N-L+1
T(k+L/2-1) = h * S(k:L+k-1)';              % Apply interior filters
T(k+L/2) = g * S(k:L+k-1)';
end

T = [T(1:2:N-1) T(2:2:N)];                 % Separate low and high pass

% Inverse transform with GS boundary filters
T2(1:2:N-1) = T(1:N/2);                    % Mix low and high pass
T2(2:2:N) = T(N/2+1:N);

for k = 1:2:L-3
S2(k) = hr(L-k:L) * T2(L/2:L/2+k)';        % Apply truncated
S2(k+1) = gr(L-k:L) * T2(L/2:L/2+k)';      % interior filters
end

S2(1:L-2) = S2(1:L-2) + (LBM' * T2(1:L/2-1)')';    % Apply left matrix

for k = 1+(L/2-1):2:N-L+1-(L/2-1)
S2(k+L/2-1) = hr * T2(k:L+k-1)';           % Apply whole
S2(k+L/2) = gr * T2(k:L+k-1)';             % interior filters
end

for k = N-L+3:2:N-1
S2(k) = hr(1:N-k+1) * T2(k-L/2+1:N-L/2+1)';    % Apply truncated
S2(k+1) = gr(1:N-k+1) * T2(k-L/2+1:N-L/2+1)';  % interior filters
end
S2(N-L+3:N) = S2(N-L+3:N) + (RBM' * T2(N-L/2+2:N)')';  % Right matrix

11.7 Wavelet Packet Decomposition

In most applications one DWT is not enough, and often it is necessary to
do a complete wavelet packet decomposition. This means applying the DWT
several times to various signals. The wavelet packet method was presented in
Chap. 8. Here we show how to implement this method. We need to construct
a function, which takes as input a signal, two filters, and the number of levels
in the decomposition, and returns the decomposition in a matrix. It is just a
matter of first applying the DWT to one signal, then to two signals, then to
four signals, and so on.

Function 7.1 Wavelet Packet Decomposition

function D = wpd(S,h,g,J)
N = length(S);
if J > floor(log2(N))
error('Too many levels.');
elseif rem(N,2^(J-1))
error(sprintf('Signal length must be a multiple of 2^%i.',J-1));
end

D = zeros(J,N);
D(1,:) = S;

% For each level in the decomposition
% (starting with the second level).
for j = 1:J-1
width = N/2^(j-1);               % Width of elements on j'th level.

% For each pair of elements on the j'th level.
for k = 1:2^(j-1)
Interval = [1+(k-1)*width:k*width];
D(j+1,Interval) = dwt(D(j,Interval),h,g);
end
end

There are two loops in this function. One for the levels in the decomposition,
and one for the elements on each level. Alternatively, the dwt function could
be made to handle more than one element at a time. The electronically
available function dwte takes a number of signals as input, and transforms
each of them. Thus the two for loops in the previous function can be reduced
to

for j=1:J-1
D(j+1,:) = dwte(D(j,:),h,g,2^(j-1));
end

The fourth argument gives the number of signals within the signal D(j,:).
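
A typical call of Function 7.1 looks like this (a sketch; daub as before, and
a two-channel dwt function is assumed to be on the path).

% Four-level wavelet packet decomposition of a signal of length 64.
[h,g] = daub(4);
S = randn(1,64);
D = wpd(S,h,g,4);     % row j of D holds the 2^(j-1) elements on level j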

11.8 Wavelet Packet Bases, Basis Representation,
and Best Basis

Once the full wavelet packet decomposition has been computed to a pre-
scribed level (compatible with the length of the signal), a possible next step
is to find the best basis. Implementing the best basis algorithm is not diffi-
cult, but there is one point which needs to be settled before we can proceed.
We have to decide how to represent a basis in the computer. We will focus on
MATLAB, but the principle applies to all types of software. This issue is ad-
dressed in the first section. The following two sections discuss implementation
of cost computation and of best basis search.

11.8.1 Basis Representation

There are two different ways to represent a basis in a wavelet packet de-
composition in MATLAB. In Uvi_Wave the basis is described by a binary
tree, and the basis is represented by the depth of the terminal nodes, starting
from the lowest frequency node, located on the left. See the left hand
part of Fig. 11.2. The other representation is given by going through the
entire binary tree, marking selected nodes with 1 and unselected nodes with
0, starting from the top, counting from left to right. This principle is also
shown in Fig. 11.2. In both cases the representation is the vector containing
the numbers described above.
The choice of representation is mostly a matter of taste, since they both
have advantages and disadvantages. In this book we have chosen the second
representation, and the following MATLAB functions are therefore based on
this representation. A conversion function between the two representations is
available electronically, see Chap. 14.
[1 3 3 2]            [0 1 0 0 0 0 1 0 0 0 0 1 1 0 0]

Fig. 11.2. Two different basis representations. Either we write the level of each
element, starting from the left, or we go through the elements from top to bottom,
marking the selected elements with 1 and unselected elements with 0

11.8.2 Calculating Cost Values for the Decomposition

In a decomposition with J levels there will be a total of

2^0 + 2^1 + ... + 2^(J-1) = 2^J - 1

elements, and we enumerate them as shown in Fig. 11.3. The best basis search

                    1
         2                      3
    4    |    5            6    |    7
 8 | 9 | 10 | 11       12 | 13 | 14 | 15

Fig. 11.3. The enumeration of the elements in a full wavelet decomposition with
four levels

uses two vectors, each of length 2^J - 1. One contains the cost values for each
element, and the other contains the best basis, once we have completed the
basis search.
In the following we assume that D is a matrix containing the wavelet
packet decomposition of a signal S of length 2^N, down to level J. This
decomposition could for example be the output from Function 7.1. An element
in this decomposition is a vector consisting of some of the entries of a row in
the matrix D. We will need to extract all the elements in the decomposition,
and we therefore need to know the position and length of each element. First,
the level of the element is found in MATLAB by j = floor(log2(k)) + 1. As
an example, the elements enumerated as 4, 5, 6, and 7 are on the third level,
and the integer parts of log2 of these are 2, so the formula yields j=3. Note
that the levels start at 1 and not 0 (as prescribed by the theory). This is
solely due to MATLAB's inability to use index 0. Thus the reader should
keep this change in numbering of the levels in mind, when referring back to
the theory.
Now we find the length of an element on the j'th level as (with j = 1
being the first level)

  length of signal                          2^N
-------------------------------------- = --------- = 2^(N-j+1) .
number of elements on the j'th level      2^(j-1)

The vector, which is the first element on the j'th level, is then found by
D(j,1:L), where L = 2^(N-j+1). The second element is D(j,1+L:2*L), and
the third is D(j,1+2*L:3*L), and so on. Generally, the m'th element at level
j is D(j,1+(m-1)*L:m*L).

Function 8.1 Generating a Vector with Cost Values

j '" size(D, 1); %Levels in the decomp


SignalLength'" size(D, 2); %Length of signal
N '" log2(SignaILength);

CostValues '" zeros(1, 2-j - 1); % Initialize cost value vector


Yo Apply the cost function to each element in the decomposition
for k"'1:2-j-1
j '" floor(log2(k» + 1; Yo Find the level
L '" 2-(N-j+1); %Find length of element
%Go through all elements on the j'th level
for m"'1:2-(j-1)
E '" D(j, 1 + (m-1)*L: m*L); %Extract element
CostValues(k) '" CostFunc(E); %Calculate cost value
end
end

When D is a decomposition matrix, Function 8.1 will create a vector
CostValues, which contains the cost value for each element in the
decomposition. The reference CostFunc is to the given cost function, which also has
to be implemented (see Sect. 8.3 and Sect. 11.9).

11.8.3 Best Basis Search

The next step is to find the best basis. We let a vector Basis of the same
length as CostValues represent the basis. The indexing of the two vectors is
the same, that is the elements are enumerated as in Fig. 11.3.
In Fig. 11.4 we have marked a basis with shaded boxes. This basis is then
represented by the vector
Basis = [0 0 1 1 0 0 0 0 0 1 1 0 0 0 0] .

Fig. 11.4. Basis enumeration and basis representation

In this vector we have put a 1 in the places corresponding to marked elements,
and 0 elsewhere.
The implementation of the best basis algorithm on p. 94 is not difficult.
First all the bottom elements are chosen, and a bottom-up search is
performed. The only tricky thing is step 4(a), which requires that all marks
below the marked element are deleted. Doing this would require some sort of
search for 1's, and this could easily become a cumbersome procedure. Instead
we temporarily leave the marks, and once the best basis search is completed,
we know that the best basis is given as all the top-most 1's. To remove all
the unwanted 1's we go through the binary tree again, this time top-down.
Each time we encounter a 1 or a 2 in an element, we put a 2 in both elements
just below it. Hence, the tree is filled up with 2's below the chosen basis,
thus removing all the unwanted 1's. Finally, all the 2's are converted to 0's.

Function 8.2 Best Basis Search


% Mark all the bottom elements.
Basis = [zeros(1, 2^(J-1)-1) ones(1, 2^(J-1))];

% Bottom-up search for the best basis.
for j=J-1:-1:1
for k=2^(j-1):2^j-1
v1 = CostValues(k);
v2 = CostValues(2*k) + CostValues(2*k+1);
if v1 <= v2
Basis(k) = 1;
else
CostValues(k) = v2;
end
end
end

% Fill with 2's below the chosen basis.
for k=1:(length(Basis)-1)/2
if Basis(k) == 1 | Basis(k) == 2
Basis(2*k) = 2;
Basis(2*k+1) = 2;
end
end

% Convert all the 2's to 0's.
Basis = Basis .* (Basis == 1);

11.8.4 Other Basis Functions

Most of the properties related to the concept of a basis can be implemented


in MATLAB. We have seen how to implement calculation of cost values and
search for the best basis. Other useful properties which can be implemented
are
• a best level search,
• displaying a basis graphically,
• checking the validity of a basis representation,
• displaying the corresponding time-frequency plane,
• reconstruction of a signal from a given basis,
• alteration of a signal given by a certain basis.
The last property is important, since it is the basis for many applications,
including denoising and compression. We do not present implementations of
these properties, but some are discussed in the exercises.

11.9 Cost Functions

Most cost functions are easily implemented in MATLAB, since they are just
functions mapping a vector to a number. For example, the l^p norm is
calculated as sum(abs(a).^p)^(1/p), where a is a vector (an element from the
decomposition), and p is a number between 0 and infinity.
The Shannon entropy can be a problem, however, since it involves the
logarithm of the entries in a. If some of these are 0, the logarithm is undefined.
We therefore have to disregard the 0 entries. This is done by a(find(a)),
because find(a) in itself returns the indices of non-zero entries. Hence, the
Shannon entropy is calculated as

-sum( a(find(a)).^2 .* log(a(find(a)).^2) )

Note that log in MATLAB is the natural logarithm.
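
As a concrete example, the Shannon entropy can be wrapped as a small cost
function, ready for use as CostFunc in Function 8.1 (a sketch; the name
costshannon is our own).

function c = costshannon(a)
% Shannon entropy cost function. The zero entries of a are
% disregarded, since the logarithm is undefined at 0.
a2 = a(find(a)).^2;           % squared non-zero entries
c = -sum(a2 .* log(a2));      % natural logarithm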

Exercises

11.1 Modify dwthaar such that the transform becomes energy preserving
(see (3.28)-(3.31)), i.e. such that norm(Signal) = norm(dwthaar(Signal))

11.2 Construct from dwthaar (Function 2.1) another function idwthaar,


which implements the inverse Haar transform:
1. Construct the function such that it takes the bottom row of the decom-
position as input and gives the original signal (upper most row in the
decomposition) as output.
2. Construct the function such that it takes any row and the vertical location
of this row in the decomposition as inputs, and gives the original signal
as output.
3. Modify the function in 2. such that it becomes energy preserving.
11.3 Implement the inverse of Function 3.4, and verify that it actually com-
putes the inverse by applying it to the output of Function 3.4 on known
vectors.
11.4 One possible implementation of the inverse of the Daubechies 4 trans-
form is shown in Function 4.5. The implementation inversely transforms the
signal 'backwards' by starting at index N/2 instead of index 1. Implement
the inverse Daubechies 4 transform, using periodization, such that it starts at
index 1, and test that the implementation is correct by using Function 4.4.
Do not shift the signal, just use signal samples from the other end of the
signal whenever necessary.
Hint: Start all over by using the approach described at the beginning of
Sect. 11.4.1 on p. 161, and the Functions 4.1, 4.2, and 4.3.
11.5 The CDF(3,3) transform is defined by the following equations.

s(1)[n] = S[2n] - 1/3 S[2n-1] ,

d(1)[n] = S[2n+1] - 1/8 (9s(1)[n] + 3s(1)[n+1]) ,

s(2)[n] = s(1)[n] + 1/36 (3d(1)[n-1] + 16d(1)[n] - 3d(1)[n+1]) ,

s[n] = (3/√2) s(2)[n] ,

d[n] = (√2/3) d(1)[n] .

1. Write out the transform in matrix form.
2. Show that the corresponding filters are given by

h = √2/64 [3 -9 -7 45 45 -7 -9 3] ,

g = √2/8 [-1 3 -3 1] .

11.6 Write a function that performs a full wavelet decomposition to the


maximal level permitted by the length of the signal. You can start from
Function 2.2 on p. 155.
1. Start with a function which generates the decomposition in Table 3.1,
i.e. a signal of length 8 transformed using the Haar transform.
2. Extend your function to accept signals of length 2^N.
3. Change the function to use CDF(3,3), see Exer. 11.5. Here you have to
solve the boundary problem. The easy choice is zero padding.
11.7 An in-place implementation of a wavelet transform is more complicated
to realize. Start by solving the following problems.
1. Explain how the entries in a transformed vector are placed, as you go
through the full decomposition tree. See Table 3.2 for an example.
2. Write a function which computes the location in the decomposition ma-
trix used previously, based on the indexing used in the decomposition.
3. Write a function which implements an in place Haar transform, over a
prescribed number of levels.
11.8 Implement the inverse of the wavelet transform that uses the Gram-
Schmidt boundary filters in such a way that it applies to a transformed
signal with separated low and high pass parts. Remember that the low and
high pass boundary filters must be separated to do this.
11.9 Write a function which implements the best level basis search in a full
wavelet decomposition to a prescribed level.
11.10 Not all vectors containing 0 and 1 entries can be valid representations
of a basis.
1. Describe how the validity of a given vector can be checked (you have to
check both length and location of 0 and 1 entries).
2. Write a function, which performs this validity check.
12. Lifting and Filters II

There are basically three forms for representing the building block in a
DWT: The transform can be represented by a pair of filters (usually low
pass and high pass filters) satisfying the perfect reconstruction conditions
from Chap. 7, or it can be given as lifting steps, which are either given in the
time domain as a set of equations, or in the frequency domain as a factored
matrix of Laurent polynomials. The Daubechies 4 transform has been pre-
sented in all three forms in previous chapters, but so far we have only made
casual attempts to convert between the various representations. When trying
to do so, it turns out that only one conversion requires real work, namely
conversion from filter to matrix and equation forms. In Chap. 7 we presented
the theorem, which shows that it is always possible to do this conversion,
but we did not show how to do it. This chapter is therefore dedicated to dis-
cussing the three basic forms of representation of the wavelet transform, as
well as the conversions between them. In particular, we give a detailed proof
of the 'from filter to matrix/equation' theorem stated in Chap. 7. The proof
is a detailed and exemplified version of the proof found in I. Daubechies and
W. Sweldens [7].

12.1 The Three Basic Representations

We begin by reviewing the three forms of representation, using the Daube-


chies 4 transform as an example.
Matrix form:

H(z) = [ (√3-1)/√2  0 ; 0  (√3+1)/√2 ] [ 1  -z ; 0  1 ]
       [ 1  0 ; -√3/4 - ((√3-2)/4)z^(-1)  1 ] [ 1  √3 ; 0  1 ]      (12.1)

Equation form:

s(1)[n] = S[2n] + √3 S[2n+1] ,                                      (12.2)

d(1)[n] = S[2n+1] - (√3/4) s(1)[n] - ((√3-2)/4) s(1)[n-1] ,         (12.3)

s(2)[n] = s(1)[n] - d(1)[n+1] ,                                     (12.4)


s[n] = ((√3-1)/√2) s(2)[n] ,                                        (12.5)

d[n] = ((√3+1)/√2) d(1)[n] .                                        (12.6)

Filter form:

h = 1/(4√2) [1+√3, 3+√3, 3-√3, 1-√3] ,                              (12.7)

g = 1/(4√2) [1-√3, -3+√3, 3+√3, -1-√3] .                            (12.8)
Depending on the circumstances each form has its advantages. In an imple-
mentation it is always either the equation form (when implementing as lifting
steps) or the filter form (when implementing as a filter bank) which is used.
However, when concerned with the theoretical aspects of the lifting theory,
the matrix form is very useful. Moreover, if we want to design a filter via
lifting steps (some basic steps for this were presented in Sect. 3.2 and 3.3),
but use it in a filter bank, we need a tool for converting the lifting steps, in
either matrix or equation form, to the filter form. On the other hand, if we
want to use existing filters as lifting steps, we need to convert the filter form
to the equation form. In brief, it is very useful to be able to convert between
the three forms. Here is a list of where we present the various conversions.
Matrix ↔ equation    Sect. 12.2
Matrix → filter      Sect. 7.3
Equation → filter    Sect. 12.3
Filter → matrix      Sect. 12.4
Filter → equation    Sect. 12.4
The only real challenge is converting the filter form to the matrix form, or
the equation form. But first we do the easy conversions.

12.2 From Matrix to Equation Form

The factored matrix form (12.1) is closely related to the equation form. Each
2 × 2 matrix corresponds to one equation, except for the first matrix, which
corresponds to the two scale equations. Note that due to the way matrix
multiplication is defined, the steps appear in the reverse order in the matrix
form. The last matrix is the first one to be applied to the two components of
the signal, and the first matrix in the product is the normalization step.
When using the matrix form for transforming a signal, the signal is given
by its z-transform S(z), which is split into its even and odd components and
placed in a vector. So

H(z) [ S0(z) ; S1(z) ] = [ SL(z) ; SH(z) ] ,

where SL(z) and SH(z) denote the z-transform of the low and high pass
transformed signals respectively. We now apply H(z) one matrix at a time.

The last matrix in (12.1) produces the intermediate signal S(1)(z) = S0(z) +
√3 S1(z), multiplication with the next two matrices gives D(1)(z) and S(2)(z),
and finally the scaling matrix is applied. Collecting all the equations we have

S(1)(z) = S0(z) + √3 S1(z) ,

D(1)(z) = S1(z) - (√3/4) S(1)(z) - ((√3-2)/4) z^(-1) S(1)(z) ,

S(2)(z) = S(1)(z) - z D(1)(z) ,

S(z) = ((√3-1)/√2) S(2)(z) ,

D(z) = ((√3+1)/√2) D(1)(z) .

To get the equation form we use the definition of the z-transform and the
uniqueness of the z-transform representation. The original signal is in the
time-domain given by the sequence {S[n]}. Thus

S0(z) = Σ_n S[2n] z^(-n)  and  S1(z) = Σ_n S[2n+1] z^(-n) ,

see also (7.16)-(7.25). Using the notation S(1)(z) = Σ_n s(1)[n] z^(-n), as in
Chap. 7, we have that the first equation reads

Σ_n s(1)[n] z^(-n) = Σ_n S[2n] z^(-n) + √3 Σ_n S[2n+1] z^(-n) ,

and then uniqueness of the z-transform representation yields (12.2). The


remaining equations follow by similar computations, if we also use the shifting
properties associated with multiplication by z, see (7.10), and by Z-l, see
(7.11).
The other way, from equation to matrix form, is done by the reverse
procedure (see Exer. 12.4).

12.3 From Equation to Filter Form


In the previous chapters we have repeatedly seen how references back and
forth in time in the equation form can be handle by inserting appropriate
expressions instead of the reference. Actually, this can be done for all refer-
ences. We do this systematically by starting with the last equation and work
our way up.
This is here exemplified with Daubechies 4. We begin with the last
equation (12.6)

d[n] = ((√3+1)/√2) d(1)[n] .

The reference to d(1)[n] can be replaced by the actual expression for d(1)[n].

d[n] = ((√3+1)/√2) ( S[2n+1] - (√3/4) s(1)[n] - ((√3-2)/4) s(1)[n-1] ) .

Then we insert s(1)[n] and s(1)[n-1].

d[n] = ((√3+1)/√2) ( S[2n+1] - (√3/4)(S[2n] + √3 S[2n-1+2])
                     - ((√3-2)/4)(S[2n-2] + √3 S[2n-1]) )

     = ((√3-1)/(4√2)) S[2n-2] + ((3-√3)/(4√2)) S[2n-1]
       - ((√3+3)/(4√2)) S[2n] + ((√3+1)/(4√2)) S[2n+1]

     = Σ_{m=-1}^{2} g[m] S[2n-m] ,

where g is the high pass impulse response (12.8) of Daubechies 4 (except for
a change of sign). In the same manner we find

s[n] = ((1+√3)/(4√2)) S[2n-2] + ((3+√3)/(4√2)) S[2n-1]
       + ((3-√3)/(4√2)) S[2n] + ((1-√3)/(4√2)) S[2n+1]

     = Σ_{m=-1}^{2} h[m] S[2n-m] ,

where h is the low pass impulse response (12.7) of Daubechies 4. The
rewritten expressions for s[n] and d[n] show that they correspond to a convolution
of the impulse responses and four samples. As n takes the values from 0 to
half the length of the original signal, a filtering has occurred.
There are two distinct differences between the filter form and the equation
form. Filtering the signal requires approximately twice as many calculations
compared to 'lifting it' (see Exer. 12.1), but the filter form is much more
easily implemented, since the convolution is independent of the structure of
the signal and of the transform. It is unfortunately not possible to have at the
same time both efficient and general implementations of the equation form.
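
The expansion of d[n] can be verified numerically. The sketch below
computes d[n] once through the lifting steps and once through the derived
coefficients; note the index shift, since MATLAB vectors start at 1 while the
equations use 0-based samples.

% Numerical check of the expansion of d[n] (a sketch).
s3 = sqrt(3);
S = randn(1,8);                                 % S(m+1) holds the sample S[m]
n = 2;                                          % an interior index
s1 = @(k) S(2*k+1) + s3*S(2*k+2);               % s(1)[k]
d1 = S(2*n+2) - s3/4*s1(n) - (s3-2)/4*s1(n-1);  % d(1)[n]
dn = (s3+1)/sqrt(2) * d1;                       % d[n] via lifting
c = [s3-1, 3-s3, -(s3+3), s3+1]/(4*sqrt(2));    % coefficients of S[2n-2..2n+1]
abs(dn - c * S(2*n-1:2*n+2)')                   % ~0 up to rounding errors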

12.4 From Filters to Lifting Steps


We have now seen how to make lifting steps into ordinary FIR filters. This
presented no particular challenge, since it was merely a matter of expanding
equations. The other direction, i.e. making filters into lifting steps, is a bit
more tricky, since we now have to factorize the polyphase matrix. In Chap. 7
we showed with Theorem 7.3.1 that it could be done, but we omitted the con-
structive proof. This section is therefore dedicated to a thorough discussion of
the proof, whereby the algorithm for factoring the polyphase matrix is given.
The proof given here is mainly due to I. Daubechies and W. Sweldens [7].
First, we restate the theorem.
Theorem 12.4.1. Given a 2 × 2 matrix

H(z) = [ H00(z)  H01(z) ; H10(z)  H11(z) ] ,

where the Hnk(z) are Laurent polynomials, and where

det H(z) = H00(z)H11(z) - H01(z)H10(z) = 1 ,                        (12.9)

then there exist a constant K ≠ 0 and Laurent polynomials

S1(z), ..., SM(z),  and  T1(z), ..., TM(z) ,

such that

H(z) = [ K  0 ; 0  K^(-1) ] ∏_{k=1}^{M} [ 1  Sk(z) ; 0  1 ] [ 1  0 ; Tk(z)  1 ] .   (12.10)

This theorem requires the matrix to have determinant 1, so in order to apply
it to an analysis wavelet filter pair, we need the Hnk of the filter to fulfill (12.9).
The question is therefore whether this is always the case for wavelet filters.
We know that H(z) performs the DWT of a signal in the z-transform,
and we know that this transform is invertible. Therefore H^(-1)(z) exists, and

according to Proposition 7.2.1 det H(z) = a^(-1) z^(-n) for some a ≠ 0 and n.
Consequently, the determinant can be made equal to 1 by multiplying one of
the filters by the monomial az^n.
So the determinant 1 requirement can always be fulfilled with a wavelet filter


by choosing the proper z-transform.
We are now ready to begin the proof of Theorem 12.4.1. It is presented in
the following sections along with some examples of factorization. Note that
we now use the same notation as in Chap. 7, that is H0(z) for the low pass
analysis filter and H1(z) for the high pass analysis filter, and G0(z) and G1(z)
for the synthesis filters.

12.4.1 The Euclidean Algorithm

The Euclidean algorithm is usually first presented as an algorithm for finding
the greatest common divisor of two integers. But it can be applied to many
other analogous problems. One application is to finding the greatest common
divisor of two polynomials. Here we apply it to Laurent polynomials. We
recall that the z-transform of a FIR filter h is a Laurent polynomial, which
is a polynomial of the form

h(z) = Σ_{k=kb}^{ke} h[k] z^(-k) .

Here kb and ke are integers satisfying kb ≤ ke. This is in contrast to ordinary
polynomials, where we only have nonnegative powers. To define the degree
of a Laurent polynomial, we assume that h[kb] ≠ 0 and h[ke] ≠ 0. Then the
degree of h(z) is defined as

|h(z)| = ke - kb .

The zero Laurent polynomial is assigned the degree -∞. A polynomial of
degree 0, such as 3z^7, is also referred to as a monomial.
Take two Laurent polynomials a(z) and b(z) ≠ 0 with |a(z)| ≥ |b(z)|. Then
there always exist a Laurent polynomial q(z), the quotient, with |q(z)| =
|a(z)| - |b(z)|, and a Laurent polynomial r(z), the remainder, with |r(z)| <
|b(z)|, such that

a(z) = b(z)q(z) + r(z) .                                            (12.11)

We use the notation

q(z) = a(z)/b(z), and r(z) = a(z)%b(z) .



If b(z) is a monomial, then r(z) = 0, and the division is exact. A Laurent
polynomial is invertible, if and only if it is a monomial, see the proof of
Proposition 7.2.1. In other words, the only Laurent polynomials with product
equal to 1 are pairs az^m and a^(-1) z^(-m).
The division (12.11) can be repeated with b(z) and r(z), as in

b(z) = r(z)q2(z) + r2(z) .                                          (12.12)

Since the degree of the remainder decreases by at least one, it takes at most
|b(z)| + 1 steps to achieve a remainder equaling 0. This argument proves the
following theorem.
Theorem 12.4.2 (Euclidean Algorithm for Laurent Polynomials).
Given two Laurent polynomials a(z) and b(z) ≠ 0, such that |a(z)| ≥ |b(z)|.
Let a0(z) = a(z) and b0(z) = b(z), and iterate the following steps starting
from n = 0

a_{n+1}(z) = b_n(z) ,                                               (12.13)
b_{n+1}(z) = a_n(z)%b_n(z) .                                        (12.14)

Let N denote the smallest integer, with N ≤ |b(z)| + 1, for which bN(z) = 0.
Then aN(z) is a greatest common divisor for a(z) and b(z).
We note that there is no unique greatest common divisor for a(z) and b(z),
since if d(z) divides both a(z) and b(z), and is of maximal degree, then
az^m d(z) is also a divisor of the same degree.
This theorem is the key to constructing the 2 × 2 matrix lifting steps,
since each iteration will produce one matrix, as we show below.
It is important to note that q(z) and r(z) in (12.11) are not unique.
Usually there is more than one valid choice. This is easily seen with an example.
Let

a(z) = -z^(-1) + 6 - z ,
b(z) = 2z^(-1) + 2 .

Then q(z) is necessarily on the form c + dz, so from (12.11)

-z^(-1) + 6 - z = 2cz^(-1) + 2c + 2d + 2dz + r(z) .                 (12.15)

By proper choice of c and d we can match at least two of the three terms
(if we could match all three terms we would have an exact division and thus
r(z) = 0). Let us match the two first in (12.15), that is the terms z^(-1)
and z^0. Then

c = -1/2  and  2(-1/2 + d) = 6 ,  i.e.  d = 7/2 .

Since r(z) = a(z) - b(z)q(z) we find that

r(z) = -z^(-1) + 6 - z - (-z^(-1) + 6 + 7z) = -8z .

If we instead first match the two z terms in (12.15) we get d = -1/2, and if
we then match the two z^(-1) terms, we get c = -1/2. Thus

r(z) = -z^(-1) + 6 - z - (-z^(-1) - 2 - z) = 8 .

Both factorizations of a(z) are valid, and both will serve the purpose of
Theorem 12.4.2.
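
Division with remainder of Laurent polynomials is easy to experiment with
in MATLAB (a sketch): multiply both polynomials by a power of z to get
ordinary polynomials, divide with deconv, and shift back. Since deconv
matches terms from the highest power down, it produces yet another valid
quotient/remainder pair for the example above.

% a(z) = -z^(-1) + 6 - z and b(z) = 2z^(-1) + 2, multiplied by z.
A = [-1 6 -1];                % z*a(z) = -z^2 + 6z - 1, descending powers
B = [2 2];                    % z*b(z) = 2z + 2
[q,r] = deconv(A,B)           % q = [-1/2 7/2], r = [0 0 -8]
% Undoing the shift: a(z) = b(z)(-z/2 + 7/2) - 8z^(-1), a third valid
% choice of q(z) and r(z).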

12.4.2 The Euclidean Algorithm in Matrix Form

The first step in proving Theorem 12.4.1 is to examine the iterations defined
in Theorem 12.4.2 and rewrite them in the form of a product of 2 × 2 matrices,
whose entries are Laurent polynomials. If we let q_{n+1}(z) = a_n(z)/b_n(z), then
the first step in the algorithm is

a1(z) = b0(z) ,
b1(z) = a0(z) - b0(z)q1(z) ,

the next step is

a2(z) = b1(z) ,
b2(z) = a1(z) - b1(z)q2(z) ,

and after N steps

aN(z) = b_{N-1}(z) ,
0 = bN(z) = a_{N-1}(z) - b_{N-1}(z)qN(z) .

Note that according to the theorem bN(z) = 0. The first step can also be
written

[ a1(z) ; b1(z) ] = [ 0  1 ; 1  -q1(z) ] [ a(z) ; b(z) ] ,

while the second step becomes

[ a2(z) ; b2(z) ] = [ 0  1 ; 1  -q2(z) ] [ a1(z) ; b1(z) ] .

Finally we get

[ aN(z) ; 0 ] = ∏_{n=N}^{1} [ 0  1 ; 1  -qn(z) ] [ a(z) ; b(z) ] .

Note the order of the terms in this product, as given by the limits in the
product. The term with index N is at the left end, and the one with index 1
is at the right end. We now note that the inverse of

[ 0  1 ; 1  -qn(z) ]   is   [ qn(z)  1 ; 1  0 ] ,

as the reader immediately can verify. Thus we can multiply by these inverse
matrices on the left in the equation. Consequently

[ a(z) ; b(z) ] = ∏_{n=1}^{N} [ qn(z)  1 ; 1  0 ] [ aN(z) ; 0 ] .   (12.16)

Let us now apply this to the low pass filter H0(z). The polyphase components
of H0(z) are H00(z) and H01(z), and if we let a(z) = H00(z) and b(z) =
H01(z) we get

[ H00(z) ; H01(z) ] = ∏_{n=1}^{N} [ qn(z)  1 ; 1  0 ] [ Kz^c ; 0 ] .   (12.17)

Notice that we get a monomial as a greatest common divisor of H00(z) and
H01(z). This can be seen as follows. We know that

H00(z)H11(z) - H01(z)H10(z) = 1 .                                   (12.18)

Let p(z) denote a greatest common divisor of H00(z) and H01(z). This means
that both H00(z) and H01(z) can be divided by p(z), and hence the entire
left hand side in (12.18) can be divided by p(z). But then the right hand side
must also be divisible by p(z), and the only Laurent polynomial that divides
1 is a monomial. Hence p(z) = Kz^c. Since the theorem stated that aN(z) was
one of the greatest common divisors, and since common divisors differ only
by a monomial factor, we then deduce that aN(z) is a monomial.
If we moreover multiply (from the right) on both sides of (12.17) with z^(-c)
we get

[ H00(z)z^(-c) ; H01(z)z^(-c) ] = ∏_{n=1}^{N} [ qn(z)  1 ; 1  0 ] [ K ; 0 ] .   (12.19)

Multiplying H00(z) with z^(-c) only shifts the indices in the z-transform, and
hence does not change the fact that it is the even coefficients of the low pass
impulse response. In other words, by choosing the right z-transform of
the low pass impulse response, it is always possible to end up with aN(z) = K.

12.4.3 Example on Factoring a Laurent Polynomial


Before we continue with the proof of the factorization theorem, let us clar-
ify the above results with an example. We use CDF(2,2) to show how the
factorization works. The low pass impulse response is given by
198 12. Lifting and Filters II

v'2
8" [ -1262 -1] .

We begin with the symmetric transform (omitting the scaling v'2/8)


Ho(z) = _z-2 + 2z- 1 + 6 + 2z - Z2 . (12.20)

The polyphase components are then

Hoo(z) = _z-1 + 6 - z, and H01 (z) = 2z- 1 + 2,


according to (7.38) on p. 72. The first step in the algorithm, that is theo-
rem 12.4.2, is
ao(z) = _Z-1 + 6 - z,
bo(z) = 2z- 1 + 2 .
If we match terms from the left (see description at the end of Sect. 12.4.1),
we get

-z -1 + 6 - z 2z + 2)· (1
= (-1 7) -
-2 + 2z 8z , (12.21)

such that
1 7
ql (z)= - 2 + 2z ,
rl(z) = -8z.
The next steps is then

al (z) = bo(z) = 2z- 1 + 2 ,


b1(z) = rl(z) = -8z.
Again matching from the left (although it does not matter in this particular
case)
$$2z^{-1} + 2 = -8z\cdot\left(-\tfrac{1}{4}z^{-2} - \tfrac{1}{4}z^{-1}\right) + 0,$$
such that
$$q_2(z) = -\tfrac{1}{4}z^{-2} - \tfrac{1}{4}z^{-1}, \qquad r_2(z) = 0.$$
Finally
$$a_2(z) = b_1(z) = -8z, \qquad b_2(z) = r_2(z) = 0.$$

Since $b_2(z) = 0$, we have found a greatest common divisor of $a_0(z)$ and $b_0(z)$,
namely $a_2(z) = -8z$. Putting all this into (12.17) yields
$$\begin{bmatrix} H_{00}(z) \\ H_{01}(z) \end{bmatrix} = \begin{bmatrix} -z^{-1} + 6 - z \\ 2z^{-1} + 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} + \tfrac{7}{2}z & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} -\tfrac{1}{4}z^{-2} - \tfrac{1}{4}z^{-1} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} -8z \\ 0 \end{bmatrix}. \qquad (12.22)$$

Unfortunately we did not get a constant in the last vector. This can be
achieved through multiplication by $z^{-1}$ on both sides
$$\begin{bmatrix} -z^{-2} + 6z^{-1} - 1 \\ 2z^{-2} + 2z^{-1} \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} + \tfrac{7}{2}z & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} -\tfrac{1}{4}z^{-2} - \tfrac{1}{4}z^{-1} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} -8 \\ 0 \end{bmatrix}. \qquad (12.23)$$

So if we had chosen the z-transform
$$H_0(z) = -z^{-4} + 2z^{-3} + 6z^{-2} + 2z^{-1} - 1,$$
instead of (12.20) the gcd would have been a constant. Note that choosing a
factorization that does not give a constant gcd is by no means fatal. In fact,
no matter what z-transform we start with, we get the same matrices (provided
that the same matching of terms is used), and the only difference is the gcd.
So if the gcd is not constant, simply keep the coefficient and discard whatever
power of $z$ is present. This is exactly what we just did; the only difference
between the lifting steps (the right hand sides) in (12.22) and (12.23) is that
the $z$ in the former is discarded in the latter.
But this factorization is not our only option. If we had chosen another
matching in the first step, say
$$-z^{-1} + 6 - z = (2z^{-1} + 2)\cdot\left(-\tfrac{1}{2} - \tfrac{1}{2}z\right) + 8,$$
instead of (12.21) we would end up with
$$\begin{bmatrix} -z^{-1} + 6 - z \\ 2z^{-1} + 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} - \tfrac{1}{2}z & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \tfrac{1}{4}z^{-1} + \tfrac{1}{4} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 8 \\ 0 \end{bmatrix},$$


which does not need any modification. Incidentally, this is the form of the
equations (3.36) and (3.37) on p. 23, and if we multiply 8 with $\sqrt{2}/8$, the
omitted scaling of the impulse response, we get $\sqrt{2}$, the scaling of the low
pass part in (3.41).
The important point to note here is that since division with remainder
of Laurent polynomials is not unique, neither is the factorization into lifting
steps. The fact that the gcd in some cases is not a constant is a trivial problem,
as described above. But a more serious, and definitely not trivial, problem
arises. While the first factorization had a factor 7/2, the second factorization
had no factors larger than 1/2 (disregarding the final scaling factor). This

means that although the output of the complete transform is never larger than
2 times the input, intermediate calculations have the potential of becoming
at least 3.5 times larger than the input. This may not seem to be a serious
problem. However, for longer filters the values in the intermediate calculations
can become significantly larger. In Sect. 12.5 we give another example which
demonstrates this phenomenon. Stated briefly, it is important to choose the
right factorization.
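To make this concrete, the division with remainder that drives these factorizations can be sketched in a few lines of MATLAB. The function below is our own illustration (the name laurdiv and the chosen representation are assumptions, not part of Uvi_Wave or any other toolbox): a Laurent polynomial is stored as a coefficient vector in ascending powers together with the exponent of its first entry, and all terms are matched from the left.

function [q, qlo, r, rlo] = laurdiv(a, alo, b, blo)
% One division step for Laurent polynomials, matching from the left.
% a = [-1 6 -1] with alo = -1 represents -z^(-1) + 6 - z.
q   = zeros(1, length(a) - length(b) + 1);
qlo = alo - blo;                    % lowest power in the quotient
r   = a;
for k = 1:length(q)                 % cancel the leftmost remaining term
    q(k) = r(k) / b(1);
    r(k:k+length(b)-1) = r(k:k+length(b)-1) - q(k)*b;
end
rlo = alo + length(q);              % exponent of the first remainder term
r   = r(length(q)+1:end);

For example, laurdiv([-1 6 -1],-1,[2 2],-1) returns q = [-0.5 3.5] with qlo = 0, and r = -8 with rlo = 1, which is exactly the matching from the left in (12.21).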

12.4.4 Completing the Factorization

We still have some more work to do before the final factorization is achieved,
and from now on we assume that the factorization is done such that $a_N(z)$
is a constant. The form of (12.19) is not entirely the same as the form of
(12.10). It can be made a little more alike if we observe that
$$\begin{bmatrix} q(z) & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & q(z) \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ q(z) & 1 \end{bmatrix}.$$

Using the first equation for odd n, and the second one for even n, gives

$$\begin{bmatrix} H_{00}(z) \\ H_{01}(z) \end{bmatrix} = \prod_{n=1}^{N/2} \begin{bmatrix} 1 & q_{2n-1}(z) \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ q_{2n}(z) & 1 \end{bmatrix}\begin{bmatrix} K \\ 0 \end{bmatrix}. \qquad (12.24)$$
If $N$ is odd, we take $q_{2n}(z) = 0$ in the last factor. If we now replace
$$\begin{bmatrix} K \\ 0 \end{bmatrix} \quad\text{with}\quad \begin{bmatrix} K & 0 \\ 0 & K^{-1} \end{bmatrix},$$
we get

$$\begin{bmatrix} H_{00}(z) & H'_{10}(z) \\ H_{01}(z) & H'_{11}(z) \end{bmatrix} = \prod_{n=1}^{M} \begin{bmatrix} 1 & Q_{2n-1}(z) \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ Q_{2n}(z) & 1 \end{bmatrix}\begin{bmatrix} K & 0 \\ 0 & K^{-1} \end{bmatrix}, \qquad (12.25)$$

where these equations define $H'_{10}(z)$ and $H'_{11}(z)$. By transposing both sides
(remember that the transpose of a matrix product is the transpose of each
matrix, multiplied in the reverse order) we get the following result, which is
closer to the goal.
$$\begin{bmatrix} H_{00}(z) & H_{01}(z) \\ H'_{10}(z) & H'_{11}(z) \end{bmatrix} = \begin{bmatrix} K & 0 \\ 0 & K^{-1} \end{bmatrix}\prod_{n=M}^{1} \begin{bmatrix} 1 & Q_{2n}(z) \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ Q_{2n-1}(z) & 1 \end{bmatrix}. \qquad (12.26)$$

All we need to do now is to find out how $H'_{10}(z)$ and $H'_{11}(z)$ are connected
with $H_{10}(z)$ and $H_{11}(z)$.
To do this we observe that if the analysis filter pair $(H_0(z), H_1(z))$ has a
polyphase representation with determinant 1, then any other analysis filter
pair $(H_0(z), H_1^{\rm new}(z))$ is related by

$$H_1^{\rm new}(z) = H_1(z) + H_0(z)t(z^2),$$
where $t(z)$ is a Laurent polynomial. To verify this we need to show that the
determinant of the polyphase matrix of $(H_0(z), H_1^{\rm new}(z))$ is 1.

$$\mathbf{H}^{\rm new}(z) = \begin{bmatrix} H_{00}(z) & H_{01}(z) \\ H_{10}^{\rm new}(z) & H_{11}^{\rm new}(z) \end{bmatrix} = \begin{bmatrix} H_{00}(z) & H_{01}(z) \\ H_{10}(z) + H_{00}(z)t(z) & H_{11}(z) + H_{01}(z)t(z) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ t(z) & 1 \end{bmatrix}\begin{bmatrix} H_{00}(z) & H_{01}(z) \\ H_{10}(z) & H_{11}(z) \end{bmatrix}.$$
It follows that $\det \mathbf{H}^{\rm new}(z) = \det \mathbf{H}(z) = 1$.
Applying this result to (12.26), we can get the original high pass filter
$H_1(z)$ in the following way. From the previous calculation we know that
there exists a Laurent polynomial $t(z)$ such that
$$\begin{bmatrix} H_{00}(z) & H'_{10}(z) \\ H_{01}(z) & H'_{11}(z) \end{bmatrix}\begin{bmatrix} -t(z) \\ 1 \end{bmatrix} = \begin{bmatrix} H_{10}(z) \\ H_{11}(z) \end{bmatrix},$$
and by multiplying on both sides with the inverse of the 2 x 2 matrix, we find
that
$$\begin{bmatrix} -t(z) \\ 1 \end{bmatrix} = \begin{bmatrix} H'_{11}(z) & -H'_{10}(z) \\ -H_{01}(z) & H_{00}(z) \end{bmatrix}\begin{bmatrix} H_{10}(z) \\ H_{11}(z) \end{bmatrix},$$
and thus
$$t(z) = H'_{10}(z)H_{11}(z) - H'_{11}(z)H_{10}(z). \qquad (12.27)$$
Thus, multiplying (12.26) from the left with
$$\begin{bmatrix} 1 & 0 \\ -t(z) & 1 \end{bmatrix}$$
gives
$$\begin{bmatrix} H_{00}(z) & H_{01}(z) \\ H_{10}(z) & H_{11}(z) \end{bmatrix} = \begin{bmatrix} K & 0 \\ 0 & K^{-1} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -K^2t(z) & 1 \end{bmatrix}\prod_{n=M}^{1} \begin{bmatrix} 1 & Q_{2n}(z) \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ Q_{2n-1}(z) & 1 \end{bmatrix}, \qquad (12.28)$$

where we have used the simple relation

$$\begin{bmatrix} 1 & 0 \\ -t(z) & 1 \end{bmatrix}\begin{bmatrix} K & 0 \\ 0 & K^{-1} \end{bmatrix} = \begin{bmatrix} K & 0 \\ 0 & K^{-1} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -K^2t(z) & 1 \end{bmatrix}.$$


By a suitable reindexing of the $q$ polynomials (and at the same time making
$K^2t(z)$ one of them), it is now possible to determine the $S(z)$ and $T(z)$ in
Theorem 12.4.1.
This concludes the constructive proof of the lifting theorem. In the next
sections we will give examples and show that there can be numerical problems
in this constructive procedure.

12.5 Factoring Daubechies 4 into Lifting Steps


We now give two examples of creating lifting steps using the algorithm presented
in the previous sections. The first example is Daubechies 4, which
should be well-known by now, since we have discussed it in Sect. 3.4 and
Sect. 7.3. Since we have the exact filter taps in (7.76) on p. 83, we can also
find the exact lifting steps. The other example is Coiflet 12, see I. Daubechies [6],
in which case the exact filter taps are not available, and we therefore
have to do the calculations numerically. This second example demonstrates
not only how to handle a longer filter (which is not much different from
Daubechies 4), but also the importance of choosing the right factorization.
The Daubechies 4 filter taps are given by
$$H_0 = \begin{bmatrix} \frac{1+\sqrt{3}}{4\sqrt{2}} & \frac{3+\sqrt{3}}{4\sqrt{2}} & \frac{3-\sqrt{3}}{4\sqrt{2}} & \frac{1-\sqrt{3}}{4\sqrt{2}} \end{bmatrix}.$$

The even and odd coefficients are separated into
$$H_{00}(z) = a_0(z) = \frac{3+\sqrt{3}}{4\sqrt{2}} + \frac{1-\sqrt{3}}{4\sqrt{2}}z,$$
$$H_{01}(z) = b_0(z) = \frac{1+\sqrt{3}}{4\sqrt{2}} + \frac{3-\sqrt{3}}{4\sqrt{2}}z.$$

Remember that the choice of z-transform does not matter for the final factorization
(see end of Sect. 12.4.3), and we choose the z-transform with no
negative powers. The first step is to find $q_1(z)$. Since $a_0(z)$ and $b_0(z)$ have
the same degree, the quotient is a monomial. Matching from the left yields
$$q_1(z) = \frac{\text{leftmost term of } a_0(z)}{\text{leftmost term of } b_0(z)} = \frac{\frac{3+\sqrt{3}}{4\sqrt{2}}}{\frac{1+\sqrt{3}}{4\sqrt{2}}} = \frac{3+\sqrt{3}}{1+\sqrt{3}} = \frac{(3+\sqrt{3})(1-\sqrt{3})}{(1+\sqrt{3})(1-\sqrt{3})} = \frac{-2\sqrt{3}}{-2} = \sqrt{3}.$$
The remainder is then
$$r_1(z) = a_0(z) - b_0(z)q_1(z) = \left(\frac{3+\sqrt{3}}{4\sqrt{2}} + \frac{1-\sqrt{3}}{4\sqrt{2}}z\right) - \left(\frac{1+\sqrt{3}}{4\sqrt{2}} + \frac{3-\sqrt{3}}{4\sqrt{2}}z\right)\cdot\sqrt{3} = \frac{1-\sqrt{3}-3\sqrt{3}+3}{4\sqrt{2}}z = \frac{1-\sqrt{3}}{\sqrt{2}}z.$$
This was the first iteration. The next one begins with
$$a_1(z) = b_0(z) = \frac{1+\sqrt{3}}{4\sqrt{2}} + \frac{3-\sqrt{3}}{4\sqrt{2}}z, \qquad b_1(z) = r_1(z) = \frac{1-\sqrt{3}}{\sqrt{2}}z.$$

This time the quotient has degree 1, since $b_1(z)$ is one degree less than $a_1(z)$.
More specifically, $q_2(z)$ must be of the form $cz^{-1} + d$. Matching from the left
means determining $c$ first, and matching from the right means determining
$d$ first. We will do the latter. Thus $d$, the constant term in $q_2(z)$, is
$$d = \frac{\frac{3-\sqrt{3}}{4\sqrt{2}}z}{\frac{1-\sqrt{3}}{\sqrt{2}}z} = \frac{3-\sqrt{3}}{4(1-\sqrt{3})} = -\frac{\sqrt{3}}{4}.$$
Since $|b_1(z)| = 0$ we know that $r_2(z) = 0$ (the remainder always has degree
less than the divisor), so we are looking for $q_2(z)$ such that $a_1(z) = b_1(z)q_2(z)$.
Consequently,
$$\frac{1+\sqrt{3}}{4\sqrt{2}} + \frac{3-\sqrt{3}}{4\sqrt{2}}z = \frac{1-\sqrt{3}}{\sqrt{2}}z\cdot\left(cz^{-1} - \frac{\sqrt{3}}{4}\right),$$
which is valid for only one value of $c$, namely
$$c = \frac{\frac{1+\sqrt{3}}{4\sqrt{2}}}{\frac{1-\sqrt{3}}{\sqrt{2}}} = \frac{1+\sqrt{3}}{4(1-\sqrt{3})} = -\frac{2+\sqrt{3}}{4}.$$

Therefore
$$q_2(z) = -\frac{2+\sqrt{3}}{4}z^{-1} - \frac{\sqrt{3}}{4}, \qquad r_2(z) = 0, \qquad a_2(z) = b_1(z) = \frac{1-\sqrt{3}}{\sqrt{2}}z.$$
In order to have the correct high pass filtering, we need to apply (12.27).
First we use (12.26) to find $H'_{10}$ and $H'_{11}$. Note that we use $\frac{1-\sqrt{3}}{\sqrt{2}}z$ as the
multiplier in this case.
$$\begin{bmatrix} H_{00}(z) & H_{01}(z) \\ H'_{10}(z) & H'_{11}(z) \end{bmatrix} = \begin{bmatrix} \frac{1-\sqrt{3}}{\sqrt{2}}z & 0 \\ 0 & -\frac{1+\sqrt{3}}{\sqrt{2}}z^{-1} \end{bmatrix}\begin{bmatrix} 1 & q_2(z) \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ \sqrt{3} & 1 \end{bmatrix}, \qquad (12.29)$$
which gives $H'_{10}(z) = -\frac{3+\sqrt{3}}{\sqrt{2}}z^{-1}$ and $H'_{11}(z) = -\frac{1+\sqrt{3}}{\sqrt{2}}z^{-1}$.

(12.30)

Since in this example
$$H_{00}(z) = h[1] + h[3]z, \quad\text{and}\quad H_{01}(z) = h[0] + h[2]z, \qquad (12.31)$$
we find by (7.16) that
$$H_0(z) = H_{00}(z^2) + z^{-1}H_{01}(z^2) = h[0]z^{-1} + h[1] + h[2]z + h[3]z^2.$$
With $k = 0$ and $c = 1$ it follows from (12.30) that
$$H_1(z) = -h[0] + h[1]z^{-1} - h[2]z^{-2} + h[3]z^{-3},$$
and thus
$$H_{10}(z) = -h[0] - h[2]z^{-1}, \quad\text{and}\quad H_{11}(z) = h[1] + h[3]z^{-1}. \qquad (12.32)$$

We now insert these $H_{10}$ and $H_{11}$ together with $H'_{10}$ and $H'_{11}$ from (12.29) into
(12.27), which yields (we skip the intermediate calculations)
$$t(z) = H'_{10}(z)H_{11}(z) - H'_{11}(z)H_{10}(z) = -(2+\sqrt{3})z^{-1}.$$
Finally, we determine the extra matrix necessary, as shown in (12.28),
$$-K^2t(z) = (2+\sqrt{3})z^{-1}\cdot\left(\frac{1-\sqrt{3}}{\sqrt{2}}z\right)^2 = z,$$

(notice again that we use the multiplier $\frac{1-\sqrt{3}}{\sqrt{2}}z$) and the entire $\mathbf{H}(z)$ can
now be reconstructed as
$$\mathbf{H}(z) = \begin{bmatrix} \frac{1-\sqrt{3}}{\sqrt{2}}z & 0 \\ 0 & -\frac{1+\sqrt{3}}{\sqrt{2}}z^{-1} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ z & 1 \end{bmatrix}\begin{bmatrix} 1 & -\frac{2+\sqrt{3}}{4}z^{-1} - \frac{\sqrt{3}}{4} \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ \sqrt{3} & 1 \end{bmatrix}.$$


There is still an undesired z in the first matrix, but it can safely be removed.
Although the consequence is that the right hand side no longer equals H(z),
it is still a valid factorization into lifting steps. It just results in a different
z-transformation of the even and odd low and high pass analysis filters.
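As a minimal MATLAB sketch (our own illustration, not a function from Uvi_Wave), the lifting steps contained in this factorization can be applied directly to a signal. We read the matrices from right to left, assume that $z^{-1}$ corresponds to the previous sample (the same convention as in the Coiflet 12 equations in the next section), and handle the boundaries by periodization via circshift.

S  = randn(1,256);                  % any signal of even length
se = S(1:2:end);                    % even samples S[2n]
so = S(2:2:end);                    % odd samples S[2n+1]
d1 = so + sqrt(3)*se;               % first lifting step, q1 = sqrt(3)
s1 = se - ((2+sqrt(3))/4)*circshift(d1,[0 1]) ...
        - (sqrt(3)/4)*d1;           % second lifting step, q2
d2 = d1 + circshift(s1,[0 -1]);     % third lifting step: + s1[n+1]
s  = ((1-sqrt(3))/sqrt(2))*s1;      % scaling, with the z removed
d  = -((1+sqrt(3))/sqrt(2))*d2;     % scaling, with the z^(-1) removed

Plotting the intermediate signals d1, s1, and d2 gives a direct impression of the dynamic range of this particular factorization.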

12.6 Factorizing Coiflet 12 into Lifting Steps


The next filter has a somewhat longer impulse response, and shows the importance
of choosing the right factorization. To avoid filling the following
pages with sheer numbers, we always round to four digits or less. This is only
in the writing of the numbers, however. The calculations have been performed
with several more digits, and more accurate lifting coefficients are given in

Table 12.1. Note that the inverse transform is the right one, up to the number
of digits used in the numerical computation, since this is how lifting steps
work.
We begin by giving the Coiflet 12 filter taps. They can be found in several
software packages (but not in Uvi_Wave), and in the paper [6, p. 516].

h_0 = [ 0.0164 -0.0415 -0.0674 0.3861 0.8127 0.4170
-0.0765 -0.0594 0.0237 0.0056 -0.0018 -0.0007] .

12.6.1 Constructing the Lifting Steps

We choose a z-transform representation for the odd and even filter taps
$$a_0(z) = 0.0164z^{-2} - 0.0674z^{-1} + 0.8127 - 0.0765z + 0.0237z^2 - 0.0018z^3,$$
$$b_0(z) = -0.0415z^{-2} + 0.3861z^{-1} + 0.4170 - 0.0594z + 0.0056z^2 - 0.0007z^3,$$
and carry out the first step in the algorithm. We choose to match the two
$z^{-2}$ terms.
$$q_1(z) = \frac{0.0164}{-0.0415} = -0.3952,$$
$$r_1(z) = 0.0852z^{-1} + 0.9775 - 0.1000z + 0.0259z^2 - 0.0021z^3.$$

The next step in the algorithm starts with
$$a_1(z) = -0.0415z^{-2} + 0.3861z^{-1} + 0.4170 - 0.0594z + 0.0056z^2 - 0.0007z^3,$$
$$b_1(z) = 0.0852z^{-1} + 0.9775 - 0.1000z + 0.0259z^2 - 0.0021z^3,$$
and the quotient $q_2(z)$ is obviously of the form $cz^{-1} + d$. We have three
options: Either we match both from the left, both from the right, or $c$ from
the left and $d$ from the right. The three cases yield
$$q_2(z) = \frac{-0.0415}{0.0852}z^{-1} + \frac{0.3861 - \frac{-0.0415}{0.0852}\cdot 0.9775}{0.0852} = -0.4866z^{-1} + 10.11, \qquad (12.33)$$
$$q_2(z) = \frac{-0.0007}{-0.0021} + \frac{0.0056 - \frac{-0.0007}{-0.0021}\cdot 0.0259}{-0.0021}z^{-1} = 1.5375z^{-1} + 0.3418, \qquad (12.34)$$
$$q_2(z) = \frac{-0.0415}{0.0852}z^{-1} + \frac{-0.0007}{-0.0021} = -0.4866z^{-1} + 0.3418, \qquad (12.35)$$

respectively. Here we see the problem mentioned at the end of Sect. 12.4.3,
namely that some factorizations lead to numerically unstable solutions. In an
effort to keep the dynamic range of the coefficients at a minimum, we choose
to continue with the numerically smallest $q_2(z)$, i.e. the one in (12.35). In
fact, all of the following $q$'s are chosen this way. The next five factorizations
are given by

$$q_2(z) = -0.4866z^{-1} + 0.3418,$$
$$r_2(z) = 0.8326z^{-1} + 0.0342 - 0.0127z - 0.0043z^2,$$
$$q_3(z) = 0.1024 + 0.4941z,$$
$$r_3(z) = 0.5627 - 0.1156z + 0.0325z^2,$$
$$q_4(z) = 1.480z^{-1} + 0.3648,$$
$$r_4(z) = -0.0187z - 0.016z^2,$$
$$q_5(z) = 9.492z^{-1} - 2.017,$$
$$r_5(z) = 0.7403,$$
$$q_6(z) = -0.0253z - 0.0218z^2,$$
$$r_6(z) = 0.$$

Since the next step is setting $b_6(z) = r_6(z) = 0$, we have now reached the first
index with $b_n$ equal to zero. Hence, according to Theorem 12.4.2, we have
found a greatest common divisor of $H_{00}$ and $H_{01}$, namely $a_6(z) = b_5(z) =
r_5(z) = 0.7403$. This is also the scaling factor, the $K$ in Theorem 12.4.1, as
was shown in (12.16) and (12.17).
Inserting now into (12.24)
$$\begin{bmatrix} H_{00}(z) \\ H_{01}(z) \end{bmatrix} = \prod_{n=1}^{N/2} \begin{bmatrix} 1 & q_{2n-1}(z) \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ q_{2n}(z) & 1 \end{bmatrix}\begin{bmatrix} K \\ 0 \end{bmatrix}$$
$$= \begin{bmatrix} 1 & -0.3952 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -0.4866z^{-1}+0.3418 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0.1024+0.4941z \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 1.480z^{-1}+0.3648 & 1 \end{bmatrix}\begin{bmatrix} 1 & 9.492z^{-1}-2.017 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -0.0253z-0.0218z^2 & 1 \end{bmatrix}\begin{bmatrix} 0.7403 \\ 0 \end{bmatrix}$$

reproduces the even and odd part of the low pass filter. By substituting
$$\begin{bmatrix} 0.7403 \\ 0 \end{bmatrix} \quad\text{with}\quad \begin{bmatrix} 0.7403 & 0 \\ 0 & (0.7403)^{-1} \end{bmatrix} = \begin{bmatrix} 0.7403 & 0 \\ 0 & 1.351 \end{bmatrix},$$
we also get the two filters $H'_{10}$ and $H'_{11}$ in (12.25). These can be converted
to the right high pass filter by means of (12.27). In this case we find

$$t(z) = 22.74z^{-1}.$$

We have omitted the intermediate calculations, since they involve the prod-
ucts of large Laurent polynomials.

$$\begin{bmatrix} H_{00}(z) & H_{01}(z) \\ H_{10}(z) & H_{11}(z) \end{bmatrix} = \begin{bmatrix} K & 0 \\ 0 & K^{-1} \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -K^2t(z) & 1 \end{bmatrix}\prod_{n=M}^{1} \begin{bmatrix} 1 & Q_{2n}(z) \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ Q_{2n-1}(z) & 1 \end{bmatrix}$$
$$= \begin{bmatrix} 0.7403 & 0 \\ 0 & 1.351 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -12.46z^{-1} & 1 \end{bmatrix}\begin{bmatrix} 1 & -0.0253z-0.0218z^2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 9.492z^{-1}-2.017 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1.480z^{-1}+0.3648 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0.1024+0.4941z & 1 \end{bmatrix}\begin{bmatrix} 1 & -0.4866z^{-1}+0.3418 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -0.3952 & 1 \end{bmatrix}.$$

Expanding this equation will show that
$$H_{10}(z) = -H_{01}(z^{-1}), \qquad H_{11}(z) = H_{00}(z^{-1}),$$
which was also valid for the Daubechies 4 factorization (compare (12.31) and
(12.32)). The equations needed for implementing Coiflet 12 are easily derived
from the above matrix equation.

$$d^{(1)}[n] = S[2n+1] - 0.3952\,S[2n],$$
$$s^{(1)}[n] = S[2n] - 0.4866\,d^{(1)}[n-1] + 0.3418\,d^{(1)}[n],$$
$$d^{(2)}[n] = d^{(1)}[n] + 0.1024\,s^{(1)}[n] + 0.4941\,s^{(1)}[n+1],$$
$$s^{(2)}[n] = s^{(1)}[n] + 1.480\,d^{(2)}[n-1] + 0.3648\,d^{(2)}[n],$$
$$d^{(3)}[n] = d^{(2)}[n] + 9.492\,s^{(2)}[n-1] - 2.017\,s^{(2)}[n],$$
$$s^{(3)}[n] = s^{(2)}[n] - 0.0253\,d^{(3)}[n+1] - 0.0218\,d^{(3)}[n+2],$$
$$d^{(4)}[n] = d^{(3)}[n] - 12.46\,s^{(3)}[n-1],$$
$$s[n] = 0.7403\,s^{(3)}[n],$$
$$d[n] = 1.351\,d^{(4)}[n].$$

Note that the coefficients in these equations are rounded versions of more
accurate coefficients, which are given in Table 12.1. The rounded coefficients
yield a transformed signal which deviates approximately from 0.1% to 2%
from the transformed signal obtained using the more accurate coefficients.
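These equations translate directly into MATLAB. The following sketch is our own illustration (not part of Uvi_Wave), using the rounded coefficients above and periodized boundaries, where circshift(x,[0 1]) supplies x[n-1] and circshift(x,[0 -1]) supplies x[n+1].

S  = randn(1,512);                  % any signal of even length
se = S(1:2:end);  so = S(2:2:end);  % even and odd samples
d1 = so - 0.3952*se;
s1 = se - 0.4866*circshift(d1,[0 1]) + 0.3418*d1;
d2 = d1 + 0.1024*s1 + 0.4941*circshift(s1,[0 -1]);
s2 = s1 + 1.480*circshift(d2,[0 1]) + 0.3648*d2;
d3 = d2 + 9.492*circshift(s2,[0 1]) - 2.017*s2;
s3 = s2 - 0.0253*circshift(d3,[0 -1]) - 0.0218*circshift(d3,[0 -2]);
d4 = d3 - 12.46*circshift(s3,[0 1]);
s  = 0.7403*s3;                     % low pass part
d  = 1.351*d4;                      % high pass part

The inverse transform undoes the scalings and then runs the same steps in reverse order with the signs of the updates reversed, which is why it is exact up to the number of digits used.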

12.6.2 Numerically Unstable Factorization of Coiflet 12

In the previous section we saw the beginning of an unstable factorization of
the Coiflet 12 filter. Of the three possible choices of factor $q_2(z)$, we continued

Table 12.1. More accurate coefficients for the Coiflet 12 lifting steps

d^(1)[n]   -0.3952094886        d^(3)[n]    9.491856450
s^(1)[n]   -0.4865531265                   -2.017253240
            0.3418203790        s^(3)[n]   -0.02528002562
d^(2)[n]    0.1023563847                   -0.02182215161
            0.4940618204        d^(4)[n]  -12.46443692
s^(2)[n]    1.479728699         s[n]        0.7403107249
            0.3648016173        d[n]        1.350784159

with the numerically smallest, that is (12.35). To see just how bad a factorization
can get, we will now repeat the factorization, this time proceeding
with (12.33) instead. Moreover, we choose the leftmost matching of terms
each time. The resulting factors are then

$$q_1(z) = -0.3952,$$
$$r_1(z) = 0.08522z^{-1} + 0.9775 - 0.09998z + 0.02590z^2 - 0.002108z^3,$$
$$q_2(z) = -0.4866z^{-1} + 10.11,$$
$$r_2(z) = -9.516 + 0.9641z - 0.2573z^2 + 0.02059z^3,$$
$$q_3(z) = 0.008956z^{-1} - 0.1036,$$
$$r_3(z) = -0.002370z - 0.0005804z^2 + 0.00002627z^3,$$
$$q_4(z) = 4014z^{-1} - 1390,$$
$$r_4(z) = -1.1695z^2 + 0.05710z^3,$$
$$q_5(z) = 0.002027z^{-1} + 0.0005953,$$
$$r_5(z) = -0.000007725z^3,$$
$$q_6(z) = 151381z^{-1} - 7392,$$
$$a_6(z) = -0.000007725z^3.$$
This clearly demonstrates that the factorization algorithm is potentially numerically
unstable, and one has to carefully choose which factors to proceed
with. Note that although we in the previous section chose the numerically
smallest $q$ factor each time, there is a priori no guarantee that this will lead
to the most stable solution.
The numerical instability seen here is a well-known aspect of the Euclidean
algorithm. The interested reader can look at the examples in the book [8],
where some solutions to this problem are also discussed.

Exercises

12.1 Determine the exact number of additions (including all subtractions)
and multiplications needed for applying the lifting version of Daubechies 4 to
a signal of length 2N (disregard any boundary corrections). Compare this to
the number of additions and multiplications needed to apply the filter bank
version of Daubechies 4 to the same signal.
12.2 Repeat the previous exercise for Coiflet 12.
12.3 Implement the six first lifting steps (those originating from the q's) for
both the numerically stable and unstable Coiflet 12 in MATLAB (or some
other language), and apply it to a random signal. Plot the intermediate signals
in the lifting steps to examine how good or bad each of the two factorizations
are.
12.4 The CDF(5,3) equations are given by (see [27])
$$d^{(1)}[n] = S[2n+1] - \tfrac{1}{5}S[2n],$$
$$s^{(1)}[n] = S[2n] - \tfrac{1}{24}\left(15d^{(1)}[n-1] + 5d^{(1)}[n]\right),$$
$$d^{(2)}[n] = d^{(1)}[n] - \tfrac{1}{10}\left(15s^{(1)}[n] + 9s^{(1)}[n+1]\right),$$
$$s^{(2)}[n] = s^{(1)}[n] - \tfrac{1}{72}\left(-5d^{(2)}[n-1] - 24d^{(2)}[n] + 5d^{(2)}[n+1]\right),$$
$$s[n] = 3\sqrt{2}\,s^{(2)}[n],$$
$$d[n] = \frac{\sqrt{2}}{6}\,d^{(2)}[n].$$
Construct the corresponding lifting steps in 2 x 2 matrix form.
12.5 The Symlet 10 is an orthogonal filter, and the IR is given by

1   0.01953888273525      6   0.72340769040404
2  -0.02110183402469      7   0.19939753397685
3  -0.17532808990805      8  -0.03913424930231
4   0.01660210576451      9   0.02951949092571
5   0.63397896345679     10   0.02733306834500

The coefficients can also be generated using symlets from Uvi_Wave in MATLAB.
Convert this filter into lifting steps in the equation form.

multiresolution function this order is reversed. For the wavelet packets the
situation is even more complicated. The result of a full wavelet decomposi-
tion is a matrix, where each level is stored in a column. The first column is
the original signal, and the last column the final level permitted by a given
signal. However, in graphical output the original signal is at the top, and
the last level at the bottom. So it is important that the reader consults the
documentation for the various functions to find out how the output is stored
in a vector or a matrix.
Due to changes in MATLAB in versions 5.x, some functions in version
3.0 of Uvi_Wave produce errors and warnings. We suggest how to fix these
problems, see Chap. 14.

13.1 Multiresolution Analysis


In the following examples we use line numbers of the form 1.1, where the
first number refers to the example being considered, and the second number
gives the line numbers within this example.
We start with a signal consisting of a sine sampled 32 times per cycle.
The signal is 500 samples long.
1.1 > S = sin([1:500]*2*pi/32);
It is easy to show that there are 32 samples per cycle.
1.2 > plot(S(1:32))
1.3 > plot(S)
To carry out a wavelet analysis of this signal, we must choose four filters.
In Chap. 7 the notation for the analysis filter pair was h_0, h_1, and for the
synthesis pair g_0, g_1. We changed this notation to h, g for the analysis pair,
and h̃, g̃ for the synthesis pair in Chap. 8. Since we cannot easily use the h̃
notation in MATLAB, we change the notation once more, this time to h, g,
rh, and rg. This change in notation also corresponds to the notation used
in the Uvi_Wave documentation. Filters are generated by several different
functions in Uvi_Wave. The members of the Daubechies family of orthogonal
filters are generated by the function daub. It needs one argument, which is
the length of the filter, or the number of filter taps. We start our experiments
with the Daubechies 8 filters, which are generated by the command
1.4 > [h,g,rh,rg] = daub(8);
The easiest way to see the result of a multiresolution analysis (MRA) of the
signal S is to use the Uvi_Wave function multires, which produces a series
of graphs similar to those shown in Fig. 4.4 (see Sect. 4.1 for an explanation
of the concept of an MRA). With the help function it is possible to get a
description of what multires does, and what parameters are needed. It takes
the signal, the four filters, and finally the number of levels we want in
the decomposition. Here we choose to use 4 levels.

Fig. 13.1. Graphical output from line 1.9 (left) and line 1.13 (right)

1.5 > help multires
1.6 > y = multires(S,h,rh,g,rg,4);
1.7 > size(y)
By typing size(y), the size of the matrix y is returned. It has 5 rows and
500 columns. With the function split each of the 5 rows is shown in the
same figure, but with vertical axes each having its own scale. The horizontal
axes are all identical, running from 0 to 500, the sample indices. The result
of using split on y is shown on the left in Fig. 13.1. To simplify the figure
we have removed some redundant labels. Note how most of the energy is
concentrated in the two bottom graphs. We recall that the function split
plots the first level at the bottom and the last level at the top of a figure.
1.8 > help split
1.9 > split(y) (see Fig. 13.1)
A sine with a higher frequency (5 samples per cycle) has a different energy
distribution in the 5 rows of the decomposition.
1.10 > S = sin([1:500]*2*pi/5);
1.11 > y = multires(S,h,rh,g,rg,4);
1.12 > figure
1.13 > split(y) (see Fig. 13.1)
While the first (low frequency) signal with 32 samples per cycle has the main
part of its energy in the two bottom rows, the second (high frequency) signal
has most of its energy in the two top rows, see the right part of Fig. 13.1.
This shows one of the features of a decomposition, namely the ability to split
a signal according to frequency.
Instead of using just a sine, we can try to add a couple of transients, i.e.
a few samples deviating significantly from their neighbor samples.
2.1 > close all
2.2 > S = sin([1:512]/512*2*pi*5);

Fig. 13.2. Graphical output from line 2.5 (left) and line 2.7 (right)

2.3 > S(200) = 2;
2.4 > S(400) = 2;
2.5 > plot(S) (see Fig. 13.2)
This time the signal is a sine with 5 cycles sampled 512 times. Samples
number 200 and 400 are set to 2. We again perform an MRA on the signal.
2.6 > y = multires(S,h,rh,g,rg,4);
2.7 > split(y) (see Fig. 13.2)
The signal S and the MRA of the signal are shown in Fig. 13.2. The bottom
graph in the MRA is hardly distinguishable from the original sine, while
the four other graphs contain most of the energy from the transients. This
property of the MRA allows us to separate S into the transients, and the
sine. First we note that since the wavelet transform is linear, it is possible to
reconstruct the original signal S simply by adding all the rows in the MRA.
This fact can also be found in the help text to multires. It is easy to check
this graphically.
2.8 > figure
2.9 > plot(sum(y,1)-S)
Typing sum(y, 1) adds all entries along the first dimension in y, which is
equivalent to adding all the rows in y. It can also be done by the more
intuitive, but cumbersome, command
2.10 > plot(y(1,:)+y(2,:)+y(3,:)+y(4,:)+y(5,:)-S)
It is clear that the original signal and the reconstructed signal are almost
identical. They are not completely identical due to the finite precision of the
computations on the computer.
Since the bottom graph resembles the original sine, it should be possible to
reconstruct the transients from the remaining four parts of the decomposition.
The following commands

2.11 > figure


2.12 > plot(y(1,:)+y(2,:)+y(3,:)+y(4,:))
show that the transients are fairly well separated from the sine in the decom-
position. We can also plot the bottom graph in the same figure for comparison.
2.13 > hold on
2.14 > plot(y(5,:))
Until now we have only been looking at signals reconstructed from wavelet coefficients
(since the signals in the plots generated with multires followed by
split are not the wavelet coefficients themselves, but reconstructions of different
parts of the coefficients). If we want to look at the wavelet coefficients
directly, the function wt can be used. It implements exactly what is depicted
in both Fig. 3.7 and Fig. 8.2(a). Therefore by typing
2.15 > yt = wt(S,h,g,4);
the signal S is subjected to a 4 scale DWT (based on the Daubechies 8
filters). Although the output of wt is a single vector, it actually contains 5
vectors. How these are located in yt is described in the help text to wt. It is
also shown in Fig. 4.2. The wavelet coefficients can easily be shown with the
isplit function. Note that this function plots the first level at the top, in
contrast to split, which plots the output of multires in the opposite order.
2.16 > figure
2.17 > isplit(yt,4,'','r.')
Because we now have the wavelet coefficients available, we can experiment
with changes to the coefficients, for example setting some of them equal
to zero. After changing the coefficients, the function iwt is used to do an
inverse transform. Suppose we want to see what happens, if the fourth scale
coefficients are set to zero (the fourth scale coefficient vector is denoted by
d_{j-4} in Fig. 3.7). Then we use the commands
2.18 > yt(33:64) = zeros(1,32);
2.19 > yr = iwt(yt,rh,rg,4);
With subplot two or more graphs can be inserted into the same figure (the
same window), making it easier to compare them. Here the first graph shows
the original signal in blue and the reconstructed signal (from the modified
coefficients) in red.
2.20 > figure
2.21 > subplot (211)
2.22 > plot(S,'b')
2.23 > hold on
2.24 > plot(yr,'r')
As the second graph the difference between the two signals in the first subplot
is shown.

Fig. 13.3. Graphical output from line 3.3 (left) and line 3.7 (right)

2.25 > subplot(212)
2.26 > plot(S-yr)

13.2 Frequency Properties of the Wavelet Transform

Before going through this section the reader should be familiar with the
concept of a time-frequency plane from Chap. 9, and the function specgram
from the signal processing toolbox for MATLAB.
We have a number of times stressed that h and g are low and high pass
filters, respectively, i.e. they can separate (more or less) the low and high
frequencies in a signal. To see what effect this property has on the wavelet
transform, we will now use multires on a signal containing all frequencies
from 0 to half the sampling frequency.
3.1 > close all
3.2 > S = sin([1:2000].^2/1300);
With a Fourier spectrogram we immediately see that the signal, a so-called
chirp, actually does contain all frequencies.
3.3 > specgram(S) (see Fig. 13.3)
The filter is chosen to be a 16 tap filter, i.e. a filter of length 16, from the
Daubechies family, in order to have a reasonable frequency localization.
3.4 > [h,g,rh,rg]=daub(16);
3.5 > y = multires(S,h,rh,g,rg,4);
3.6 > figure
3.7 > split(y) (see Fig. 13.3)

The left part of Fig. 13.3 shows that the energy is distributed along a line
in the time-frequency plane. This agrees with the linear dependence between
time and frequency in a chirp, which here is obtained by sampling the function
sin(t^2/1300). The MRA graphs on the right in Fig. 13.3 do not show a linear
dependence. Approximately half of the energy is located in the top graph, a
dependence. Approximately half of the energy is located in the top graph, a
quarter in the second graph, and so on. Each graph seems to contain about
half of the energy of the one above. This partitioning of the energy comes
from the repeated use of the filters on the low pass part in each step of the
DWT. As a consequence, the relation between time and frequency becomes
logarithmic, not linear.
As another example of the ability of the DWT to separate frequencies, we
look at a signal mixed from four signals, each containing only one frequency,
and at different times. To experiment with another kind of filter, we choose
to use a biorthogonal filter, the CDF(3,15) filter, which is obtained from
4.1 > [h,g,rh,rg]=wspline(3,15);
As explained in Chap. 7, biorthogonal filters do not necessarily have an equal
number of low and high pass filter taps.
4.2 > h
4.3 > g
For our experiment we want four different signals (different frequency content
at different times)
4.4 > s1 = [sin([1:1000]/1000*2*pi*20) zeros(1,1000)];
4.5 > s2 = [zeros(1,1000) sin([1:1000]/1000*2*pi*90)];
4.6 > s3 = [zeros(1,1200) sin([1:400]/400*2*pi*2) ...
zeros(1,400)];
4.7 > s4 = [zeros(1,250) ...
sin([1:250]/250*2*pi*125+pi/2) zeros(1,250) ...
sin([1:500]/500*2*pi*250+pi/2) zeros(1,750)];
where s4 contains a high frequency, s3 contains a low frequency, while s1
and s2 have frequencies in between. This can be verified by plotting the four
signals. The plot also shows how the energy is distributed in time.
4.8 > subplot(511)
4.9 > plot(s1)
4.10 > subplot(512)
4.11 > plot(s2)
4.12 > subplot(513)
4.13 > plot(s3)
4.14 > subplot(514)
4.15 > plot(s4) (see Fig. 13.4)
We combine the four signals by addition.
4.16 > S = s1+s2+s3+s4;

Fig. 13.4. Graphical output from lines 4.8 to 4.15, and from line 4.19

To see how the DWT reacts to noise, we will test the MRA on this signal
S both with, and without, noise. Normally distributed random numbers are a
common type of simulated noise.
4.17 > s5 = randn(1,2000)/8;
The probability that randn generates numbers with absolute value larger
than 4 is very small. Division by eight leads to a signal s5 with almost all its
values between -0.5 and 0.5.
4.18 > subplot(515)
4.19 > plot(s5) (see Fig. 13.4)
Thus we have the following two signals for our experiments.
4.20 > figure
4.21 > subplot(211)
4.22 > plot(S)
4.23 > subplot(212)
4.24 > plot(S+s5)
Let us first look at the MRA for the signal S without noise.
4.25 > ym1 = multires(S,h,rh,g,rg,5);
4.26 > figure
4.27 > split(ym1)
When interpreting the graphs, it is essential to notice that they have different
vertical scales. If we want the same scale on all graphs, the function set (gca,
... ) is useful (it can be used to alter graphs in many other ways, too).
4.28 > for n=1:6 (see Fig. 13.5)
subplot(6,1,n)
set(gca,'YLim',[-2 2])
end

Fig. 13.5. Graphical output from line 4.28 and line 4.40
Now all the graphs are scaled to the interval [-2; 2].
Since the 'up'-key on the keyboard can be used to browse through previous
lines, it would be nice to have the for loop on one line (then the for loop
can easily be reused).
4.29 > for n=1:6, subplot(6,1,n), set(gca, 'YLim', [-2 2]), end
For clarity, subsequent loops will not be written in one line, though.
Now we can see one of the major advantages of the DWT: Not only
have the four frequencies been separated, but the division in time is also
reconstructed (compare Fig. 13.4 and Fig. 13.5). If we try to separate the
frequencies using the short time Fourier transform (which forms the basis
for the Fourier spectrogram), we have to choose between good frequency
localization
4.30 > figure
4.31 > subplot(211)
4.32 > specgram(S, 2048, 1, 256)
4.33 > caxis([-50 10])
and good time localization.
4.34 > subplot(212)
4.35 > specgram(S, 2048, 1, 32)
4.36 > caxis([-50 10])
The MRA for S+s5, i.e. S with noise added, is roughly the same as the MRA
for S.
4.37 > ym2 = multires(S+s5,h,rh,g,rg,5);
4.38 > figure
4.39 > split(ym2)

4.40 > for n=1:6 (see Fig. 13.5)
subplot(6,1,n)
set(gca,'YLim',[-2 2])
end

13.3 Wavelet Packets Used for Denoising

As an example of a typical wavelet packets application, we will now study a


very simple approach to denoising. This example demonstrates advantages,
as well as disadvantages, of applying a wavelet based algorithm. Usually the
noise-free signal is not available a priori, but to evaluate the effectiveness of
our technique, we use a synthetic signal, to which we add a known noise.
5.1 > S = sin([1:4096].^2/5200);
5.2 > specgram(S,1024,8192,256,192)
First a remark on the use of colors. By typing
5.3 > colorbar
another axis appears to the right in the figure. This axis shows how the
colors are distributed with respect to the numerical values in the spectrogram.
Sometimes the color scale interval is inappropriate, and it can be necessary
to change it. In the present case the interval seems to be [-150; 30], and it
can be changed using caxis in the following way.
5.4 > caxis([-35 35])
5.5 > colorbar
The caxis command changes only the colors in the main plot. So if the color-
bar should correspond to the spectrogram, it has to be updated by reissuing
the command colorbar.
The chirp is chosen as the synthetic signal here, since it covers a whole
range of frequencies, and any noise in such a signal is easily seen and heard.
The latter being possible only if the computer is equipped with a sound card.
The signals can be played using sound, which takes the playback frequency
as second argument.
5.6 > sound(S,8192)
Being a synthetic signal, S does not have a 'real' sampling frequency. In this
case we just chose 8 kHz. The input signal to sound must have values in [-1; 1].
Values outside this interval are clipped, so any signal passed to sound should
be scaled to fit this interval. Note that our synthetic signal was created using
the sine, hence no scaling is necessary.
As noise we choose 4096 normally distributed random numbers.

5.7 > noise = randn(1,4096)/10;
5.8 > figure
5.9 > specgram(S+noise,1024,8192,256,192)
5.10 > caxis([-35 35])
5.11 > figure
5.12 > specgram(noise,1024,8192,256,192)
5.13 > caxis([-35 35])
5.14 > sound(S+noise,8192)

The intensity of the noise can be varied by choosing a number different from
10 in the division in line 5.7. As our filter we can choose for example a
member of the CDF family (often called spline wavelets)
5.15 > [h,g,rh,rg]=wspline(3,9);
or near-symmetric Daubechies wavelets, from the symlet family.
5.16 > [h,g,rh,rg]=symlets(30);
Our experiments here use the symlets. Since symlet filters are orthogonal,
the transform used here preserves energy.
To make a wavelet packet decomposition, the function wpk is used. Since
it operates by repeatedly applying wt, and since it always does a full de-
composition, it is relatively slow. This does not matter in our experiments
here. But if one needs to be concerned with speed, then one can use de-
composition functions implemented in C or FORTRAN, or, as we will see
later, one can specify which basis to find. Some results on implementations
are given in Chap. 11, and information on available libraries of functions for
wavelet analysis is given in Chap. 14. The function wpk is sufficient for our
experiments.
5.17 > y = wpk(S,h,g,0);
The fourth argument determines the ordering of the elements on each level.
The options are filter bank ordering (0) and natural frequency ordering (1).
The meaning of these terms is described in Sect. 9.3. In this case the ordering
does not matter, since we are only concerned with the amplitude of the
wavelet coefficients (in line 5.29 small coefficients are set equal to zero).
With wpk we get a decomposition of ⌊log2(N)⌋ + 1 levels, where N is the
length of the signal. So wpk(S,h,g,0) returns a 4096 x 13 matrix, since
⌊log2(4096)⌋ + 1 = 13, and the first column of this matrix contains the original
signal. Note that the ordering of the graphs in a plot like Fig. 13.6 corresponds
to the transpose of the y obtained from wpk. The ordering of elements in y
is precisely the one depicted in Fig. 8.2(b).
Fig. 13.6. Graphical output from line 5.19 (with symlets 30 from line 5.16) and
line 5.32

At the bottom level (i.e. in the last column of y) the elements are quite
small, only one coefficient wide. Since they are obtained by filtering the
elements in the level above (where each element is two coefficients wide) only
2 of the 30 filter taps are used. Hence 28 of the filter taps do not have any
influence on the bottom level, and one should therefore be careful when interpreting
the lowest level. This also applies (to a lesser degree) to the few levels
above it. However, it is still possible to reconstruct the original signal from
these elements, so the lowest levels do give a representation of the original
signal, although it might not be useful in some applications.
A plot of all 13 levels in the same figure would be rather cluttered. We limit
the plot to the first 5 levels here. Note that wpk produces a decomposition in
a matrix, where each column corresponds to a level. With set (gca, ... ) the
time axis is set to go from 0 to 4096, and tick marks are applied at 0, 1024,
2048,3072, and 4096. On the fifth level there are 16 elements (see Fig. 13.6),
four between each pair of tick marks.
5.18 > figure
5.19 > for n=1:5 (see Fig. 13.6)
subplot(5,1,n)
plot(y(:,n))
set(gca,'XLim',[0 4096],'XTick',[0:1024:4096])
end
Visually, the decomposition of the noisy signal does not deviate much from
the first decomposition.
5.20 > yn = wpk(S+noise,h,g,0);
5.21 > figure
5.22 > for n=1:5
subplot(5,1,n)
plot(yn(:,n))
set(gca,'XLim',[0 4096],'XTick',[0:1024:4096])
end

There is nonetheless an important difference (on all levels, but for the sake
of simplicity we focus on the fifth level): While most of the fifth level in the
decomposition of the signal without noise consists of intervals with coefficients
very close to zero, the fifth level in the decomposition of the noisy signal is
filled with small coefficients, which originate from the noise we added to the
signal.
This gives rise to an important observation. The energy of the signal is
collected in fewer coefficients as we go down in the levels. Energy is conserved,
since we have chosen orthogonal filters. Consequently, these coefficients must
become larger. The noise, however, since it is random, must stay evenly dis-
tributed over all levels. Thus due to the energy preservation, most of the
coefficients coming from the noise must be small. The growth of the coeffi-
cients is clearly visible in Fig. 13.6 (note the changed scaling on the vertical
axes). It is therefore reasonable to hope that the signal can be denoised by
setting the small coefficients equal to zero. The property of concentrating
desired signal information without concentrating the noise is important in
many applications.
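The energy argument can be checked directly in MATLAB. Since the symlet filters are orthogonal, every column of the output of wpk carries the same energy as the original signal, even though that energy is distributed very differently within the columns. The following loop (our own addition, and therefore left unnumbered) prints the thirteen level energies of the noisy decomposition; they agree up to rounding.

for n = 1:13, disp(norm(yn(:,n))), end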
We have already looked at level 5, so let us look at level 7, for example.
We plot level 7 from the two decompositions in the same figure.
5.23 > figure
5.24 > plot(y(:,7))
5.25 > hold on
5.26 > plot(yn(:,7),'r')
Since there are 4096 points in each signal, the graphs do not clearly show the
differences. By typing
5.27 > zoom
(or by choosing zoom on the figure window menu) we can enlarge a chosen
area of the graph. Mark an area with the mouse, and MATLAB zooms to
this area. By examining different parts of the signals, one gets the impression
that the difference between the two signals is just the noise. Hence after six
transforms the noise remains as noise. This examination also reveals that
setting all coefficients below a certain threshold (between 0.5 and 1) equal to
zero will make the two signals much more alike. To try to determine the best
threshold, we look at the coefficients on the seventh level ordered according to
absolute value.
5.28 > figure; plot(sort(abs(yn(:,7))))
The choice of threshold value is not obvious. We have chosen 1, but judging
from the sorted coefficients, this might be a bit too high. To change all values
in yn(:,7) with absolute value less than or equal to 1 to zero we type
5.29 > yc = yn(:,7) .* (abs(yn(:,7)) > 1);

The part abs(yn(:,7)) > 1 returns a vector containing 0's and 1's: a 0 whenever
the corresponding value is below or equal to 1, and a 1 otherwise. Multiplying
coefficient by coefficient with yn(:,7) leaves all coefficients above 1
unchanged, while the rest are changed to zero. Note that .* is coefficient-wise
multiplication and * is matrix multiplication.
Now we want to use the modified seventh level to reconstruct a hopefully
denoised signal. This is done using iwpk. Since this is an inverse transform,
we need to use the synthesis filters (rh and rg).
5.30 > yr = iwpk(yc,rh,rg,0,6*ones(1,64));
The fifth argument passed to iwpk is the basis to be used in the reconstruction.
As we saw in Chap. 8, many possibilities exist for the choice of a basis
(or a representation), when we use a full wavelet packet decomposition. Here
we have chosen the representation given by all elements on the seventh level.
This basis is in Uvi_Wave represented as a vector of length 64 consisting of
only 6's (given as 6*ones(1,64) in MATLAB), where 6 is the level (counting
from zero) and 64 the number of elements on the seventh level. The basis representation
notation is described by Fig. 11.2 on page 182, and in Uvi_Wave
by typing basis at the prompt. The reconstruction in line 5.30 is performed
exactly as shown in Fig. 8.3 on page 90.
Now yr contains a reconstructed, and hopefully denoised, signal. Let us
look at the spectrogram of this signal.
5.31 > figure
5.32 > specgram(yr,1024,8192,256,192) (see Fig. 13.6)
5.33 > caxis([-35 35])
5.34 > sound(yr, 8192)
We can visualize our denoising success by looking at the difference between
the original signal and the denoised signal. Ideally, we should only see noise.
Often a (small) part of the signal is also visible in the difference. This means
that we also have removed a (small) part of the signal.
5.35 > figure
5.36 > specgram((S+noise)-yr',1024,8192,256,192)
5.37 > caxis([-35 35])
5.38 > sound(yr'-(S+noise), 8192)
Finally, we can inspect the difference between the original, clean, signal, and
the denoised signal. This is only possible, because the signal in our case is
synthetic; the point of denoising a signal is usually that the clean signal is
not available. Having the clean signal available gives us a chance to examine
the efficiency of our denoising algorithm.
5.39 > figure
5.40 > specgram(yr'-S,1024,8192,256,192)
5.41 > caxis([-35 35])
5.42 > sound(yr'-S, 8192)

To experiment with a different threshold value, some of the calculations must


be performed again. By collecting these in a loop, it is possible to try several
threshold values without having to retype the commands. Note that this loop
only works if yn is already calculated.
5.43 > close all
5.44 > while 1
Bound = input('Bound (return to quit): ');
if isempty(Bound), break; end
yc = yn(:,7) .* (abs(yn(:,7)) > Bound);
yr = iwpk(yc,rh,rg,0,6*ones(1,64));
figure(1)
clf
specgram(yr,1024,8192,256,192)
caxis([-35 35])
sound(yr,8192)
end

13.4 Best Basis Algorithm


Wavelet packet decompositions are often used together with the best basis
algorithm to search for a best basis, relative to a given cost function. These
topics were discussed in detail in Chap. 8.
In Uvi_ Wave there are several functions that search through the full
wavelet packet decomposition. Only one of them, pruneadd, implements the
best basis search algorithm, as described in Chap. 8 on page 94. We will use
this function for our next example.
We start by defining a signal. A rather short one is chosen this time,
since we are interested in basis representations, and since the number of
representations grows rapidly with the length of the signal, when we perform
a full wavelet packet decomposition.
6.1 > S = sin([1:32].^2/30);
6.2 > S = S / norm(S);
6.3 > plot(S)
The signal is normalized to have norm equal to one, because we want to use
Shannon's entropy cost function. Although it is not necessary to do this (the
resulting basis would be the same without normalization), it ensures positive
cost values.
6.4 > [h,g,rh,rg] = daub(6);
6.5 > [basis,v,total] = pruneadd(S,h,g,'shanent');
The function pruneadd does several things. As input it takes the signal, the
two analysis filters, and a cost function. It first performs a full wavelet packet

decomposition of the given signal, to the number of levels permitted by the


length of the signal. In our case we have a signal of length 32, so there will
be 6 levels in the decomposition (see Table 8.2). The output is the selected
basis, contained in
6.6 > basis
the representation of the signal in this basis, i.e. the coefficients in v,
6.7 > plot (v)
and the total cost for that particular representation.
6.8 > total
The cost function shanent calculates Shannon's entropy for the output vector
v (see Sect. 8.3.2).
6.9 > shanent(v)
This value is the one returned by pruneadd as total above.
Since there is no function in Uvi_Wave, which performs only the best
basis search, we will now show that this search algorithm is very easy to
implement in MATLAB. Note how each step in the algorithm on page 94 can
be converted to MATLAB code. First we need a full decomposition of the
signal.
6.10 > y = wpk(S,h,g,0);
6.11 > size(y)
The first step in the algorithm is to calculate the cost value for each element
in the decomposition. Since there are 6 levels, each of length 32, there is a
total of 2^6 - 1 = 63 elements. We start by constructing a vector containing
the computed cost values.
6.12 > CostValue = [ ];
6.13 > for j=0:5
for k=0:2^j-1
Element = y(1+k*2^(5-j):(k+1)*2^(5-j),j+1);
CostValue = [CostValue shanent(Element)];
end
end
Note that this construction of the vector CostValue is highly inefficient, and
it is used here only for the sake of simplicity. Now CostValue contains the
63 cost values from our decomposition. Save the cost values for later use (if
one omits the ; at the end, the values are shown on the display).
6.14 > Old_CostValue = CostValue
Since we know the cost value for the best basis for our example above, we
can easily check, if the best representation happens to be the original signal.

6.15 > CostValue(1)-total


This is not the case, since the original representation has a higher cost than
the value found by pruneadd above. The next step in the algorithm is to mark
all the elements at the bottom level. We now need a notation for a basis. The
one used in Uvi_Wave is efficient, but not user friendly. We continue to use
the notation implicitly given by the definition of the vector CostValue. Here
we have numbered elements consecutively, from the top to the bottom in each
column, and then going through the rows from left to right. The last column
contains level 6, which has 32 elements. This numbering is performed in the
two for loops starting on the line 6.13. It is also depicted to the right in
Fig. 11.2 on page 182. In the basis vector we let 1 mean a chosen (marked)
element and 0 marks elements not chosen. Marking all the elements at the
bottom level is then performed by
6.16 > b = [zeros(1,31) ones(1,32)];
With 6 levels, there is a total of 63 elements, distributed with 31 elements
at the first 5 levels and 32 elements at the bottom level. In the next steps
the actual bottom-up search is carried out. Note that for any element with
index equal to Index the two elements just below it have indices 2*Index
and 2*Index+1.
6.17 > Index = 31;
6.18 > for j = 4:-1:0
for k = 0:2^j-1
tmp = CostValue(2*Index)+CostValue(2*Index+1);
if CostValue(Index) < tmp
b(Index) = 1;
else
CostValue(Index) = tmp;
end
Index = Index - 1;
end
end
Note that here we do not remove the marks on elements below the currently
chosen and marked ones, in contrast to the step 4(a) in the algorithm. We
will do this step later. Before proceeding with the basis, let us take a look at
the cost values. As can be seen in the algorithm, step 4(b), the numbers in
CostValue might change.
6.19 > Old_CostValue - CostValue
Most of the numbers in the vector CostValue have changed as part of the
process of finding the best basis. The last step in the algorithm (on page 94)
states that the first entry in CostValue is the total cost value for the basis
just found, which is the best basis.

6.20 > CostValue(1)-total


If everything has gone as expected, you should get zero. Only one thing
remains to be done. Since one step was neglected in the best basis search
above, there are 'too many' 1's in b, because the 1's that should have been
removed by step 4(a), are still in the vector. These can be removed by
6.21 > for j=1:31
b(2*j) = b(2*j) + 2*b(j);
b(2*j+1) = b(2*j+1) + 2*b(j);
end
6.22 > b = (b == 1);
The for loop sets all the unwanted 1's to values higher than 1, and the last
line sets all values but 1 to zero.
The two vectors basis and b now represent the same basis, but in two
different ways. While basis is a 'left to right' representation (see Sect. 11.8
or the script basis in Uvi_Wave) the vector b is a 'top-down' representation.
Both basis representations are shown in Fig. 13.7.

Fig. 13.7. The basis representations basis and b are quite different, and yet they
represent the same basis

It is not difficult to reconstruct the original signal from v, which is the representation
of the signal in the basis basis. The process is exactly the one
described by Fig. 8.3.
6.23 > S2 = iwpk(v,rh,rg,0,basis);
6.24 > plot(S2-S)
The best basis can be presented using the two commands tree and tfplot.
They both take a basis as input arguments. Both types of representations
have been demonstrated in previous chapters. The former relates to the
type of visualization presented in Sect. 3.5, although only wavelet decom-
position is discussed there. The tfplot command shows the corresponding
time-frequency plane without any coloring, like the time-frequency planes in
Fig. 9.11.
6.25 > tree(basis)

6.26 > figure


6.27 > tfplot(basis)
Place the two figures next to each other on the screen, and notice how the
two representations are much alike, although they might seem different.
The representation of a signal in the best basis (in this case v) is in many
applications the very reason for using wavelet analysis. Subject to the given
cost function, it is the best representation of the signal, and with the right
choice of cost functions, we have a potentially very good representation. For
instance one with only very few large samples.
In this case we have the representation with the smallest Shannon entropy
(since we used the argument 'shanent' in line 6.5). This cost function finds
the representation with the lowest entropy. Since entropy, as explained in
Sect. 8.3.2, measures concentration of energy, we have found the representation
v with the highest concentration. This in turn means many coefficients
in v must be small.
6.28 > figure
6.29 > plot(sort(abs(v)))
Compare this to the size of the coefficients of the original signal.
6.30 > hold on
6.31 > plot(sort(abs(S)),'r')
We have previously experimented with altering the transformed signal in
an attempt to denoise a signal (for instance in line 5.29), but so far we
have only considered altering the signal on a single level. With a best basis
representation we are more in control of what happens to the signal, since we
alter a predetermined best representation instead of a more or less arbitrary
level representation. This is not so easy to see with just 32 coefficients, so we
take a longer signal. We also make some noise.
6.32 > S = sin([1:512].^2/512);
6.33 > noise = randn(1,512)/2;
In all the previous examples we have used randomly distributed noise only.
This time we will use colored noise. It is made by band pass filtering the
noise, and we use the following command to make this band pass filter. Note
that butter comes from the signal processing toolbox.
6.34 > [B,A] = butter(2,[0.2 0.6]);
It lets frequencies from 0.2 to 0.6 times half the sampling frequency through.
The filter is applied to the noise and added to the signal as follows,
6.35 > S2 = S + filter(B,A,noise);
Now we make both a wavelet packet decomposition and a best basis search.

6.36 > [h,g,rh,rg] = daub(10);


6.37 > y = wpk(S2,h,g,0);
6.38 > [basis,v]=pruneadd(S2,h,g,'shanent');
We see that the best basis is not just a single level.
6.39 > tree(basis)
We will now display the sorted coefficients of the original signal, of all the
levels in the WP decomposition, and of the best basis representation of the
signal.
6.40 > plot(sort(abs(S2)),'b')
6.41 > hold on
6.42 > plot(sort(abs(y(:,2:end))),'k')
6.43 > plot(sort(abs(v)),'r')
As stated above the best basis representation is more concentrated than any
single level (although in this particular case not so much).

13.5 Some Commands in Uvi_Wave

To encourage the reader to do further experiments with MATLAB and
Uvi_Wave, we wrap up this chapter with a list of descriptions of useful commands
in Uvi_Wave, including those presented previously in this chapter.
Whenever the theoretical background of a command can be found in the
book, we also provide a reference to that particular chapter or section.
This collection is by no means exhaustive or complete, since Uvi_Wave
contains more than 70 different functions. Detailed help can be found in the
Uvi_Wave manual and with the MATLAB help command.
Filters. To use a transform filters have to be generated. The Daubechies
filters of (even) length M are generated by the command
[h,g,rh,rg]=daub(M)
Here h is the low pass analysis filter, g the high pass analysis filter, and rh,
rg the corresponding synthesis filters. The symlets of length M are generated
with the command symlets(M). Both of these families are orthogonal. The
biorthogonal filters in the CDF(M,N) family are generated with the command
wspline(M,N). Note that Uvi_Wave has no function for generating Coiflets.
These filters can be obtained from tables in the literature, see for example [5],
or from other toolboxes, for example those mentioned in Sect. 14.2.
1D wavelet transforms. The direct and inverse transforms are obtained with
the commands
wt(S,h,g,k)
iwt(S,rh,rg,k)

Here S is the signal vector to be transformed or inverted. See the documenta-


tion for the ordering of the entries in the direct transform vector. The number
of scales (see Sect. 3.5) to be used is specified with k. These functions use
the filter bank approach to the DWT, see Chap. 7. The transforms use peri-
odization to deal with the boundary problem, see Sect. 10.4. Alignment is by
default performed, using the first absolute maximum method, see Sect. 9.4.3.
The alignment method can be changed using wtmethod.
2D wavelet transforms. The separable 2D transforms are implemented in the
functions
wt2d(S,h,g,k)
iwt2d(S,rh,rg,k)
The principles are described in Chap. 6. There is a script format2d, which
explains the output format in detail. Run it before using these commands.
Wavelet packet transforms. The direct and inverse wavelet packet transforms
in dimensions one and two are implemented in
wpk(S,h,g,0,B)
iwpk(S,rh,rg,0,B)
wpk2d(S,h,g,0,B)
iwpk2d(S,rh,rg,0,B)
The fourth variable (zero) specifies filter bank ordering of the frequencies.
Change to the value 1 to get the natural frequency order. See Sect. 9.3.
The basis to be used is specified in the parameter B. The basis is described
according to the Uvi_Wave scheme, as explained in Sect. 11.8.1. Note that
it is different from the one adopted in our implementations. Run the script
basis for explanations and examples.
Best basis algorithm. The best basis algorithm from Sect. 8.2.2 is imple-
mented for an additive cost function C (see Sect. 8.3) with the command
[basis,y,total]=pruneadd(S,h,g,C,P)
Here P is an optional parameter value that may be needed in the cost function,
for example the value of p in the ℓ^p-norm cost function. This cost function
and the Shannon entropy are implemented in the functions
lpenerg(S,P)
shanent(S)
The function pruneadd returns the selected basis in basis, the transformed
signal in this basis in y, and the total cost of this representation in total.
Two additional functions can be of use in interpreting the results obtained
using pruneadd.
tfplot(B)
tree(B)

The first one displays the tiling of the time-frequency plane associated with
basis B, as in Fig. 9.11. The second one displays the basis as a tree graph.
Further explanations are given in the script basis.
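Put together, a best basis search and the inspection of its result could look like this (a sketch; we assume here that the cost function is passed to pruneadd by name as a string, which the reader should verify against the Uvi_Wave manual):

[h,g,rh,rg] = daub(8);
S = randn(1,256);                             % a test signal
[basis,y,total] = pruneadd(S,h,g,'shanent');  % Shannon entropy as cost
tfplot(basis)    % tiling of the time-frequency plane for the chosen basis
tree(basis)      % the same basis displayed as a tree graph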
Multiresolution analysis. The concept of a multiresolution analysis was ex-
plained in Chap. 4. The relevant commands are
y=multires(S,h,rh,g,rg,k)
split(y)
y=mres2d(S,h,rh,g,rg,k,T)
Here k is the number of scales. The result for a length N signal is a (k + 1) x N
matrix. For k less than 10 the result can be displayed using split(y). The
two-dimensional version works a little differently. The output is selected with
the parameter T. If the value is zero, then both the horizontal and vertical
part of the separable transform use the low pass filter. Other values give
the other possibilities, see the help pages. Note the ordering of the filters in
these functions.
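For example (a sketch; note again the interleaved ordering of analysis and synthesis filters in the argument list):

[h,g,rh,rg] = daub(4);
S = randn(1,256);              % any test signal works here
y = multires(S,h,rh,g,rg,4);   % 4 scale multiresolution decomposition
split(y)                       % display the k+1 = 5 components in one figure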

Exercises

See the exercises in Chap. 4, Chap. 5, and Chap. 6.


14. Applications and Outlook

In this chapter we give some information on applications of wavelet based transforms. We only give brief descriptions and references, since each topic
requires a different background of the reader. We also give some directions
for further study of the vast wavelet literature.
The World Wide Web is a good source of information on wavelets and
their applications. We recommend that all readers seriously interested in wavelets subscribe to the Wavelet Digest. Information on how to subscribe can be found at the URL 1. A reference of the form URL 1 points to the first entry in the list at the end of this chapter.

14.1 Applications

Wavelets have been applied to a large number of problems, ranging from pure
mathematics to signal processing, data compression, computer graphics, and
so on. We will mention a few of them and give some pointers to the literature.

14.1.1 Data Compression

Data compression is a large area, with many different techniques being used.
Early in the development of wavelet theory the methods were applied to data
compression. One of the first successes was the development of the FBI fin-
gerprint compression algorithm, referred to as Wavelet/Scalar Quantization.
Further information on this particular topic can be found at the URL 2 and
the URL 3.
Let us briefly describe the principles. There are three steps, as shown in
Fig. 14.1. The given signal s is transformed using a linear transformation T.
This could be a wavelet packet transform, which is invertible (has the perfect
reconstruction property). The next step is the lossy one. A quantization is
performed. The floating point values produced by the wavelet transform are
classified according to some scheme. For example, an interval [ymin, ymax] of
values is selected as relevant. Transform values above and below are assigned
to the chosen maximum and minimum values. The interval is divided into
N subintervals of equal length (or according to some other scheme), and the


interval numbers are then the quantized values. (Note that the thresholding
used in Chap. 4 can be viewed as a particular quantization method.) Finally
these N values are coded in order to get efficient transmission or storage of the
quantized signal. The coding is usually one of the entropy coding methods,
for example Huffman coding.
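As an illustration of the quantization step, here is a minimal uniform quantizer in MATLAB (a sketch only; y is a vector of transform values, the interval and the number of subintervals are assumed values, and schemes used in practice are considerably more elaborate):

ymin = -1; ymax = 1; N = 16;          % assumed quantizer parameters
yc = min(max(y,ymin),ymax);           % clip transform values to [ymin,ymax]
q = floor((yc-ymin)/(ymax-ymin)*N);   % subinterval number, 0..N
q = min(q,N-1);                       % map the value ymax into the last bin
yq = ymin + (q+0.5)*(ymax-ymin)/N;    % dequantized values (bin midpoints)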

Fig. 14.1. Linear data compression scheme. The signal s is first transformed using
a linear transform T. It is then quantized by some scheme Q. Finally the quantized
signal is coded using entropy coding, to get the result s_c

The compression scheme described here is called an open-loop scheme, since there is no feedback mechanism built into it. To get efficient compression the
three components T, Q, and C must be optimized as a system. One way of
doing this is by a feedback mechanism, for example leading to changes in the
quantization, if the coding step leads to a poor result, by some measure. The
overall design goal is often that the compressed and then reconstructed signal
should be sufficiently close to the original, by some measure. For example,
the average person listening to music being played back from a compressed
version should not notice any compression effects, or the compression effects
should be acceptable, by some subjective measure.
The type of compression used, and the level of compression that is acceptable, depend very much on the type of application one considers. For
example, in transmission of speech the first requirement is intelligibility. The
next step is that the speaker should be recognizable, etc. For music the re-
quirements are different, and more difficult to satisfy. One often uses a model
based approach. Statistical methods are also often used.
In image compression the issues are even more complex, and the prob-
lems are compounded by the fact that it is difficult to find good models for
images. A recent development is the use of separable 2D wavelet transforms,
defined using lifting, in the new image compression standard JPEG2000. The
previous standard JPEG was based on block discrete cosine transforms. See
URL 4 for further information.
Continuing with video compression and multimedia applications, we get
to the point where the results are current research. We should mention that
parts of the MPEG-4 standard will use wavelet based methods. Try looking
at the URL 5, or search the Web for sites with information on MPEG-4.
Let us also note that the data compression problem has an interesting mathematical aspect: the problem of characterizing classes of signals, in particular images, and, based on such a classification, devising efficient compression schemes for each class.

It is clear from this very short description that there are many aspects beyond
the wavelet transform step in getting a good compression scheme. We refer
to the books [16, 28] for further results, related to wavelet based methods.

14.1.2 Signal Analysis and Processing

One of the applications shown in Chap. 4 was to signal denoising, see in particular Fig. 4.11. Our approach there was to select a threshold by visual
inspection of a number of trials. See also the examples in Sect. 13.3. To be
used in practice one needs a theory to determine how to select a threshold,
or some other method. Several results exist in this area. We refer to the book
[16] and the references therein. Again, this is an active area of research, and
many further results are to be expected.
Another area of application is to feature identification in signals. For one-
dimensional signals (for example seismic signals) there is considerable work
on the identification of features in such signals. In particular, singularities can
be located very precisely, as shown in simple examples in Chap. 4. The edges
in a picture can also be detected, as shown in Fig. 6.9, but this is actually a
rather complicated issue. Identification of other structures in images is again
an area of current research.
We should also mention that for one-dimensional signals the time-frequency planes discussed in Chap. 9 can be a good starting point for the analysis of
a class of signals. Again, much more can be done than was described in that
chapter. In particular, the analysis can also be performed using transforms
based on the discrete cosine transform.

14.1.3 Other Applications

Applications to many other areas of science exist. For some examples from
applied mathematics, see the collection of articles [14], and also the book [18].
For applications to the solution of partial differential equations, see [9]. The
collection of articles [1] contains information on applications in physics. The
books [21] and [29] discuss applications in statistics. Applications to me-
teorology are explained at the URL 6. Many other applications could be
mentioned. We finish by mentioning an application involving control theory,
in which one of the authors is involved, see the URL 7.

14.2 Outlook

First a word of warning. Application of wavelets involves numerical computations. There are many issues in computational aspects that have not been touched upon in this book. One thing should be mentioned, namely that
application of the lifting technique can lead to DWTs that are numerically

unstable. Our suggestion is to rely, at least initially, on the well known or-
thogonal or biorthogonal families, where it is known that these problems do
not occur. If the need arises to construct new transforms using lifting, one
should be aware of the possibility of numerical instability.
The reader having read this far and wanting to learn more about wavelets
is faced with the problem of choice of direction. One can decide to go into
the mathematical aspects, or one can learn more about one of the many
applications, some of which were mentioned above. There is a large number of
books that one can read. But one should be aware that each book mentioned
below has specific prerequisites, which vary widely.
Concerning the mathematical aspects, the book by I. Daubechies [5]
might be a good starting point. Another book dealing mainly with math-
ematical aspects is the one by E. Hernandez and G. Weiss [12]. For those
with the necessary mathematical foundations the book by Y. Meyer [17], and
the one by Y. Meyer and R. Coifman [19], together provide a large amount
of information on mathematical aspects of wavelet theory. There are many
other books dealing with mathematical aspects. We must refer the reader
to, for example, Mathematical Reviews, where many of these books have been
reviewed.
A book which covers both the mathematical aspects and the applications
is the one by S. Mallat [16] that we have mentioned several times. It is a
good source of information and pointers to recent research. There are many
other books dealing with wavelets from the signal analysis point of view.
We have already referred to the one by M. Vetterli and J. Kovacevic [28]. It
contains a lot of information and many references. The book by G. Strang
and T. Nguyen [24] emphasizes filters and the linear algebra point of view.
The understanding of wavelets is enhanced by computer experiments. We
have encouraged the reader to do many, and we hope the reader has carried
out all suggested experiments. We have based our presentation on the public
domain MATLAB toolbox Uvi_Wave, written by a number of scientists at
Universidad de Vigo in Spain. It is available at the URL 8. Here one can also
find a good manual for the toolbox. There are some problems in using this
toolbox with newer versions of MATLAB. Their resolution is explained at
the URL 11.
For further work on the computer there exist a number of other toolboxes.
We will mention two of them. One is the public domain MATLAB toolbox
WaveLab. It contains many more functions than Uvi_Wave, and it has been
updated to work with version 5.3 of MATLAB. The many possibilities also
mean that it is more demanding to use. Many examples are included with
the toolbox. WaveLab is available at the URL 9. The other toolbox is the
official MATLAB Wavelet Toolbox, which recently has been released in a new
version. It offers a graphical interface for performing many kinds of wavelet
analysis. Further information can be found at the URL 10. There exist many
other collections of MATLAB functions for wavelet analysis, and libraries of

for example C code for specific applications. Again, a search of the Web will
yield much information.
Finally, we should mention once more that the M-files used in this book
are available at the URL 11. The available files include those needed in the
implementation examples in Chap. 11, and all examples in Chap. 13. There
is also some additional MATLAB and C software available, together with a
collection of links to relevant material. It is also possible to submit comments
on the book to the authors at this site.

14.3 Some Web Sites

Here we have collected the Web sites mentioned in this chapter. The reader
is probably aware that the information in this list may be out of date by the
time it is read. In any case, it is a good idea to use one of the search engines
to try to find related information.
1. http://www.wavelet.org
2. http://www.c3.lanl.gov/~brislawn/FBI/FBI.html
3. ftp://www.c3.lanl.gov/pub/misc/WSQ/FBI_WSQ_FAQ
4. http://www.jpeg.org
5. http://www.cselt.it/mpeg/
6. http://paos.colorado.edu/research/wavelets/
7. http://www.beamcontrol.com
8. ftp://ftp.tsc.uvigo.es/pub/Uvi_Wave/matlab
9. http://www-stat.stanford.edu/~wavelab/
10. http://www.mathworks.com
11. http://www.bigfoot.com/~alch/ripples.html
References

1. J. C. van den Berg (ed.), Wavelets in physics, Cambridge University Press, Cambridge, 1999.
2. A. Cohen, I. Daubechies, and J.-C. Feauveau, Biorthogonal bases of compactly
supported wavelets, Comm. Pure Appl. Math. 45 (1992), no. 5, 485-560.
3. A. Cohen, I. Daubechies, and P. Vial, Wavelets on the interval and fast wavelet
transforms, Appl. Comput. Harmon. Anal. 1 (1993), no. 1, 54-81.
4. A. Cohen and R. D. Ryan, Wavelets and multiscale signal processing, Chapman
& Hall, London, 1995.
5. I. Daubechies, Ten lectures on wavelets, Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA, 1992.
6. I. Daubechies, Orthonormal bases of compactly supported wavelets. II. Variations on a theme, SIAM J. Math. Anal. 24 (1993), no. 2, 499-519.
7. I. Daubechies and W. Sweldens, Factoring wavelet transforms into lifting steps,
J. Fourier Anal. Appl. 4 (1998), no. 3, 245-267.
8. J. H. Davenport, Y. Siret, and E. Tournier, Computer algebra, second ed., Aca-
demic Press Ltd., London, 1993.
9. S. Goedecker, Wavelets and their application, Presses Polytechniques et Uni-
versitaires Romandes, Lausanne, 1998.
10. C. Herley, J. Kovacevic, K. Ramchandran, and M. Vetterli, Tilings of the time-
frequency plane: Construction of arbitrary orthogonal bases and fast tiling algo-
rithms, IEEE Trans. Signal Proc. 41 (1993), no. 12, 2536-2556.
11. C. Herley and M. Vetterli, Wavelets and recursive filter banks, IEEE Trans.
Signal Proc. 41 (1993), no. 8, 2536-2556.
12. E. Hernandez and G. Weiss, A first course on wavelets, CRC Press, Boca Raton,
FL, 1996.
13. B. Burke Hubbard, The world according to wavelets, second ed., A K Peters
Ltd., Wellesley, MA, 1998.
14. M. Kobayashi (ed.), Wavelets and their applications, Society for Industrial and
Applied Mathematics (SIAM), Philadelphia, PA, 1998.
15. W. M. Lawton, Necessary and sufficient conditions for constructing orthonor-
mal wavelet bases, J. Math. Phys. 32 (1991), no. 1, 57-61.
16. S. Mallat, A wavelet tour of signal processing, Academic Press Inc., San Diego,
CA, 1998.
17. Y. Meyer, Wavelets and operators, Cambridge University Press, Cambridge,
1992.
18. Y. Meyer, Wavelets, algorithms and applications, SIAM, Philadelphia, Pennsyl-
vania, 1993.
19. Y. Meyer and R. Coifman, Wavelets, Cambridge University Press, Cambridge,
1997.
20. C. Mulcahy, Plotting and scheming with wavelets, Mathematics Magazine 69
(1996), no. 5, 323-343.

21. P. Müller and B. Vidakovic (eds.), Bayesian inference in wavelet-based models, Springer-Verlag, New York, 1999.
22. A. V. Oppenheim and R. Schafer, Digital signal processing, Prentice Hall Inc., Upper Saddle River, NJ, 1975.
23. A. V. Oppenheim, R. Schafer, and J. R. Buck, Discrete-time signal processing, second ed., Prentice Hall Inc., Upper Saddle River, NJ, 1999.
24. G. Strang and T. Nguyen, Wavelets and filter banks, Wellesley-Cambridge
Press, Wellesley, Massachusetts, 1996.
25. W. Sweldens, The lifting scheme: A custom-design construction of biorthogonal
wavelets, Appl. Comput. Harmon. Anal. 3 (1996), no. 2, 186-200.
26. W. Sweldens, The lifting scheme: A construction of second generation wavelets,
SIAM J. Math. Anal. 29 (1997), no. 2, 511-546.
27. G. Uytterhoeven, D. Roose, and A. Bultheel, Wavelet transforms using the
lifting scheme, Report ITA-Wavelets-WP1.1 (Revised version), Department of
Computer Science, K. U. Leuven, Heverlee, Belgium, April 1997.
28. M. Vetterli and J. Kovacevic, Wavelets and subband coding, Prentice Hall Inc.,
Upper Saddle River, NJ, 1995.
29. B. Vidakovic, Statistical modeling by wavelets, John Wiley & Sons Inc., New
York, 1999.
30. M. V. Wickerhauser, Adapted wavelet analysis from theory to software, A K
Peters Ltd., Wellesley, MA, 1994.
Index

Symbols biorthogonal, 23
P, 14 - basis, see basis
U, 14 - filter, see filter
T_a, 22 - transform, 133
T_s, 22 biorthogonality, 75
ℓ²(Z), 12 biorthogonality condition, 78
x̂(ω), 101 boundary
Ss_{j-1}, 53 - correction, 49, 129, 166
S_j, 14 - filter, 128, 134-140, 144-148, 173,
even_{j-1}, 14 175-180
odd_{j-1}, 14 - problem, 49, 127, 127, 128, 165, 231
d_{j-1}, 14, 15, 19 - problem solved with lifting, 165-168
s_{j-1}, 14, 19 building block, 14, 22, 23, 41, 53, 59,
2D, see two dimensional 79, 87, 129, 165, 166, 189
butter, 229
A Butterworth filter, 34
additivity property, 93
adjustment of grey scale, see grey scale C
affine signal, 17 caxis, 219
algorithm for best basis, see basis CDF, 23, 24, 85, 221, 230, abbr. Cohen,
aliasing, 101 Daubechies, Feauveau
alignment, 116-119 CDF(2,2), 23, 24, 31-34, 46-50, 54-58,
analysis, 22, 48, 72 66, 84-86, 113, 145, 146, 197
asymmetry, see symmetry CDF(2,4), 132
CDF(3,1), 86
B CDF(3,3), 186, 187
basis CDF(3,15), 217
- best, 93-96, 120, 126, 183-185 CDF(4,6), 113-115, 129, 146, 155,
- best basis search, 93-96, 125, 158-160, 164-171
225-230 CDF(5,3), 209
- best level, 96,111-113,117,120,126 change of basis, 89
- biorthogonal, 74-75, 78 chirp, 33, 111, 112, 114, 116, 119, 126,
- canonical, 38, 75, 89 216, 217, 220
- change of, 89 choice of basis, see basis
- choice of, 89-96, 120, 224 Coiflet 12, 204-208
- in software, see basis, representation Coiflet 24, 118, 119
- near-best, 93 Coiflets, 85
- number of, 91-93 colorbar, 220
- orthonormal, 74, 75, 78 completeness, 74
- representation, 181, 224-226 complexity, 39, 94, 193, 209
basis, 226 components
best basis, see basis - even, see even entries

- odd, see odd entries energy, 40, 62, 80, 96-98, 102, 106, 140,
compression, 8, 16, 97, 110 213, 214, 217, 223
concentration, 96, 98, 229 - center, 118
continuous vs. discrete, 100-101 - color scale, 106
conv, 173 - concentration, 229
convolution, 1, 63, 66, 69, 74, 76, 103, - density, 121, 122
115, 118, 130, 147, 172, 193, see also - distribution, 102, 105, 121, 213, 217
filter - finite, 11, 12, 40, 69
correction, 14 entropy, 96-98, 120, 125, 126, 185, 225,
correlation, 7, 14, 52 226, 229, 231
cost function, 90, 93-94, 96, 98, 120, equation form, 23-24
126, 183, 185, 225, 226, 229, 231 error, 155
- examples of, 96-98 Euclidean algorithm, 194, 195
- Shannon entropy, see entropy even entries, 14, 15, 59, 65, 66, 190, 202
- threshold, see threshold
cyclic permutation, 157, 159 F
factorization, 73, 194, 196, 197, 199,
D
200, 202, 204, 206-209
daub, 212, 230
features, 7, 8, 10, 14
Daubechies 2, see Haar transform
filter, 61, 69, 74, 190, 192-194
Daubechies 4, 20, 41, 45-47, 83-85,
111-115,127,128,146,155-164,170, - biorthogonal, 85, 114, 132, 142, 217,
230
171, 186, 189, 192, 193, 202-204, 209
Daubechies 8, 212, 215 - CDF(2,2), 84
Daubechies 12, 113, 114, 116, 119 - Coiflet 12, 205
Daubechies 16, 216 - Daubechies 4, 83, 202
Daubechies 24, 118, 119 - Haar transform, 82
Daubechies filters, 85, 113, 114, 212, - low pass, 70
230 - orthogonal, 78-82, 85, 114, 118, 131,
decomposition, 22, 54, 57, 87, 88, 132, 136, 148, 173, 175, 209, 212, 221,
89-96, 103, 104, 106, 107, 109-113, 223, 230
115, 117, 119, 129, 146, 152-154, - Symlet 10, 209
180-183, 185-187, 212-215, 221-226, - taps, see impulse response
228-230 filter, 229
degree, see Laurent polynomial filter bank, 68, 69-74, 76-80
denoise, 28, 31, 33, 110, 220-225, 229, filter bank ordering, 110, 112, 221, 231
see also noise finite energy, see energy
details, 7 finite impulse response, 69, 80
determinant, 68, 71, 143, 193, 194, 200, finite sequence, see finite signal
201 finite signal, 11, 12, 18
diagonal lines, see image FIR, abbr. finite impulse response
difference, 7, 10, 12, 13, 15 first moment, see moment
directional effect, see image Fourier spectrogram, see spectrogram
discrete signal, 11 Fourier transform, 39, 61, 99, 100, 103,
discrete vs. continuous, 100-101 123
DWT, abbr. discrete wavelet transform frequency
DWT building block, see building block - content, 99-101, 109, 110, 112, 113,
DWT decomposition, see decomposi- 116, 125, 217
tion - localization, 111-114, 124, 216, 219
- ordering, 109, 110, 112, 221
E - response, 69, 83-85, 113, 114
element, 10, 87, 88, 89, 91, 92, 94-97, function, 151
103, 107, 109-112, 129, 136, 153, 154, fundamental building block, see
181-184, 221, 224, 226, 227 building block

G - mean and difference, 12


gcd, abbr. greatest common divisor - of lifting, 19-21
generalization invertibility, 39, 67, 68, 73, 86, 193, 195
- of boundary filter, 139-140 - of Laurent polynomial, 69
- of DWT, 21-22 IR, abbr. impulse response
- of interpretation, 45-49 isplit, 215
- of lifting, 19-21 iwpk, 224, 231
- of mean and difference, 10 iwpk2d, 231
Gram-Schmidt, 137, 139, 140 iwt, 215, 230
- boundary filter, 134-140, 175-180 iwt2d, 231
Gray code permutation, 110, 174
greatest common divisor, 194, 195, 197, L
199,206 Laurent polynomial, 67, 68, 69, 71, 73,
grey scale, 51, 54, 102, 103, 106, 111 193-195, 197-201
- degree of, 194
H left boundary filter, 138, 139, 140, 150,
Haar 176
- basis functions, 42, 43 Lena, 57
- transform, 21, 22, 25-31, 34, 38, length, 152
40-44, 53-57, 82, 83, 103, 127-130, level, 22, see also scale
153, 155, 166, 187 level basis, see basis, best level
Haar basis life, the Universe and Everything, 42
- functions, 42 lifting, 13-17
- CDF(2,2), 84
hexagonal lattice, 60
- CDF(2,x), 23
horizontal lines, see image
- CDF(3,x), 24
- Coiflet 12, 208
I - Daubechies 4, 83, 204
image lifting building block, see building
- directional effect, 54-56, 60 block
- synthetic, 53 linear chirp, see chirp
impulse response, 69 linear signal, 17
index zero, 11 log2, 155
indices logarithm, 82, 106, 113, 185, 217
- even, see even entries loss-free compression, 8
- odd, see odd entries lpenerg, 231
infinite energy, 140
infinite length, 11 M
infinite sums, 12 M-file, 151
inner product, 74, 75, 78, 123, 130, 137, mass center, 118
149, 177 matlab
'in place' transform, 13, 16, 59, 187 - basis, 226
instability, 40, 206-209 - butter, 229
integer lattice, 58, 59 - caxis, 219
inverse, 13, 22, 27, 29, 37, 41, 48, 52-54, - colorbar, 220
67, 68, 73, 76, 87, 137, 177, 205, 215, - conv, 173
230, 231 - daub, 212, 230
- CDF, see CDF - error, 155
- CDF(2,2), 19 - filter, 229
- CDF(4,6), 169, 186 - function, 151
- Daubechies, see Daubechies - isplit, 215
- Daubechies 4, 45, 160, 163, 186 - iwpk2d, 231
- Haar transform, 21, 22, 38, 186, see - iwpk, 224, 231
also Haar transform - iwt2d, 231

- iwt, 215, 230 noise reduction, see denoise


- length, 152 nonseparable, 51, 57, 60
- log2, 155 norm, 40, 96, see also energy
- lpenerg, 231 norm, 225
- mres2d, 232 normalization, 20, 21, 23, 24, 37,
- multires, 213, 232 40-41, 42, 45, 66, 67, 70, 73, 79, 84,
- norm, 225 96, 137, 138, 176, 177, 190, 225
- plot, 212 normalized Haar transform, 21
- pruneadd, 225, 231 normally distributed random numbers,
- randn, 218 see noise
- rem, 155 notation
- set, 222 - energy, 40
- shanent, 225, 231 - frequency variable, 62
- size, 155 - imaginary unit, 62
- sort, 223 - norm, 40
- sound, 221 - signal length, 11-12
- specgram, 216 number of bases, see basis
- split, 213, 232 numerical instability, see instability
- subplot, 215
- symlets, 221 o
- tfplot, 228, 231 odd entries, 14, 15, 17, 59, 64-66, 190,
- tree, 228, 231 202
- wpk2d, 231 odd shift, 72, 73, 79, 143
- wpk, 221, 231 one scale DWT, 51-53, 64, 67-69, 72,
- wspline, 217 87,97
- wt2d, 231 one step lifting, 15
ordering
- wt, 215, 230
- zeros, 152, 215 - filter bank, see filter bank ordering
- frequency, see frequency ordering
- zoom, 223
orthogonal, 74
matrix, DWT as, 38-39, 130-134
- filter, see filter
maxima location, 118
- matrix, 131, 137, 138, 140, 142, 148,
mean, 7, 10, 12, 13, 15, 16
150, see also orthogonal transform
merge, 19
- transform, 129, 133, 144, 177, see
mirroring, 128, 144
also orthogonal matrix
misalignment, 119 orthogonality condition, 79
moment, 18, 145, 146
orthogonalization, see Gram-Schmidt
monomial, 68, 69, 71, 194, 195, 197,
orthonormal basis, see basis
202 orthonormality, 74
MRA, abbr. multiresolution analysis
mres2d, 232 p
multires, 213, 232 Parseval's equation, 62
multiresolution, 4, 27-28, 31, 33, 34, perfect reconstruction, 39, 69-74, 76,
49, 53, 54, 212-216, 232 78-80, 86, 87, 128, 129, 136, 143, 146
- two dimensional, 53 periodicity, 100, 107
periodization, 128, 140-144, 157, 158,
N 162, 173, 186, 231
natural frequency ordering, see permutation, 110, 157
frequency ordering piecewise linear function, 46
nearest neighbor, 14, 17, 57-59 plot, 212
neighboring samples, 52 polyphase, 68
noise, 25, 28, 30, 32, 218-221, 223, 224, power, see energy density
229, see also denoise prediction, 14-15, 17
- colored, 229 preservation

- of energy, 14, 69, 80, 81, 86, 96-98, short time Fourier transform, 99, 121,
136, 145, 148, 185, 186, 221, 223, see 122, 219
also energy signal, 12
- of length, 134, 141 - affine, see affine signal
- of mean, 14, 16, 59 - linear, see linear signal
- of moment, 18, 24, 128, 144-148, 175 - synthetic, 25-34, 53
- of perfect reconstruction, 128, 129, size, 155
134 sort, 223
- of time localization, 116 sound, 221
procedure specgram, 216
- prediction, see prediction spectrogram, 121-124, 216, 219, 220,
- update, see update 224
pruneadd, 225, 231 spline wavelet, 221
split, 14
Q split, 213, 232
quincunx lattice, 60 stability, 75
quotient, 194, 202, 203, 205 STFT, abbr. short time Fourier
transform
R structure, 7, 14
randn,218 subplot, 215
random numbers, see noise Symlet 12, 126
reconstruction, 7-9, 31, 32, 38, 43, 58, Symlet 30, 221
66, 72, 89, 90, 129, 132, 185, 214, symlets, 221
224, 228 symmetry, 85, 100, 107, 113, 116, 136,
regularity, 46 139, 141, 144, 149, 198, 221
rem, 155 synthesis, 22, 72
remainder, 194, 195, 199, 202, 203 synthetic signal, see signal
rescaling, see normalization
right boundary filter, 140, 176
T
S tfplot, 228, 231
sampling, 11, 42, 43, 100, 101, 111, 112, theorem
123, 146-148 - Euclidean algorithm, 195
- frequency, 100, 112, 114, 216, 220, - factorization of polyphase matrix, 73,
229 193
- rate, 100, 102, 116, 121, 124 threshold, 8, 9, 94, 96-98, 223, 224
- theorem, 100 time localization, 111, 114-116, 219
sampling theorem, 100 time shift, see shift in time
scale, 22, 23, 28, 38, 39, 48, 88, 129, time-frequency plane, 96, 102-107,
172, see also level 109-114, 116, 117, 119, 120, 122,
scaling, 42, 43, 46, 161, 164-166, 124-126,185,216,217,228,231
171, 191, 198, 199, 206, see also
normalization - 'in place', see 'in place'
scaling function, 42, 43, 46, 48, 50 - Daubechies, see Daubechies
- CDF(2,2), 48 - Haar, see Haar transform
script, 151 transient, 213-215
separable, 51, 53, 57, 144, 231, 232, 234 translation, 39, 42, 43, 46-48
sequence, 12 tree, 228, 231
set, 222 truncate, 135, 137, 139, 144, 149, 178
shanent, 225, 231 two channel filter bank, see filter bank
Shannon entropy, see entropy two dimensional
Shannon's sampling theorem, 100 - Haar transform, 53-54
shift in time, 63-65, 73, 79, 119, 135, - transform, 51-60
162 two scale equation, 171, 190

U wavelet packet decomposition, see


undersampling, 101, 114, 116 decomposition
uniqueness, 42, 64, 72, 73, 76, 94, 136, WP, abbr. wavelet packets
141, 191, 192, 195, 199 wpk, 221, 231
update, 14-15, 16, 17 wpk2d, 231
wraparound, 113
V wspline, 217
vector, 12 wt, 215, 230
vertical lines, see image wt2d, 231

W Z
wavelet, 42, 43, 46-49 z-transform, 62-63
- CDF(2,2), 49 zero index, 11
- Daubechies 4, 46, 47 zero padding, 10, 21, 49, 128-130, 134,
wavelet filter, see filter 135, 140, 141, 145
wavelet packet, 81-90, 103, 106-111, zeros, 152, 215
113, 119, 129, 180-185, 212, 220, 221, zeroth moment, see moment
224, 225, 229, 231 zoom, 223
