

EE678 Wavelets Application Assignment, April 2005

Applications of Wavelet Theory to Digital Audio
Signal Processing
Group Members:
Ankit Kulin (01D07015)
Amit Ghorawat (01D07017)
Abhisekh Sankaran (01D05009)

This report documents an application of wavelet theory to digital audio signal processing for electro-acoustic musical purposes. A summary of multiresolution analysis, the heart of wavelet theory, is presented. Specifically, the exponential decay of wavelet coefficients is used to construct an algorithm that spectrally enhances digital audio.

Index Terms
Fourier transform, Heisenberg's principle, wavelets, wavelet transforms, filter banks, vanishing moments, spectral enhancement

WAVELET analysis is one of many generalized time-frequency analysis methods which give a signal's frequency content at a certain point in time. Perhaps the most well-known of these methods is the Fourier family
of transforms, which include the continuous Fourier transform, which deals with continuous signals; the discrete
Fourier transform, which deals with discrete (sampled) signals; and the short-time Fourier transform, which deals with
signals in short, often overlapping, windowed segments.
The Fourier transform is a flexible analysis/resynthesis tool which has been extensively used in such applications as
spectral estimation. In a typical analysis/resynthesis, a musical signal is broken down into small "chunks" or windows.
Each window is then analyzed to determine the frequency content in the window. This frequency information is altered
in some way, and then the altered frequency information is resynthesized back into a new musical signal. Although
Fourier methods are very powerful in practice, they also have drawbacks and limitations: a well-known fact of physics
and signal processing is that there is a tradeoff between the control of time and frequency resolution. This fact is known
as the Heisenberg uncertainty principle: the finer the time resolution of the analysis, the more coarse the frequency
resolution of the analysis, and vice-versa.
The Fourier transforms, in practice, are at one extreme of this time-frequency tradeoff. Usually, the Fourier trans-
form emphasizes a fine frequency resolution as opposed to a fine time resolution. For example, a composer is often
interested in distinguishing pitch, so distinguishing between 500 Hz and 580 Hz using the Fourier transform is important. Consequently, the Heisenberg uncertainty principle states that a certain amount of information, or equivalently,
a sufficient amount of time called a window is required for this analysis. However, if the same composer becomes
interested in distinguishing between 500 Hz and 540 Hz, the time period (window size) required by this second Fourier
analysis, according to the Heisenberg uncertainty principle, will be larger, or, equivalently, be longer in time than the
first window’s size, since the frequency resolution (or smallest possible frequency difference) is smaller in the second
case. As a result of the increased frequency resolution, however, the composer is more uncertain as to where in time
the distinctions between 500 Hz and 540 Hz occur than he is about where the distinctions between 500 Hz and 580 Hz
occur, since the window used to analyze the 500 Hz / 540 Hz case is longer in time than the window used to analyze
the 500 Hz / 580 Hz case. The main drawback of this compromise between time and frequency is that
sounds which do not take advantage of a fine frequency localization still suffer from poor time localization, since the
window has uniform frequency and time resolution throughout its duration. Conceptually, this tradeoff can be seen as
a certain, uniform, division of the time-frequency plane:

The time-frequency tradeoff associated with Fourier analysis has several implications for composers and scientists
working in electro-acoustic music. For example, one cannot theoretically construct a frequency equalizer whose pa-
rameters move infinitely fast, since the time required for such a change would be infinite. Furthermore, non-pitched
sounds, such as percussive sounds or transient clicks, are hard to localize in time and subsequently alter in meaningful
ways, since they tend to "smear" across the time-frequency plane (high frequencies tend to die away slowly along the
time axis):
Wavelet analysis offers an alternative to Fourier analysis that remedies these problems. In essence, wavelet analysis
divides the time-frequency plane in a different, non-uniform manner:
Some properties of wavelet analysis are listed below:
• Wavelet analysis can be seen as an analysis-resynthesis method.
• A wavelet representation is different from Fourier analysis. A wavelet representation is a non-uniform division
of the time-frequency plane in which varying bandwidths of frequency information have inversely proportional
associated time support. The standard wavelet basis associates higher bandwidths with higher center frequencies
and shorter time windows.
• In general, wavelet analysis is well-suited to isolating sharp transients in a signal, a task at which Fourier analysis
is not so well suited.
• Each wavelet basis function is a translation and/or scaling of a single general shape, called the mother wavelet.
The translation and scaling factors associated with these basis functions are all powers of two in the standard
analysis scheme.
• Unlike Fourier analysis, there may be many different shapes for mother wavelets, and one mother wavelet is
usually selected for a certain analysis because of its desired properties (e.g. vanishing moments, short time
domain support, smoothness, etc.).
• The degree to which a wavelet basis function is present in the signal is described by a numerical value called
the coefficient of that basis function.
• Wavelets are a study of difference spaces. They characterize how "larger" basis functions, called scaling functions,
differ from one another by categorizing the differences between the scaling functions. Each category of difference
between the scaling functions is associated with a wavelet basis function. Successive levels of wavelet coefficients
represent increasingly detailed approximations to the analyzed signal.


The forward and inverse wavelet transform of a window of samples can be implemented using a set of upsamplers,
downsamplers, and recursive two-channel digital filter banks. For the forward wavelet transform, the output of the
filter banks are the wavelet coefficients at a desired level of resolution; for the inverse wavelet transform, the wavelet
coefficients are the input, and the output of the final set of filter banks is the reconstructed window. Typically, the
wavelet coefficients are analyzed or altered in the intermediate stage before the inverse transform is performed. Shown
below is a diagram of the entire analysis/resynthesis process for a three level, perfect reconstruction wavelet analysis
and resynthesis.

First, a window of input samples is selected whose length is a power of 2. The window of input samples is then processed
twice in parallel, once through the high pass filter, and once through the lowpass filter. The outputs of the filters are
then downsampled so that each output is exactly one half the size of the original input (every other sample of each
output is omitted). The downsampled output from the first highpass filter becomes the wavelet coefficients at level N,
where N is the number of levels of wavelet analysis desired (in the figure, N = 3). These coefficients contain the highest
time-resolution, widest bandwidth information, and highest center frequency wavelet coefficients of the analysis. The
downsampled output from the first lowpass filter serves as the input to an identical highpass/lowpass, downsampling
stage. The next stage’s downsampled output from the highpass filter will become the wavelet coefficients for level N-1
(N-1 = 2 in the figure). The (N-1)th level wavelet coefficients will have lower characteristic time-resolution by a factor
of two, narrower characteristic bandwidth information by a factor of two, and lower characteristic center frequency by
a factor of two. This process continues for the desired number of levels. The output of the final low-pass filter has the
same size as the level 1 wavelet coefficients, and is referred to as the final average coefficients.
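One analysis stage of this filter bank can be sketched in a few lines. The Haar filter pair and the function name are placeholder assumptions for illustration; the text itself later uses biorthogonal binary filters:

```python
import numpy as np

def analysis_stage(x, h0, h1):
    """One stage of the forward transform: filter with the lowpass h0 and
    the highpass h1, then keep every other output sample (downsample by 2).
    Circular convolution handles the window boundaries."""
    n = len(x)
    idx = np.arange(n)
    low = np.array([sum(h0[k] * x[(i - k) % n] for k in range(len(h0))) for i in idx])
    high = np.array([sum(h1[k] * x[(i - k) % n] for k in range(len(h1))) for i in idx])
    return low[::2], high[::2]

# Orthonormal Haar filters, used here only as a simple example pair
h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)

x = np.arange(8, dtype=float)      # an 8-sample window (a power of 2)
approx, detail = analysis_stage(x, h0, h1)
print(len(approx), len(detail))    # each output is half the input length
```

Feeding `approx` back into `analysis_stage` produces the next-coarser level, exactly as in the recursive scheme described above.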
The inverse transform works in a similar manner. First, the level 1 wavelet coefficients and the final average coef-
ficients are upsampled so that each is twice its original size (zeros are inserted between every sample of each set of
coefficients). The upsampled versions of the final average and the level 1 wavelet coefficients are passed through the
inverse low pass and inverse high pass filters, respectively (note that the analysis and synthesis high pass filters are
different, as are the analysis and synthesis low pass filters). The outputs are added together, and this sum serves as the
input to the low-pass channel for the next stage. This process continues until all of the wavelet coefficients have been
upsampled, filtered, and added to the total output of previous stages' processing. The final sum is the reconstructed window.
The upsampling and downsampling depicted above are important in ensuring that the size of the output (number of
total wavelet coefficients on all levels + number of samples in the final average) is the same size as the input. If the
downsampling did not occur in the analysis (forward transform), there would be a 'data explosion'. A way to view the
two-fold decrease in the number of wavelet coefficients for subsequent levels of analysis is that there is a change in the
time resolution from one level to another. A 256 sample input on analysis produces 128 level 3 coefficients, 64 level
2 coefficients and 32 level 1 and average coefficients. Since the wavelet coefficients on each level describe the entire
window’s time-frequency information, more coefficients on one level means a higher time-resolution for that level’s
wavelet coefficients; consequently, fewer coefficients on a lower level mean a lower time resolution for that level’s
wavelet coefficients.
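The bookkeeping above can be captured in a few lines; `coefficient_counts` is an illustrative helper name, not something from the text:

```python
def coefficient_counts(window_size, levels):
    """Return the number of wavelet coefficients produced on each level
    (level N down to level 1) plus the final average coefficients, for a
    dyadic analysis of the given window size."""
    counts = {}
    size = window_size
    for level in range(levels, 0, -1):   # level N is the most detailed
        size //= 2                       # each downsampling halves the count
        counts[f"level {level}"] = size
    counts["final average"] = size       # same size as the level 1 coefficients
    return counts

counts = coefficient_counts(256, 3)
# 128 level-3, 64 level-2, 32 level-1 and 32 average coefficients: 256 in all
```

The totals always sum back to the window size, which is the 'no data explosion' property discussed above.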
A common problem in the implementation of digital filters on windowed data is how to deal with the boundaries of
the window. For example, when a long soundfile is divided into several 4096-sample windows, the wavelet coefficients
that correspond to the first and last few samples of each 4096-sample window exhibit artifacts from the windowing.
Among the several methods developed to deal with this problem, circular convolution is used in the discussion to follow.
In order to perform wavelet analysis, the filters must satisfy the ‘perfect reconstruction’ property which means that
no aliasing or distortion should occur for a signal that enters and leaves the filter bank with no processing of wavelet
coefficients. Mathematically, we have the following. For no aliasing,

F0 (z)H0 (−z) + F1 (z)H1 (−z) = 0 (1)

For no distortion,
F0 (z)H0 (z) + F1 (z)H1 (z) = 2z^(−k) (k ∈ N) (2)
H0 (z) is the z - domain representation of the ”lowpass 1” filter in Figure 2.1
H1 (z) is the z - domain representation of the ”highpass 1” filter in Figure 2.1
F0 (z) is the z - domain representation of the ”lowpass 2” filter in Figure 2.1
F1 (z) is the z - domain representation of the ”highpass 2” filter in Figure 2.1
A simplification that satisfies the no aliasing condition is to assign

F0 (z) = H1 (−z) ; F1 (z) = −H0 (−z) (3)

The following equations express possible relations between the analysis high and low pass filters. The first of these
gives a Quadrature Mirror Filter (QMF) (in which the highpass magnitude |H1 (z)| is a mirror image of the lowpass
magnitude |H0 (z)| about the middle frequency π/2) and the second gives a pair of orthogonal filters.

H1 (z) = H0 (−z) (4)

H1 (z) = −z^(−N) H0 (−z^(−1)) (5)
where N is the order of the H0 filter
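Conditions (1)-(3) are easy to verify numerically, since multiplying z-transforms corresponds to convolving coefficient sequences. The sketch below checks the alias and distortion conditions for the Haar pair, assuming the QMF choice (4) and the assignment (3); the helper name is illustrative:

```python
import numpy as np

def negate_z(h):
    """Coefficients of H(-z): flip the sign of the odd-indexed taps."""
    return h * (-1.0) ** np.arange(len(h))

# Haar analysis lowpass, used here as a concrete example
h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
h1 = negate_z(h0)        # H1(z) = H0(-z), the QMF choice (4)
f0 = negate_z(h1)        # F0(z) = H1(-z), from (3)
f1 = -negate_z(h0)       # F1(z) = -H0(-z), from (3)

# Products of z-transforms are convolutions of coefficient sequences
alias = np.convolve(f0, negate_z(h0)) + np.convolve(f1, negate_z(h1))
distortion = np.convolve(f0, h0) + np.convolve(f1, h1)

print(alias)       # all zeros: the no-aliasing condition (1) holds
print(distortion)  # a single 2 at delay k = 1: condition (2) holds
```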
From the above relations, we can see that we need to design just one filter, H0 (z) (described in terms of its coefficients h(0), h(1), ...), as the others can be obtained from it. It has been shown that this in turn boils down to
finding a φ(t) which satisfies the following axioms of dyadic MRA, which consist of constructing a ladder of subspaces
Vm (m ∈ Z) of L2 (R) such that
1) . . . V−2 ⊂ V−1 ⊂ V0 ⊂ V1 ⊂ V2 . . .
2) the closure of ⋃m∈Z Vm is L2 (R)
3) ⋂m∈Z Vm = {0}
4) Given x(t) ∈ V0 , x(t − n) ∈ V0 ∀n ∈ Z
5) Given x(t) ∈ V0 , x(2^m t) ∈ Vm
6) ∃ φ(t) ∈ V0 such that {φ(t − n)}n∈Z is an orthonormal basis for V0

Once φ(t) is found, the following 'dilation equation' can be used to compute H0 (z):
φ(t) = 2 Σn h0 (n) φ(2t − n) (6)
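The dilation equation can also be solved numerically by fixed-point iteration (the cascade algorithm). Below is a minimal sketch, assuming the Haar filter h0 = [1/2, 1/2], for which the fixed point is the box function on [0, 1); the function name is an assumption:

```python
import numpy as np

def cascade(h0, iterations=6):
    """Approximate samples of the scaling function phi by iterating the
    dilation equation phi(t) = 2 * sum_n h0[n] * phi(2t - n) on ever finer
    dyadic grids (the cascade algorithm)."""
    phi = np.array([1.0])                # crude initial guess for phi
    for _ in range(iterations):
        up = np.zeros(2 * len(phi))
        up[::2] = phi                    # phi re-sampled on the finer grid
        phi = 2.0 * np.convolve(h0, up)  # one application of the dilation map
    return phi

# Haar: h0 = [1/2, 1/2]; the iteration converges to the box function
phi = cascade(np.array([0.5, 0.5]))
# the first 2^6 = 64 samples are 1 (the box), the padding is 0
```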

Developments in wavelet theory have established that an extra requirement on the low-pass filters enables good
analyses of polynomials. Specifically, in addition to the requirements imposed on the filters by the dilation equation
and the high/low/analysis/synthesis interrelationships in (1) - (5), a low-pass filter that has p zeros at π can perfectly
analyze and reproduce a polynomial of degree p-1; such a filter is said to have p vanishing moments. In other words,
if any portion of any level's scaling coefficients is a polynomial of degree p-1, then the next level's wavelet coefficients
corresponding to that same portion of the signal will be exactly zero. The more zeros at π the low pass filters have, the
better suited the wavelet analysis is to analyzing polynomials.
Most often, wavelet analysis is applied to natural signals, which are seldom perfect polynomials. Nonetheless, in
practice, using a lowpass filter with p zeros at π means that the higher-level wavelet coefficients resulting from the analysis
will approach zero faster for a higher value of p, for piecewise-smooth signals. This is desirable, since we
hope that quickly decaying wavelet coefficients will produce a signal representation which has few non-zero wavelet
coefficients that die away quickly across levels, thus producing more concentrated and easily separated features in the
wavelet representation of the input signal. In practical terms, a lowpass filter with more zeros can exactly analyze
a signal that is piecewise polynomial of a higher degree. In terms of vanishing moments, the more the number of
vanishing moments of a filter, the more ‘jagged’ a signal it can analyse and perfectly reconstruct.
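The annihilation property is easy to check numerically. The sketch below uses the Daubechies-4 filter, which has p = 2 vanishing moments, and shows that its highpass output on a degree-1 polynomial (a linear ramp) is zero away from the window edges:

```python
import numpy as np

# Daubechies-4 lowpass filter (p = 2 vanishing moments)
s = np.sqrt(3.0)
h0 = np.array([1 + s, 3 + s, 3 - s, 1 - s]) / (4 * np.sqrt(2.0))
# Corresponding highpass via the alternating flip: g[n] = (-1)^n h0[N-1-n]
g = ((-1.0) ** np.arange(4)) * h0[::-1]

# A degree-1 polynomial signal (linear ramp)
n = np.arange(64, dtype=float)
x = 0.7 * n + 2.0

detail = np.convolve(x, g, mode="valid")  # highpass output away from edges
print(np.max(np.abs(detail)))             # ~0: the ramp is annihilated
```

Both moments of `g` vanish (sum of g[n] and sum of n·g[n] are zero), which is exactly why every interior highpass output sample on a linear signal is zero.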
Wavelet families whose analysing and synthesising filter coefficients satisfy the above properties have already been
derived, such as the Haar and the Daubechies filters. In audio processing, a filter which is not symmetric (or anti-
symmetric) about a central axis produces phase distortion in its output because it is not a linear phase filter. Although
phase distortion is a relatively mild form of distortion for audio, it is nonetheless desirable to eliminate as many kinds
of distortion as possible in the final filtering of a signal. Biorthogonal filter banks allow for symmetric filter coefficients,
and thus produce linear-phase, non-phase-distorting filters. Therefore, biorthogonal, symmetric, binary filters are used
in the discussion to follow. Specifically, a binary filter is a filter whose coefficients are integers divided by a power of
two. Binary filters are significant not only for their simplicity, but also for their efficiency in computer-oriented integer-
only computations, where division by a power of two is often quicker than a floating point division. Two examples of
these filters are given below.

Name         h0 (analysis low-pass)   h1 (analysis high pass)   f0 (resynthesis low pass)   f1 (resynthesis high pass)
binary 3/5   [+1,+2,+1]/4             [−1,−2,+6,−2,−1]/4        [−1,+2,+6,+2,−1]/4          [−1,+2,−1]/4
binary 5/3   [−1,+2,+6,+2,−1]/4       [−1,−2,−1]/4              [+1,+2,+1]/4                [−1,−2,+6,−2,−1]/4
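The appeal of binary coefficients can be sketched in a few lines. The helper below (a hypothetical name, not from the text) applies the binary 3/5 analysis low-pass [+1,+2,+1]/4 using only integer additions and a right shift, with the circular indexing discussed earlier:

```python
def binary_lowpass(x):
    """Apply the binary 3/5 analysis lowpass [1, 2, 1] / 4 using only
    integer adds and a shift (division by a power of two, here 4, is a
    right shift by 2).  Circular indexing handles the window boundaries.
    Note that >> floors, so this sketch assumes non-negative samples."""
    n = len(x)
    return [(x[(i - 1) % n] + 2 * x[i] + x[(i + 1) % n]) >> 2 for i in range(n)]

samples = [4, 8, 12, 8, 4, 0, 4, 8]
smoothed = binary_lowpass(samples)
print(smoothed)  # [6, 8, 10, 8, 4, 2, 4, 6]
```

No floating point arithmetic is involved, which is the efficiency advantage cited above for integer-only computations.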


A. Motivation
Although the high frequency range of a typical audio recording has little power compared with its lower frequen-
cies, the presence of high frequencies is very important to the overall recorded sound in several ways. Apart from
the aesthetic importance of high frequencies in the recorded audio, much localization information and ambient
information is contained in the high frequency regions above 10 kHz. In any pragmatic physical system, however, high
frequencies are usually quickly attenuated from a source due to dissipative forces such as friction and heat. Recorded
sound, which usually passes through several physical systems (transducers) on its way from live performance or orig-
inal realization to a listener’s ear, suffers from these same dissipative forces. In fact, much high frequency content is
still lost in many of the best audio recording and reproduction systems available today. For example, early compact
disc recorders and players introduced the notion of "emphasis", which in essence is a high-frequency-band equalizer
that compensates for high frequency loss on recording and playback. Earlier analog tape systems also pre-processed
sound in the same way by boosting high-frequency tape content on playback.
Although digital recording and processing systems try to alleviate this problem by providing a no-loss recording
and playback system, the increasing need for cross-platform networking for musical purposes, along with increasingly
popular Internet-based music disseminating mechanisms has posed a new set of problems related to low sample rates,
which also affect the high-frequency content of recorded sounds. The size of an audio file is directly proportional to

the rate of sampling which in turn is directly related to the audio quality. Sound recorded at 44 kHz (CD-quality audio)
will store twice as much information as sound recorded at 22 kHz.
Because CD-quality audio is expensive in terms of storage space and network-transfer time, many commonly avail-
able audio files on the Internet, for example, do not use the full, CD-quality audio sampling rate of 44.1kHz. Instead,
most files available on the network use a lower sampling rate, such as 8 kHz, 11.025 kHz, 16 kHz, or 22.050 kHz. While
this dramatically cuts down on storage space and transfer time, it dramatically reduces the frequency content too. This
is a direct consequence of Nyquist’s theorem by which, for a digital signal sampled at a rate, r Hz, it is possible to
capture only those frequencies in the range 0 to r/2 Hz. In other words, a sound file with a low sampling rate can
capture and reproduce a lower range of frequencies than a sound file with a higher sampling rate. The many EQ-based
mechanisms that have been introduced to accomplish high-frequency compensation/boosting are called exciters in the
audio industry, and generally work by emphasizing existing but attenuated high-frequency content of a pre-recorded
signal. However, if we wish to excite digital signals with sampling rates lower than 44.1kHz, we face a more acute
problem: the high frequencies for which we wish to compensate are not even present. Somehow, if we are to enhance
the high frequency bands of a low-sample-rate audio file, we must actually guess the high frequency content from the
existing, lower frequency content. For example, an 11.025 kHz file found on the network only has frequency content
between 0 and 5.5125 kHz. We wish to compensate for the frequencies from 5.5125 kHz to 20 kHz, so that we can
approach a simulated CD-quality sound.
The advantage of this estimation of high frequencies is that if it is even moderately successful, it could provide a
means for faster transmission of audio; in essence, a new form of compression could be developed that effectively
expands the frequency bandwidth of an audio file. Specifically, network distribution of audio could be accomplished
with low-sample-rate files and non-critical audio storage systems could be developed that use this technique.

B. A wavelet based excitation algorithm

Let us see how an analysis-synthesis wavelet technique can be used to spectrally enhance the high-frequency content
of low-sample-rate audio files. A general outline of the algorithm is as follows:
1) First, decompose the low-sample-rate audio file using the standard, dyadic, circular-convolution filterbank con-
struction with a set of analysis/synthesis filters having a specific number of vanishing moments.
2) Next, construct an empty level of wavelet coefficients (all coefficients set to zero) one level higher than the
highest level of wavelet analysis of the soundfile.
3) Using the exponential decay property of the wavelet coefficients in the analysis, guess the values of these new
coefficients from the values of the existing wavelet coefficients.
4) Perform the inverse wavelet transform on the expanded block of wavelet coefficients.
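The outline above can be sketched end-to-end, assuming simple Haar filters (chosen here only to keep the sketch short; the text itself uses biorthogonal binary filters) and the one-level form of the guessing rule, which the discussion below notes often works best. All function names are illustrative assumptions:

```python
import numpy as np

def haar_forward(x, levels):
    """Step 1: dyadic forward transform (Haar filters for simplicity)."""
    coeffs = []                       # detail coefficients, level N first
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        even, odd = approx[::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2.0))   # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)         # next-stage input
    return coeffs, approx

def haar_inverse(coeffs, approx):
    """Step 4: inverse transform matching haar_forward."""
    for detail in reversed(coeffs):
        even = (approx + detail) / np.sqrt(2.0)
        odd = (approx - detail) / np.sqrt(2.0)
        approx = np.empty(2 * len(even))
        approx[::2], approx[1::2] = even, odd
    return approx

def excite(x, levels, p):
    """Steps 2-3: append a new, twice-as-fine detail level whose values are
    guessed from the level directly below it with the decay weight 2^-p,
    then invert.  The output has twice as many samples as the input."""
    coeffs, approx = haar_forward(x, levels)
    finest = coeffs[0]                           # level N detail coefficients
    guessed = np.repeat(finest, 2) * 2.0 ** (-p) # new level N+1, guessed
    return haar_inverse([guessed] + coeffs, approx)

x = np.sin(2 * np.pi * np.arange(256) / 32.0)
y = excite(x, levels=3, p=1)
print(len(y))   # 512: twice the original window, for twice the bandwidth
```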
Let us take an in-depth look at this process.

1) Perform the forward wavelet transform on a window of the low-sample-rate audio file.

Let there be p vanishing moments in the analysis and synthesis filters. One of the strengths of wavelet analysis
as opposed to Fourier analysis is that wavelet coefficients are more concentrated in the time-frequency plane
than are Fourier coefficients. In other words, Fourier coefficients tend to "die away" in the time-frequency plane
more slowly from high initial values than wavelet coefficients do. Quantitatively, it has been shown that there is
an exponentially-related law that roughly relates the decay of the wavelet coefficients at a certain analysis level
to (1) the smoothness of the analyzed function and (2) the wavelet analysis level number:
If f (t) has p derivatives, its wavelet coefficients at level j decay as 2^(−jp), where j is the wavelet analysis level:

|bjk | = |∫ f (t) wjk (t) dt| ≤ C 2^(−jp) ‖f^(p) (t)‖ (7)

where bjk is the wavelet coefficient at level j, translation k; wjk is the wavelet function at level j, translation k;
and C ∈ R is a constant.
If we make the reasonable assumption that the signal we are analyzing has approximately the same smoothness
as the analyzing wavelet, then p in the (7) above can be replaced by the number of vanishing moments in the an-
alyzing and synthesizing wavelet. When we have completed our wavelet analysis of a window of samples from

the low-sample-rate audio file, we can roughly expect to see an exponential decay of the wavelet coefficients
across levels.
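This decay is easy to observe numerically. The sketch below (using Haar filters purely for illustration, with a hypothetical helper name) records the largest detail-coefficient magnitude on each level of a smooth signal; the maxima shrink steadily as the levels get finer:

```python
import numpy as np

def haar_detail_maxima(x, levels):
    """Return the maximum detail-coefficient magnitude on each level,
    ordered from the coarsest level to the finest."""
    maxima = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        even, odd = approx[::2], approx[1::2]
        maxima.append(float(np.max(np.abs((even - odd) / np.sqrt(2.0)))))
        approx = (even + odd) / np.sqrt(2.0)
    return maxima[::-1]               # coarsest level first

# A smooth (differentiable) test signal: one cycle of a sine wave
t = np.arange(1024) / 1024.0
x = np.sin(2 * np.pi * t)
per_level = haar_detail_maxima(x, 5)
# the magnitudes fall off level by level, as the decay law predicts
```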

2) Add a new layer of wavelet coefficients to the existing wavelet analysis, one level above the most detailed level
of analysis already existing.

There are several issues raised in this step. First, the new level of coefficients has the potential for carrying infor-
mation with the same frequency bandwidth as all of the previous levels combined. This new level of coefficients
has the potential to contain higher frequency data than is possessed by the existing levels of wavelet coefficients.
Furthermore, as dyadic MRA is used, the time resolution is twice as fine as the level directly before it. An in-
verse wavelet transform performed on the expanded window, with this added iteration, should therefore produce
a soundfile that can have twice the bandwidth as the original file; we only have to guess at what the new level’s
coefficients’ values should be. There is one complication, however: adding another level of wavelet coefficients
to the existing wavelet coefficients, using the standard dyadic form of the wavelet transform, exactly doubles the
amount of data that is contained in that window. In other words, the number of wavelet coefficients is doubled
when the new, currently zero-valued wavelet coefficients are introduced. Therefore, we need more ”headroom”
in terms of the sampling rate - it must be altered to allow for the added high frequencies. Simply, we double the
sampling rate to allow for twice the bandwidth of the original file, allowing for the added high frequencies.

3) Guess the values of the newly allocated coefficients, based on the exponential decay law of the wavelet coefficients.

The guessed coefficients' values are computed from the existing ones (see figure above). A guessed coefficient
is an exponentially decaying combination of the coefficients that fall directly below it, one from each level
of analysis (see figure below). For example, in a five-level analysis, the guessed coefficient is a sixth-level
coefficient which is equal to:

c6 = 2^(−p) c5 + 2^(−2p) c4 + 2^(−3p) c3 + 2^(−4p) c2 + 2^(−5p) c1 + 2^(−6p) c0 (8)

where cn is the "appropriate" wavelet coefficient at level n, and p is the number of vanishing moments in the analyzing
and synthesizing wavelets. In the above equation, the guessed wavelet coefficient on level 6 is most reliant on its
closest neighbor, a fifth-level coefficient, but it is also reliant on all levels of analysis in an exponentially decreasing
manner. However, the guessed coefficient may rely on fewer levels of coefficients, which in practice leads to less
aliasing effects in the output signal. In the cases where not all wavelet levels are used, the best results usually
only use one level of wavelet coefficients, namely, the level directly below the guessed coefficients level, in
guessing the unknown wavelet coefficient. There is a generalisation of the above formula for an entire level of
guessed wavelet coefficients. This formula further quantifies which wavelet coefficients on which previous levels
(specifically, which translates on a certain level) are used to compute a specific, guessed wavelet coefficient:
c(j+1, k) = Σi [1 / 2^(p(j−i+1))] c(i, ⌊k / 2^(j−i+1)⌋) (9)
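A minimal sketch of this per-coefficient rule is given below, assuming the translation mapping ⌊k/2^(j−i+1)⌋ (chosen so that the weights agree term by term with equation (8)); `guess_level` is a hypothetical helper name:

```python
def guess_level(levels, p):
    """Guess a whole new level of wavelet coefficients from the existing
    ones using the exponentially decaying rule of equation (9).
    `levels[i-1]` holds the level-i coefficients; each level has twice as
    many coefficients as the level below it."""
    j = len(levels)                   # number of existing levels
    new_len = 2 * len(levels[-1])     # the new level j+1 is twice as fine
    guessed = []
    for k in range(new_len):
        value = 0.0
        for i in range(1, j + 1):
            # the coefficient on level i that sits directly below (j+1, k)
            below = levels[i - 1][k // 2 ** (j - i + 1)]
            value += below / 2.0 ** (p * (j - i + 1))
        guessed.append(value)
    return guessed

# three existing levels of 2, 4 and 8 coefficients, all equal to 1
levels = [[1.0] * 2, [1.0] * 4, [1.0] * 8]
g = guess_level(levels, p=1)   # 16 guessed values of 1/2 + 1/4 + 1/8
```

Dropping all but the innermost term of the loop gives the one-level variant, which, as noted above, often produces the best results in practice.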

4) Perform the inverse wavelet transform on the altered window of material.

The inverse transform is performed normally, with the same set of analysis/synthesis wavelet filter bank. The
only problem encountered in this scheme is a loss of power in the resulting output signal - by adding another
level of wavelet coefficients, the conservation of power property of the wavelet transform has been violated. In
audio files, this is easily remedied by normalizing the output file.

In summary, we have seen the problem of adding higher frequency information to a low sample rate audio
file. The motivation for doing this is that even a moderately successful algorithm will enable storing of audio files with
smaller size, thus providing a kind of compression. This is also beneficial in reducing the transfer time of audio files over
the Internet. The method primarily consists of guessing the higher frequency content from the existing frequency information.
Wavelets are a good choice for the type of guesswork seen above. Since high frequency information usually has
relatively little power, the exponential decay property of wavelet coefficients does well in providing a reasonable guess
of the magnitude of a guessed wavelet coefficient. The algorithm is simple because it is only concerned with one large
band of frequency information at a time; in a Fourier version of this algorithm, many more bands of information would
have to be guessed at, and more processing time would be needed to achieve a similar result.

The authors would like to thank Professor V.M. Gadre for giving us a chance to work in this interesting and unique area.