
Wavelets and audio data compression

Tibor Asztalos (1), Isar Alexandru (2)

(1, 2) “Politehnica” University of Timisoara, Faculty of Electronics and Telecommunications,
Bd. V. Parvan no. 2, 1900, Timisoara, Romania, tel./fax 056-190608,
e-mail: (1) asztalos@ee.utt.ro, (2) isar@ee.utt.ro

Abstract - In this work we present an audio data compression software package based on the
Discrete Wavelet Transform (DWT). The wavelet function can be chosen from the well-known
Daubechies class of compactly supported wavelets. Our implementation computes the
wavelet-domain threshold value adaptively. The package has two major components: a
compression program, which reads a standard audio data file, and a reconstruction program,
which reads a compressed file and generates a standard audio file. Experimental results are
also presented.

I. DATA COMPRESSION WITH WAVELETS

Data compression is very widely used for data storage. There exists a large variety of
compression algorithms, many of them standardised, each with its own advantages and
drawbacks. They offer speed, high compression ratio, portability etc., but no single
algorithm fits all kinds of applications best. They are classified into two major
categories: lossless compression and lossy compression. All the algorithms have the same
purpose, to reduce the information redundancy of the considered data set, but those of the
first class allow a perfect reconstruction while those of the second class do not.
Procedures which allow small information losses can achieve a higher compression ratio,
but they cannot be used in applications where the integrity of the data set is essential
(file compression, for example). The second class includes various voice and image
compression algorithms. In this work we present a complete audio compression/decompression
software package which implements an algorithm of this second class and exploits a very
popular orthogonal transform, the Discrete Wavelet Transform (DWT), [1], [2]. Wavelet
transforms and their properties have been studied extensively by the authors in the
context of various data compression and signal to noise ratio (SNR) enhancement scenarios,
[3], [4], [5], [6], [7].
Our software compresses the Nyquist sampled music or speech signal using this orthogonal
transform, which has a strong decorrelation effect on the signal samples. As a result, the
transform domain contains many zeroes or very small values, which can be neglected,
achieving the desired compression. The DWT was chosen for its speed (faster than the FFT)
and for its whitening properties, which are superior to the decorrelation properties of
other well-known transforms, [1], [2] (for example the Discrete Cosine Transform, DCT,
which is included in a series of compression standards).

II. THE DATA COMPRESSION SYSTEM

The principle of our implementation is presented in figure 1.

[Figure 1 (block diagram): x[n] → log2(⋅) → x̂[n] → DWT → y[n] → λ → ŷ[n] → quantifier → u[n]]

Figure 1. The wavelet compression system

The input sequence consists of a series of PCM coded speech signal samples, encoded in
either an 8 or a 16 bit/sample format. Our software accepts as input a standard audio data
file in the well-known wave format (*.wav files), which contains either mono or stereo
sound in the above-mentioned PCM format. The first block in figure 1 compresses the
dynamics of the speech signal. The input-output relation of this block is:

$\hat{x}[n] = 2^{15} \cdot (\log_2 x[n] - 15)$   (1)

Here we exploit the facts that the samples x[n] are in PCM format, with the first bit as
the sign bit (set to 1 if the sample value is positive and to 0 if it is not), and that
these values are distributed around the value zero, which
is represented in this format by the hexadecimal number
8000H. This value corresponds to $2^{15}$ in decimal form. Thus, all the values
log2(x[n]), computed as double precision floating point numbers, will be around the
decimal value 15. To centre these numbers, the decimal number 15 is subtracted. By
multiplying with $2^{15}$, we perform a linear rescaling of the obtained values. The
sequence x̂[n] is applied to the DWT block, which computes its Discrete Wavelet Transform
according to the fast DWT algorithm proposed by Daubechies, [1]. This procedure acts on
data blocks whose length is a power of two. For this reason, the input data set is
segmented into blocks of NR=4096 samples, and the input data stream is handled block by
block. The mother wavelet is of the Daubechies type, denoted Dau-N, [1], where N indicates
its order, which ranges from 2 to 10. This number is the second command line argument (the
first, required, argument is the name of a *.wav file); it is optional and defaults to
N=10. Our experiments show that, for Nyquist sampled speech signals, the higher the
wavelet order, the higher the achieved compression ratio. At the output of this block, we
obtain a sequence y[n] of NR=4096 double precision floating point values. It contains a
large number of zeroes or very small values. This sequence is passed through a non-linear
adaptive threshold system, which discards all the input values whose magnitude is smaller
than a given threshold, λ. This threshold is chosen adaptively. The main difficulty in
choosing its value is that the thresholding operation causes a loss of information. It is
widely known in the field of speech processing that the human ear is not very "L2", that
is, perceived quality is not well described by a mean-square error. A suitable measure of
perceived quality has not yet been developed for speech, vision or smell. For this reason
objective measures were used in this work: the user specifies the minimum signal power to
distortion power ratio, with the recommendation that this value should not fall below 100.
This is in fact the default value of this parameter. Using this value, the program
computes and uses the appropriate λ value for each input data block.
The last block in figure 1 is a linear 8 bit/sample quantifier which encodes the sequence
of non-discarded values ŷ[n]. The result is the sequence u[n], with 8 bits per
non-discarded transform domain sample, which carries most of the remaining information.
This sequence is stored.

III. THE COMPRESSED FILE STRUCTURE

We have seen that at the output of the compression system presented in figure 1 we obtain
the 8 bit/sample sequence u[n], which is much shorter than the input block (one sequence
u[n] for each 4096 sample input block). Knowledge of the values of this sequence alone is
not enough to reconstruct the speech sequence. We have in fact retained a set of non-zero
values from a larger ordered set (the sequence ŷ[n]), so for the reconstruction we also
need to know the positions of these non-zero values in that sequence. There are various
ways to do this. Our experiments show that for very short sequences u[n] the [position;
sequence value] data structure is recommended for storage. This is the case of oversampled
input sequences x[n]. But we intend to use our software for Nyquist sampled speech
signals, for which that storage structure does not fit well. This is the reason why we
adopted the following compressed file structure:

  1. The sampling frequency                          - 4 bytes
  2. First block header structure                    - 5 bytes
  3. The index vector of the data index vector       - 16 bytes
  4. The data index vector nonzero sequence values   - variable length
  5. The nonzero quantified data sequence values     - variable length
  2. Second block header structure                   - 5 bytes
  3. The index vector of the data index vector       - 16 bytes
  …  …                                                 …

Figure 2. The compressed file structure

The structure of the compressed file includes five types of fields. The first field, four
bytes long, contains the sampling frequency in unsigned long integer format. This field
appears only once, at the beginning of the file, and is used at reconstruction to save the
uncompressed PCM audio samples in wave file format. The other four fields are repeated in
the same order for each input block. The second field is a five byte block header, which
stores the wavelet function identification number N (one byte) followed by the computed
quantum (used by the quantifier) in floating point format (four bytes). The following two
fields encode the positions of the non-zero values of the sequence u[n] in the
corresponding sequence ŷ[n]. The fifth field stores these non-zero quantified values in
increasing order of their indexes. The position encoding considers a 4096 bit stream in
which each bit indexes a value in the sequence ŷ[n] and is set to 1 if this value is
non-zero or
to 0 if this value is zero. For a higher compression ratio, this data index vector (512
bytes long) is considered as a sequence of 256 unsigned integers and is encoded in the
same manner as the data sequence. Hence we obtain a new 256 bit index stream (32 bytes)
and a variable number of non-zero data index vector values.

IV. THE DATA RECONSTRUCTION SYSTEM

The reconstruction system computes the initial speech signal from the compressed data
file. The principle of this system is presented in figure 3.

[Figure 3 (block diagram): u[n] → ŷ[n] → IDWT → x̃̂[n] → 2^(15 + (⋅)/2^15) → x̃[n]]

Figure 3. The reconstruction system

The first block in figure 3 reads the sequence of non-zero quantified values u[n] and the
corresponding quantum q, and computes the 4096 sample sequence ŷ[n]. This is followed by
an Inverse Discrete Wavelet Transform (IDWT). The appropriate wavelet function is chosen
according to the order number N, which is also read from the corresponding block header in
the file. Thus we obtain the reconstructed sequence $\tilde{\hat{x}}[n]$, whose dynamics
are still compressed. The last operation is the dynamic decompression. The input-output
relation of the last block is:

$\tilde{x}[n] = 2^{15 + \tilde{\hat{x}}[n] / 2^{15}}$   (2)

The reconstructed audio signal x̃[n], in PCM format, is saved in standard wave format.
This requires the sampling frequency, which was written in the first field of the
compressed file.

V. CONCLUSION

In this work we have presented a software package which can be used for audio data
compression and reconstruction. The package includes two executable files, one for
compression and one for reconstruction. These programs are:
- the cmp.exe file, with the synopsis

cmp.exe <input_file.wav> [N] [Px/Perr]

where input_file.wav is the name of an input file in wave format, which contains a PCM
encoded audio sequence; N identifies the order of the Daubechies wavelet function (default
10); Px/Perr is the minimum value of the desired signal to distortion power ratio (default
100). The output is a file named input_file.cmp, which has the data structure presented in
section III;
- the decmp.exe file, with the synopsis

decmp.exe <input_file.cmp> [<output_file.wav>]

which takes an input compressed file named input_file.cmp, performs the reconstruction and
saves the result in standard wave format, with the same name (with the *.wav extension) if
no second argument is specified, or with the name output_file.wav if it is.

Some of our experimental results are presented in table 1, for a 3.52 s long CD quality
stereo music signal, sampled at the Nyquist rate, 44100 Hz, and compressed using different
Daubechies wavelet filters (different N) and different minimum signal to distortion power
ratios, Px/Perr. The obtained compression ratios are presented.

                   Px/Perr ratio
  Order N      50          100         200
     2      18.28293    14.27183    11.31611
     3      20.41975    16.30223    13.18288
     4      21.35537    17.05412    13.95142
     5      22.10908    17.65667    14.43862
     6      22.57396    18.06336    14.74008
     7      22.71228    18.20808    14.87317
     8      23.05539    18.44322    15.0646
     9      23.30215    18.65093    15.1511
    10      23.33183    18.74526    15.29926

Table 1. Audio compression ratios for different wavelet functions and distortion factors

The quality of the reconstructed sound was assessed subjectively. This assessment imposes
a Px/Perr ratio not smaller than 100. Figure 4 shows the first 8096 samples of the
considered sound signal (figure 4.a) and the corresponding reconstructed signal (figure
4.b) when the Dau-10 wavelet function is used and a minimum signal to distortion power
ratio of 100 is accepted.
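
The adaptive threshold of section II is driven by the minimum Px/Perr ratio given on the command line, but the rule that turns this ratio into λ is not stated in the text. The Python sketch below shows one possible rule, as an assumption rather than the authors' procedure: the smallest-magnitude coefficients of a block are discarded for as long as their accumulated energy stays within Px divided by the requested ratio. It relies on the transform being orthonormal, so that the discarded energy equals the reconstruction error power, and it ignores the additional error of the 8 bit quantifier. The function names are hypothetical.

```python
import math


def adaptive_threshold(coeffs, min_ratio=100.0):
    """Return a threshold lam for one block of transform coefficients.

    Assumed rule (not taken from the paper): discard the smallest
    magnitudes while the discarded energy stays within Px / min_ratio.
    """
    total = sum(c * c for c in coeffs)
    budget = total / min_ratio
    discarded = 0.0
    for m in sorted(abs(c) for c in coeffs):
        if discarded + m * m > budget:
            return m  # first magnitude that no longer fits the budget
        discarded += m * m
    return math.inf  # every coefficient fits (e.g. an all-zero block)


def apply_threshold(coeffs, lam):
    """Zero every coefficient whose magnitude is smaller than lam."""
    return [c if abs(c) >= lam else 0.0 for c in coeffs]


def power_ratio(coeffs, kept):
    """Px / Perr measured in the transform domain."""
    px = sum(c * c for c in coeffs)
    perr = sum((c - k) ** 2 for c, k in zip(coeffs, kept))
    return math.inf if perr == 0.0 else px / perr


if __name__ == "__main__":
    block = [10.0, -0.1, 0.2, 5.0, -0.05, 0.3]  # toy coefficient block
    lam = adaptive_threshold(block, min_ratio=100.0)
    kept = apply_threshold(block, lam)
    print(lam, kept, power_ratio(block, kept))
```

Returning the first magnitude that does not fit, and discarding only values strictly below it, keeps the achieved ratio at or above the requested minimum even when several coefficients share the same magnitude.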
[Figure 4 (two plots): amplitude (×10^4, from -1 to 1) versus sample index (0 to 10000); a) the input signal; b) the reconstructed signal]

Figure 4. The processed audio signal; a) the input signal; b) the reconstructed signal.

If this value is used, then no distortion is observed in the reconstructed signal. This is
the case in figure 4.
The method most frequently used for the compression of audio signals is based on the
linear predictive coding (LPC) algorithm. This method is also used in the GSM standard.
The compression factor usually obtained with it is around 8, [8], and the method is
computationally intensive. These are the reasons why we consider the compression method
proposed in this paper superior.
The results presented in this paper are similar to those presented in other papers on the
same subject, the compression of audio signals based on wavelets, [9], [10]. The
originality of our approach comes from its adaptive nature and from the proposal to also
use a compression of the dynamics. Our method can be enhanced with a strategy for
selecting the best mother wavelet for the compression of a given audio signal. Such a
strategy is presented in [7]. It is inspired by [11] and [12]. This is a future research
direction for the authors of this paper. Another way to improve the compression ratio is
to replace the scalar quantifier with a vector quantifier.

REFERENCES

[1] I. Daubechies, “Orthonormal Bases of Compactly Supported Wavelets”, Comm. Pure Appl.
Math., No. 41, pp. 909-996, 1988.
[2] D. L. Donoho, “Nonlinear Wavelet Methods for Recovery of Signals, Densities and
Spectra from Indirect and Noisy Data”, Proceedings of Symposia in Applied Mathematics,
vol. 47, ed. I. Daubechies, pp. 173-205, 1993.
[3] A. Isar, T. Asztalos, “Using the Fast Wavelet Transform for Data Compression”,
Proceedings of the Symposium on Electronics and Telecommunications "ETC 94", vol. III,
Timisoara, September 1994, pp. 31-36.
[4] T. Asztalos, A. Isar, “An Adaptive Data Compression Method Based on the Fast Wavelet
Transform”, Proceedings of the Symposium on Electronics and Telecommunications "ETC' 94",
vol. III, Timisoara, September 1994, pp. 37-42.
[5] D. Isar, T. Asztalos, A. Isar, “De-Noising with Wavelets”, Proceedings of
International Conference "SCS’95", Iasi, Romania, November 1995, pp. 51-54.
[6] T. Asztalos, “An Algorithm for the DWT on Block Computation”, Buletinul stiintific al
UPT, TOM 41 (55) Electrotehnica, Electronica si Telecomunicatii, vol. II, 1996,
pp. 128-133.
[7] T. Asztalos, D. Isar, A. Isar, “Data Adaptive Compression Using Wavelets”, Buletinul
stiintific al UPT, TOM 43 (57) Electrotehnica, Electronica si Telecomunicatii, fascicula
2, 1998, vol. I, pp. 79-82.
[8] M. Giurgiu, “Results on Speech Compression using Regular Pulse Excitation with Long
Term Prediction”, Proceedings of International Symposium Etc’98, Timisoara, September
17-18, 1998, vol. I, pp. 65-69.
[9] P. Srinivasan, L. M. Jamieson, “Techniques for Variable Rate Speech Coding using
Wavelet Representations”, Proceedings of the IEEE conference TFTS’96, Paris, July 1996,
pp. 109-112.
[10] C. Taswell, “Speech Compression with Cosine and Wavelet Packet Near-Best Bases”,
Preprint, Stanford University, 1995.
[11] J. Froment, “Traitement d’images et applications de la transformée en ondelettes”,
Thèse de doctorat, Université Paris IX, 1990.
[12] O. Rioul, “Ondelettes régulières: Applications à la compression d’images fixes”,
Thèse de doctorat, ENST Paris, Mars 1993.
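
As an illustration of the position encoding described in section III, the following Python sketch builds the 4096 bit index stream for one block of quantified values, regroups its 512 bytes into 256 unsigned 16 bit words, and encodes those words in the same way, which yields the 256 bit (32 byte) second-level index mentioned in the text. The bit order within a byte, the big-endian word layout and all function names are assumptions made for the example; the paper does not specify them.

```python
def pack_nonzero(values):
    """Split a sequence into (bitmap bytes, list of non-zero values).

    Bit i of the bitmap (most significant bit first within each byte,
    an assumed convention) is 1 when values[i] is non-zero.
    """
    bitmap = bytearray((len(values) + 7) // 8)
    nonzero = []
    for i, v in enumerate(values):
        if v != 0:
            bitmap[i // 8] |= 0x80 >> (i % 8)
            nonzero.append(v)
    return bytes(bitmap), nonzero


def unpack_nonzero(bitmap, nonzero, length):
    """Inverse of pack_nonzero: put the values back at the flagged positions."""
    it = iter(nonzero)
    return [next(it) if bitmap[i // 8] & (0x80 >> (i % 8)) else 0
            for i in range(length)]


def encode_positions(u_block):
    """Two-level position encoding for one block of quantified values."""
    level1, data = pack_nonzero(u_block)             # 4096 bits -> 512 bytes
    words = [int.from_bytes(level1[i:i + 2], "big")  # 256 words of 16 bits
             for i in range(0, len(level1), 2)]
    level2, index_words = pack_nonzero(words)        # 256 bits -> 32 bytes
    return level2, index_words, data


def decode_positions(level2, index_words, data, block_len=4096):
    """Rebuild the full block from the two index levels and the data."""
    words = unpack_nonzero(level2, index_words, block_len // 16)
    level1 = b"".join(w.to_bytes(2, "big") for w in words)
    return unpack_nonzero(level1, data, block_len)


if __name__ == "__main__":
    u = [0] * 4096
    u[3], u[100], u[4095] = 17, -5, 1  # three surviving coefficients
    level2, index_words, data = encode_positions(u)
    print(len(level2), len(index_words), data)
    print(decode_positions(level2, index_words, data) == u)
```

With only a few surviving coefficients per block, most of the 256 words are zero, so storing the 32 byte second-level index plus the non-zero words takes far less space than the full 512 byte bitmap.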
