Abstract - In this work we present an audio data compression software package which uses a compression procedure based on the Discrete Wavelet Transform. The wavelet function can be chosen from the well-known Daubechies class of compactly supported wavelets. Our implementation computes the wavelet-domain threshold value adaptively. The package has two major components: the compression software, which reads a standard audio data file, and the reconstruction software, which reads a compressed file and generates a standard audio file. Experimental results are also presented.

I. DATA COMPRESSION WITH WAVELETS

Data compression is widely used for data storage. There is a large variety of compression algorithms, many of them standardised, each with its own advantages and drawbacks. They offer speed, high compression ratio, portability etc., but no single algorithm is best suited to every kind of application. They are classified in two major categories: the first includes lossless compression algorithms and the second lossy ones. All the algorithms share the same purpose, reducing the information redundancy of the considered data set, but those in the first class allow perfect reconstruction while those in the second class do not. Procedures which allow small information losses can achieve higher compression ratios, but they cannot be used in applications where the integrity of the data set is essential (file compression, for example). The second class includes various voice and image compression algorithms. In this work we present a complete audio compression/decompression software package which implements an algorithm from this second class, exploiting a very popular orthogonal transform, the Discrete Wavelet Transform (DWT), [1], [2]. Wavelet transforms and their properties have been studied extensively by the authors in the context of various data compression and signal to noise ratio (SNR) enhancement scenarios, [3], [4], [5], [6], [7].

Our software compresses Nyquist-sampled music or speech signals using this orthogonal transform, which has a strong decorrelation effect on the signal samples. As a result, the transform domain contains many zeroes or very small values, which can be neglected to achieve the desired compression. The DWT was chosen for its speed (it is faster than the FFT) and for its whitening properties, which are superior to the decorrelation properties of other well-known transforms, [1], [2], for example the Discrete Cosine Transform (DCT), which has been included in a series of compression standards.

II. THE DATA COMPRESSION SYSTEM

The principle of our implementation is presented in figure 1.

[Figure 1 block diagram: x[n] → log2(·) → x̂[n] → DWT → y[n] → threshold λ → ŷ[n] → quantizer → u[n]]
Figure 1. The wavelet compression system

The input sequence consists of PCM-coded speech signal samples, encoded in either 8 or 16 bit/sample format. In fact, our software accepts as input a standard audio data file in the well-known WAVE format (*.wav files), containing either mono or stereo sound in the above-mentioned PCM format. The first block in figure 1 performs a dynamic-range compression of the speech signal. Its input-output relation is:

$$\hat{x}[n] = 2^{15}\left(\log_2 x[n] - 15\right) \qquad (1)$$

where we exploited two facts about the samples x[n]. First, they are in PCM format, with the first bit as the sign bit (set to 1 if the sample value is positive and to 0 if it is not). Second, they are distributed around zero, which is represented in this format by the hexadecimal number
8000H, i.e. $2^{15}$ in decimal form. Thus all the values $\log_2 x[n]$, computed as double-precision floating-point numbers, lie around the decimal value 15. To centre these numbers, 15 is subtracted, and multiplying by $2^{15}$ then linearly rescales the obtained values. The sequence x̂[n] is applied to the DWT block, which computes its Discrete Wavelet Transform according to the fast DWT algorithm proposed by Daubechies, [1]. This procedure acts on data blocks whose length is a power of two. For this reason, the input data set is segmented into blocks of NR = 4096 samples, and the input data stream is handled block by block. The mother wavelet is one of the Daubechies type, denoted Dau-N, [1], where N indicates its order, ranging from 2 to 10. N is given as the second, optional, command-line argument (the first, required, argument is the name of a *.wav file); by default N = 10. Our experiments show that for Nyquist-sampled speech signals, the higher the wavelet order, the higher the achieved compression ratio. At the output of this block we obtain a sequence y[n] of NR = 4096 double-precision floating-point values, which contains a large number of zeroes or very small values.

This sequence is passed through a non-linear adaptive threshold system, which discards all input values whose magnitude is smaller than a given threshold λ. This threshold is chosen adaptively. The main difficulty in choosing it is that the operation causes information loss. It is widely known in speech processing that the human ear is "not very L2": perceived quality does not follow the mean squared error. No suitable measure of good quality has been developed yet for speech, vision or smell. For this reason, an objective measure is used in this work: the user specifies the minimum ratio between signal power and distortion power, with the recommendation that this value should not fall below 100, which is also the default. Using this value, the program computes the appropriate λ for each input data block.

The last block in figure 1 is a linear 8 bit/sample quantizer which encodes the sequence of non-discarded values ŷ[n]. The result is the transform-domain sequence u[n], with 8 bits per non-discarded value, which carries most of the remaining information. This sequence is stored.

III. THE COMPRESSED FILE STRUCTURE

We have seen that the output of the compression system presented in figure 1 is the 8 bit/sample sequence u[n]. This sequence is much shorter than the input block (one sequence u[n] is produced for each 4096-sample input block). The values of this sequence alone are not enough to reconstruct the speech sequence. In fact, we have retained a set of non-zero values from a larger ordered set (the sequence ŷ[n]), so reconstruction also requires the positions of these non-zero values in that sequence. There are various ways to store them. Our experiments show that for very short sequences u[n], a [position; sequence value] data structure is recommended for storage. This is the case for oversampled input sequences x[n]. However, we intend to use our software for Nyquist-sampled speech signals, for which this storage structure is not well suited. This is why we adopted the following compressed file structure:

1. The sampling frequency - 4 bytes
2. First block header structure - 5 bytes
3. The index vector of the data index vector - 16 bytes
4. The non-zero values of the data index vector - variable size
5. The non-zero quantized data sequence values - variable size
2. Second block header structure - 5 bytes
3. The index vector of the data index vector - 16 bytes
… … …
Figure 2. The compressed file structure

The structure of the compressed file includes five types of fields. The first field, four bytes long, contains the sampling frequency as an unsigned long integer. This field appears only once, at the beginning of the file. It is used at reconstruction to save the uncompressed PCM audio samples in WAVE file format. The other four fields are repeated in the same order for each input block. The second field is a five-byte block header. It stores the wavelet function identification number N (one byte), followed by the computed quantum (the step size used by the quantizer) in floating-point format (four bytes). The following two fields encode the positions of the non-zero values of the sequence u[n] within the corresponding sequence ŷ[n]. The fifth field stores these non-zero quantized values in increasing order of their indexes. The position encoding uses a 4096-bit stream in which each bit indexes a value of the sequence ŷ[n] and is set to 1 if this value is non-zero or
to 0 if this value is zero. For a higher compression ratio, this data index vector (512 bytes long) is treated as a 256-sample unsigned integer sequence and is encoded in the same manner as the data sequence. Hence we obtain a new 256-bit index stream (32 bytes) and a variable number of non-zero values of the data index vector.

N identifies the Daubechies wavelet function order (its default value is 10); Px/Perr is the maximum value of the desired signal to distortion power ratio (default 100). The output is a file named input_file.cmp, which has the data structure presented in section III;
- the decmp.exe file, with the synopsis
decmp.exe <input_file.cmp> [<output_file.wav>]
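The Python sketch below makes the processing chain of figure 1 and the position encoding of section III concrete. It is not the authors' implementation and rests on several labelled assumptions:

- The orthonormal Haar wavelet, the shortest Daubechies wavelet, stands in for the Dau-N filters.
- λ is chosen by a hypothetical rule: discard the smallest coefficients while their energy stays within the total energy divided by the Px/Perr value. For an orthonormal DWT, Parseval's relation makes this energy equal to the distortion of x̂[n], not of x[n].
- The quantum is taken as the peak magnitude divided by 127.
- 16-bit words are read in little-endian order.

All function names are hypothetical.

```python
import math
import struct

NR = 4096  # block length used in the paper


def compand(x):
    """Eq. (1): x_hat = 2**15 * (log2 x - 15) for offset-binary samples (zero = 2**15)."""
    # Assumption: clamp to 1 because log2(0) is undefined.
    return [2**15 * (math.log2(max(v, 1)) - 15) for v in x]


def expand(x_hat):
    """Inverse of Eq. (1), as the reconstruction software would need it."""
    return [2 ** (v / 2**15 + 15) for v in x_hat]


def haar_dwt(a):
    """Full-depth orthonormal Haar DWT (stand-in for Dau-N); len(a) must be a power of two."""
    out = [0.0] * len(a)
    a, n, s = list(a), len(a), math.sqrt(0.5)
    while n > 1:
        half = n // 2
        out[half:n] = [(a[2 * i] - a[2 * i + 1]) * s for i in range(half)]  # detail coefficients
        a = [(a[2 * i] + a[2 * i + 1]) * s for i in range(half)]            # approximation
        n = half
    out[0] = a[0]
    return out


def choose_lambda(y, min_ratio=100.0):
    """Hypothetical adaptive rule: keep the discarded energy within total / min_ratio."""
    budget = sum(v * v for v in y) / min_ratio
    dropped = 0.0
    for m in sorted(abs(v) for v in y):
        if dropped + m * m > budget:
            return m        # smallest magnitude that must be kept
        dropped += m * m
    return math.inf         # e.g. an all-zero block: everything is discarded


def threshold(y, lam):
    """Discard (set to zero) every value whose magnitude is smaller than lambda."""
    return [v if abs(v) >= lam else 0.0 for v in y]


def quantize(y_hat):
    """Linear 8-bit quantizer applied to the non-discarded values."""
    kept = [v for v in y_hat if v != 0.0]
    peak = max((abs(v) for v in kept), default=0.0)
    quanta = peak / 127 if peak else 1.0  # assumed step-size rule
    codes = [max(-127, min(127, round(v / quanta))) for v in kept]
    return quanta, codes


def pack_bitmap(values):
    """One bit per value (MSB first), set to 1 where the value is non-zero."""
    bits = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            bits[i // 8] |= 0x80 >> (i % 8)
    return bytes(bits)


def encode_positions(y_hat):
    """Two-level position encoding: 4096-bit map, then a bitmap of its 16-bit words."""
    level1 = pack_bitmap(y_hat)                             # 4096 bits = 512 bytes
    words = struct.unpack(f"<{len(level1) // 2}H", level1)  # 256 unsigned 16-bit words
    level2 = pack_bitmap(words)                             # 256 bits = 32 bytes
    return level2, [w for w in words if w]


def block_header(n_order, quanta):
    """5-byte block header: N on one byte, then the quantum as a 4-byte float."""
    return struct.pack("<Bf", n_order, quanta)
```

For a 4096-sample block, `encode_positions` returns the 32-byte second-level index stream mentioned in section III, together with the non-zero 16-bit words of the first-level map. The number of entries returned by `quantize` equals the number of 1 bits in the first-level map.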
REFERENCES