
Audio Engineering Society

Convention Paper 5987


Presented at the 115th Convention
2003 October 10–13 New York, NY, USA

This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections,
or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers
may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New
York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or
any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering
Society.

Lossless Compression for Audio Data in the IEEE Floating-Point Format
Dai Yang¹ and Takehiro Moriya¹
¹NTT Cyber Space Laboratory, 3-9-11, Midori-Cho, Musashino-Shi, Tokyo 180-8585, Japan

Correspondence should be addressed to Takehiro Moriya (t.moriya@ieee.org)

ABSTRACT

Most lossless audio-coding algorithms are designed for PCM input sound formats. Little work has been
done on the lossless compression of IEEE floating-point audio files. An efficient lossless-coding algorithm
that handles IEEE floating-point format data as well as PCM format data is described in this paper. In the
worst-case scenario, where the algorithm was applied to artificially generated 32-bit floating-point sound
files with a 48-kHz sampling frequency, an average compression ratio of 65% was still achieved.
Moreover, the proposed algorithm is easily extensible to lossless/variable-lossy
operation, which will provide scalability to accommodate the requirements of a wider range of applications
and platforms.

1. INTRODUCTION

Current state-of-the-art “perceptual lossy” forms of audio encoding, such as Dolby AC-3 and
ISO/MPEG audio standards, are capable of achieving compression ratios up to 12:1 and higher and
YANG AND MORIYA FLOATING-POINT LOSSLESS AUDIO COMPRESSION

have thus become popular as components of internet media applications, mobile terminals, and
special-purpose audio-visual data storage devices. However, lossy audio-compression algorithms have
met with strong resistance from the fields of professional studio operations, sound archiving, and the
disc-based consumer market. This is because a lossy compression algorithm, regardless of how good it
is, permanently alters the original recorded sound data and thus inherently reduces audio quality. In
contrast, copies of an audio waveform reproduced after lossless audio compression are always
bit-by-bit identical to each other, no matter how many cycles of compression and decompression are
applied.

In October of 2002, MPEG issued a new call for proposals on MPEG-4 lossless audio coding [1, 2]. The
new standard is now being developed and will be finalized in a year or two. However, most available
lossless audio-coding algorithms focus on input sources in the PCM format; little work has been done
on audio sources in the IEEE floating-point format. As an important sound format in the audio
industry, IEEE floating-point sound files can store an audio signal much more precisely than regular
PCM sound files can, and are therefore much more difficult to compress losslessly.

In this paper, we describe a new lossless audio-coding algorithm that is capable of handling input
sound files in the IEEE floating-point formats as well as in various PCM formats. On the encoder side,
the input IEEE floating-point sound file is initially converted into a PCM file. This file is then
losslessly compressed by an existing PCM lossless audio-coding algorithm. Finally, the differences
between the original and PCM samples are obtained; this data is then processed, compressed, and
appended to the bitstream.

The rest of this paper is organized as follows. Section 2 describes the proposed algorithm in detail.
The experimental results and some discussion are provided in section 3. Finally, some concluding
remarks are offered in section 4.

2. COMPLETE ALGORITHM DESCRIPTION

2.1. System Overview

The proposed lossless audio-coding algorithm is capable of handling input sound files in the IEEE
floating-point formats as well as in various PCM formats. It has three major building blocks: format
conversion, lossless PCM coding, and difference coding. In our implementation, Monkey's Audio is
adopted as the core coding module, i.e., for lossless PCM coding. The format-conversion module handles
the conversion of data between the IEEE floating-point and PCM formats. The difference-coding module,
where the floating-point difference data is precisely and efficiently coded, is the key to the
proposed algorithm's fully lossless quality. In designing the codec, extensive experiments were
performed to explore the trade-off between coding efficiency and complexity. On this basis, we
selected and applied the best combinations of parameter sets.

Fig. 1 illustrates the encoder block diagram. The input IEEE floating-point sound file is initially
converted into PCM file M. This file is then losslessly compressed by an existing PCM lossless
audio-coding algorithm. Finally, the differences between the original and PCM samples are obtained;
this data is then processed, compressed, and appended to the bitstream. The decoding process is simply
the reverse of the encoding procedure.

2.2. Byte-wise Difference

Instead of doing a floating-point subtraction, the difference between the samples in IEEE
floating-point file A and those in file B is generated byte-wise so that perfect reconstruction can
always be guaranteed. Here A denotes the original IEEE floating-point file and B the floating-point
file obtained by converting the intermediate PCM file M back to the IEEE format. To be more precise,
we calculate the difference in the exponent, D(e), and in the mantissa, D(f), for each sample in files
A and B. They can be calculated either by bit-wise XOR or by a simple subtraction. Each sample's
difference data is then stored in a 32-bit format.

2.3. Format Conversion

How the conversion between the PCM and floating-point formats is handled has a great influence on the
generation of difference data, which in turn strongly affects the overall compression performance.

In most existing programs for wave file format conversion [3], floating-point numbers are rounded to
the nearest integer when the IEEE floating-point samples are converted to the PCM format. By doing so,
the magnitude of the residue is minimized. However, the sign of each sample's residue could be

AES 115TH CONVENTION, NEW YORK, NY, USA, 2003 OCTOBER 10–13
Fig. 1: Encoder block diagram.


Fig. 2: An example of new format conversion.
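
As a minimal sketch of the Fig. 2 example (not the authors' implementation), the following Python snippet counts how many low-order bits of D(f) can be nonzero for a given intermediate PCM magnitude, and checks the magnitude-7 case on actual IEEE 754 bit patterns. The function names and the test value 7.625 are illustrative assumptions; the only facts taken from the paper are the 23-bit mantissa, the 2^(−23) scaling of a 24-bit PCM sample, and the truncating conversion.

```python
import struct

MANTISSA_BITS = 23                       # IEEE 754 single-precision mantissa width
MANTISSA_MASK = (1 << MANTISSA_BITS) - 1


def float_to_bits(x):
    """Return the 32-bit IEEE 754 pattern of x (rounded to single precision)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def possible_nonzero_bits(pcm_magnitude):
    """Number of low-order bits of D(f) that may be nonzero (hypothetical helper).

    For 2^(n-1) <= magnitude < 2^n, the top n-1 mantissa bits are shared by
    both samples, so at most 23 - (n-1) bits can differ. A magnitude of 0
    means truncation to zero, where all 23 bits may be nonzero.
    """
    if pcm_magnitude == 0:
        return MANTISSA_BITS
    n = pcm_magnitude.bit_length()
    return max(MANTISSA_BITS - (n - 1), 0)


# Fig. 2 case: PCM magnitude 7; the original magnitude 7.625 * 2^-23 is an assumed value.
f_b = 7.0 / 2**23
f_a = 7.625 / 2**23
d_f = (float_to_bits(f_a) ^ float_to_bits(f_b)) & MANTISSA_MASK

print(possible_nonzero_bits(7))                        # 21: the highest 2 bits of D(f) are zero
print(d_f.bit_length() <= possible_nonzero_bits(7))    # True
```

Because the count depends only on the PCM sample, a decoder that has already decoded the PCM value could recompute it without any side information.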

positive or negative. Moreover, the exponents of some samples' residues could be nonzero, making the
compression of residue data inefficient and therefore degrading the overall coding performance. In the
compression system described, the magnitude of the floating-point number f is truncated to the largest
integer that is less than or equal to it, i.e. |f| → ⌊|f|⌋. When comparing the corresponding
floating-point samples in IEEE files A and B, i.e. f_A and f_B, the advantages and disadvantages of
doing so over the traditional rounding method can be summarized as below:

Disadvantage:

• The average magnitude of residue is increased

Advantages:

• Exponents remain unchanged, unless truncation to zero occurs
• Signs of non-zero residues are always the same
• Number of non-zero bits in D(f) can be calculated implicitly

The disadvantage of the new format conversion method is that the magnitude of the residue is greater
than or equal to that generated by using the traditional rounding method.

As for the advantages, the magnitude of f_A is always greater than or equal to that of f_B, and f_A(E)
is always the same as f_B(E) if no zero truncation occurs, where f_A(E) and f_B(E) are the exponents
of samples f_A and f_B.
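
The properties listed above can be checked on individual samples with a short Python sketch. It is an illustration under stated assumptions, not the authors' code: samples are assumed finite with magnitude below 1.0 (no clipping is handled), the PCM value is obtained by scaling with 2^(b−1) and truncating toward zero, and the difference is taken as an XOR over the whole 32-bit word so that a sign change caused by truncation to zero is also captured. All function names are hypothetical.

```python
import struct

MANTISSA_MASK = (1 << 23) - 1


def float_to_bits(x):
    """32-bit IEEE 754 pattern of x (rounded to single precision)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits):
    """Exact value of a 32-bit IEEE 754 pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def encode_sample(bits_a, pcm_bits=24):
    """Return (m, d): truncated PCM value and 32-bit XOR difference.

    Assumes a finite sample with magnitude below 1.0.
    """
    scale = 1 << (pcm_bits - 1)
    m = int(bits_to_float(bits_a) * scale)   # int() truncates toward zero
    bits_b = float_to_bits(m / scale)        # sample of file B, exactly representable
    return m, bits_a ^ bits_b


def decode_sample(m, d, pcm_bits=24):
    """Rebuild the original bit pattern from the PCM value and the difference."""
    scale = 1 << (pcm_bits - 1)
    return float_to_bits(m / scale) ^ d


def split_difference(d):
    """Split a 32-bit difference word into (sign, D(e), D(f)) fields."""
    return d >> 31, (d >> 23) & 0xFF, d & MANTISSA_MASK


for value in (0.3, -0.3, 0.999 * 0.5, 1e-8, -1e-8, 0.0):
    bits_a = float_to_bits(value)
    m, d = encode_sample(bits_a)
    assert decode_sample(m, d) == bits_a     # bit-exact reconstruction
    sign_diff, d_e, d_f = split_difference(d)
    print(f"{value!r:>10}  m={m:>9}  D(e)={d_e:3d}  D(f)=0x{d_f:06X}")
```

For the sample 0.3 with a 24-bit intermediate format, this gives m = 2516582, D(e) = 0 and D(f) = 0x000002, so only the lowest mantissa bits carry residue. For the samples ±1e-8, truncation to zero occurs and the whole bit pattern ends up in the difference word.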


An example of how to calculate the number of nonzero bits in D(f) is shown in Fig. 2. Suppose sample M
in the intermediate 24-bit PCM file has a magnitude of 7. When converted to the IEEE 32-bit
floating-point format, it has the form (−1)^s × 2^(−23) × 7.0, where 7.0 has the binary representation
(111.000...0)_2. If sample M was produced by our new format conversion method, its corresponding
sample in IEEE file A must have the form (−1)^s × 2^(−23) × 7.xxx...x, where 7.xxx...x ≥ 7.0 and has
the binary representation (111.xxx...x)_2. Both values normalize to 2^2 × (1.11...)_2, so they share
their two leading mantissa bits. When the byte-wise difference is taken, we therefore notice that the
highest 2 bits in D(f) are zeros. Given a sample value in the intermediate PCM file, the number of
nonzero bits in D(f) can also be calculated at the decoder side. Therefore, only those possibly nonzero
bits of D(f) need to be sent to the entropy coder for further processing. In contrast, if the
traditional format conversion is performed, the number of nonzero bits is not predictable, and all 23
bits of D(f) need to be passed to the entropy coder. This simple example can be generalized into the
following rule.

Proposition: If a sample in the 24-bit intermediate PCM file has a magnitude greater than or equal to
2^(n−1) but smaller than 2^n (n > 0), then the highest n − 1 bits in D(f) are always zero.

2.4. Intermediate PCM File

Most of the existing lossless PCM audio codecs support input sound files in 16- and 24-bit formats. In
general, 16-bit PCM files can be compressed more efficiently. As a major block in the compression
scheme described, the lossless PCM coding part has a strong influence on the overall coding
performance. Using a 16-bit PCM file as the intermediate PCM file format will generate a much smaller
sub-bitstream I, while increasing residue values, when compared with a 24-bit intermediate PCM file.
Therefore, a trade-off exists when choosing the precision of the intermediate PCM file format.

2.5. Handling the Residue

The total bitstream is the combination of sub-bitstreams I and D, where I is usually 15% to 30% of the
file size of the original IEEE 32-bit floating-point file A. The intermediate raw difference data is
of the same size as the raw data in file A. Thus, achieving a satisfying compression result is
impossible if the residue data is handled poorly.

By using the new format conversion method, D(e) equals zero for most samples. Therefore, these D(e)
values do not need to be sent to the bitstream. As for these samples' corresponding D(f) values, only
the possibly nonzero bits need to be buffered and sent to an entropy coder. When a sample in the
original IEEE file A has a magnitude smaller than 2^(−23) (or 2^(−15)), zero truncation has to be done
when converting this number into a 24-bit (or 16-bit) PCM format. Its corresponding D(e) is then
nonzero and D(f) may have all 23 bits nonzero.

By using our new format conversion method, the number of highest zero bits of D(f) can be calculated.
However, the remaining lower bits of D(f) may still contain several leading zero bits. In order to
further improve the performance, the highest nonzero byte or bit in a whole block, N_high-block, is
calculated and sent to the bitstream. When compressing D(f), only the bits that are lower than the
smaller of N_high-block and N_sample will be processed, where N_sample is the number of possibly
nonzero bits calculated by the new format conversion method for each sample.

3. EXPERIMENTAL RESULTS

The proposed compression scheme was implemented and tested on a Windows PC. We evaluated the
performance of the algorithm in the worst-case situation by applying it to IEEE 32-bit floating-point
format audio source files. These files were artificially generated by multiplying PCM sound files by
different gain factors. Table 1 shows compression results for two sets of seven input sound files with
a sampling frequency of 48 kHz. The number pair G/B in the first column, e.g., 0.999/24, indicates
that the IEEE floating-point input file was generated by applying a gain of 0.999 to a 24-bit PCM
file. The number in column b, e.g., b=16, indicates that the coding results were obtained by using an
intermediate PCM file of 16-bit precision. From the figures in Table 1, we found that for input IEEE
files A that are generated by applying a gain to a 24-bit PCM file, the coding performance when using
a 24-bit intermediate PCM file is a little better than that for a 16-bit


Table 1: Compression ratio (48 kHz). G/B: gain factor / bit depth of the source PCM file; b: bit depth of the intermediate PCM file.

G/B        b    Avemaria   Cymbal    Etude     Flute     Clarinet   Haffner   Violin    Average
0.999/24   24   62.30%     66.43%    62.95%    65.78%    65.89%     66.38%    67.84%    65.37%
0.999/24   16   62.43%     70.10%    63.03%    65.86%    65.91%     66.26%    67.84%    65.92%
1.001/24   24   62.30%     66.38%    62.94%    65.77%    65.89%     66.38%    67.84%    65.35%
1.001/24   16   62.43%     70.09%    63.03%    65.86%    65.91%     66.26%    67.84%    65.92%
1.0/24     24   43.93%     36.48%    45.57%    44.97%    48.43%     52.44%    49.07%    45.89%
1.0/24     16   44.26%     41.39%    45.84%    45.29%    48.63%     52.44%    49.27%    46.73%
0.97/16    24   62.37%     63.80%    63.01%    65.70%    65.93%     66.38%    67.82%    65.00%
0.97/16    16   61.02%     60.43%    61.31%    64.05%    64.63%     64.89%    66.57%    63.27%
1.001/16   24   62.47%     64.79%    63.07%    65.85%    65.97%     66.39%    67.87%    65.20%
1.001/16   16   61.63%     50.38%    62.43%    64.65%    65.44%     66.09%    67.16%    62.54%
1.0/16     24   44.10%     39.10%    45.69%    45.04%    48.51%     52.45%    49.10%    46.28%
1.0/16     16   19.14%     15.13%    20.71%    20.07%    23.48%     27.32%    24.07%    21.42%

intermediate PCM file case. However, these differences are not significant. For those input IEEE files
A that are generated by applying a gain to a 16-bit PCM file, the coding performance when using a
24-bit intermediate PCM file is worse than that for the 16-bit intermediate PCM file case. Also,
considerably large differences are observed in the case when an input IEEE file A is a simple format
conversion of a 16-bit PCM file.

4. CONCLUSION

We described a new lossless compression scheme for audio data in the IEEE 32-bit floating-point
format. The scheme is able to handle the lossless compression of regular PCM input files as well. The
algorithm utilizes an existing PCM lossless audio coder as a core coder, then gracefully handles the
residue data to generate the final bitstream. In the worst-case scenario, where the input IEEE
floating-point file is generated by multiplying a 24-bit PCM file by a non-integer number, the
proposed algorithm still achieves an average compression ratio of about 65% for input files with a
sampling frequency of 48 kHz. A compression ratio of 20% or less can be achieved when the input IEEE
floating-point file has the same values as the 24- or 16-bit PCM file. Moreover, this work is easily
extensible to the handling of 64-bit IEEE floating-point input source files, as well as to a scalable
lossy/lossless framework.

5. REFERENCES

[1] ISO/IEC JTC 1/SC29/WG11 N5040, Call for proposals on MPEG-4 lossless audio coding, Klagenfurt, AT,
July 2002.

[2] ISO/IEC JTC 1/SC29/WG11 N5208, Revised call for proposals on MPEG-4 lossless audio coding,
Shanghai, China, October 2002.

[3] P. Kabal, AFsp Library, programs and routines,
http://www.tnt.uni-hannover.de/soft/audio/packages/afsp.

[4] ANSI/IEEE Std. 754-1985 American National Standard, IEEE Standard for Binary Floating-Point
Arithmetic, New York, 1985.

[5] T. Moriya, A. Jin, T. Mori, K. Ikeda, and T. Kaneko, “Lossless scalable audio coder and quality
enhancement,” in Proc. ICASSP, Florida, USA, May 2002.

[6] Monkey Audio, A fast and powerful lossless audio compressor, http://www.moneyaudio.com.

[7] M. Hans and R. W. Schafer, “Lossless compression of digital audio,” IEEE Signal Processing
Magazine, vol. 18, no. 4, pp. 21–32, 2001.

