
Audio Data Reduction

- Why we use it

- How it works

- The current standards

- The pros and cons

│ DW-AKADEMIE │ Seite 1
Why we use it

Data rate of a stereo bit stream:


16bit x 48k samples/s x 2 channels = 1536 kb/s (linear PCM)
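The figure above is the product of word length, sample rate and channel count. A minimal sketch of that arithmetic (the function name is only illustrative):

```python
def pcm_bit_rate(bits_per_sample: int, sample_rate_hz: int, channels: int) -> int:
    """Bit rate of uncompressed linear PCM in bits per second."""
    return bits_per_sample * sample_rate_hz * channels


rate = pcm_bit_rate(16, 48_000, 2)
print(rate // 1000, "kb/s")  # 1536 kb/s
```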

• Produces high costs

  - storage
  - transmission

• Too much for

  - digital broadcasting
  - larger computer networks
  - the internet

Comparison of data rates

storage time        Linear PCM   MPEG       MPEG       MPEG
(stereo)            1.5Mb/s      384kb/s    256kb/s    128kb/s

reduction factor    1            4          6          12
1 hour              0.7 GB       170 MB     120 MB     60 MB
1 day               16 GB        4 GB       2.8 GB     1.4 GB
1 week              120 GB       30 GB      20 GB      10 GB
1 month             500 GB       120 GB     80 GB      40 GB
1 year              6 TB         1.5 TB     1 TB       500 GB
10 years            60 TB        15 TB      10 TB      5 TB
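The storage figures in this table follow from bit rate times duration, divided by 8 bits per byte. The sketch below assumes decimal units (1 MB = 10^6 bytes); the values on the slide are rounded, so small differences are expected.

```python
def storage_bytes(bit_rate_bps: float, seconds: float) -> float:
    """Bytes needed to store a stream of the given bit rate and duration."""
    return bit_rate_bps * seconds / 8


HOUR = 3600
rates = [("linear PCM", 1_536_000), ("MPEG 384kb/s", 384_000),
         ("MPEG 256kb/s", 256_000), ("MPEG 128kb/s", 128_000)]
for label, rate in rates:
    print(f"{label}: {storage_bytes(rate, HOUR) / 1e6:.0f} MB per hour")
```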

Capacity requirements

Audio                         linear PCM        MPEG
                              (48kHz, 24bit)    (256kb/sec)

• 1 hour stereo signal        1 GB              120 MB
• HD of 500 GB                500 h             4 000 h
• low cost server 2000 GB     2 000 h           16 000 h
• large server 5 TB           5 000 h           40 000 h
• very large server 10 TB     10 000 h          80 000 h

Capacity requirements

Video                         CCIR 601        MPEG 2        MPEG 2
                              SDI 270Mb/s     ML 24Mb/s     DVD 4Mb/s

• 1 hour                      120 GB          11 GB         2 GB
• HD of 500 GB                4 h             45 h          250 h
• low cost server 2000 GB     16 h            180 h         1 000 h
• large server 5 TB           40 h            450 h         2 500 h
• very large server 10 TB     80 h            900 h         5 000 h
• mass storage system 1 PB    8 000 h         80 000 h      500 000 h
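The capacity tables for audio and video can be reproduced by dividing the store size by the bit rate. This is a rough sketch assuming decimal gigabytes and no file-system overhead; the slide values are rounded more coarsely.

```python
def hours_of_material(capacity_bytes: float, bit_rate_bps: float) -> float:
    """Hours of programme that fit into a store of the given size."""
    seconds = capacity_bytes * 8 / bit_rate_bps
    return seconds / 3600


GB = 1e9  # decimal gigabyte (assumption)
formats = [("SDI 270Mb/s", 270e6), ("MPEG 2 ML 24Mb/s", 24e6), ("MPEG 2 DVD 4Mb/s", 4e6)]
for label, rate in formats:
    print(f"500 GB at {label}: {hours_of_material(500 * GB, rate):.0f} h")
```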

The basic ideas of data reduction
- Digital audio contains REDUNDANCY
(meaningless data)
- e.g. 00000000000000000000000
could be expressed as "23 x 0"
- 010111 (23 in binary)
- achieved through mathematical procedures
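The "23 x 0" idea is run-length encoding. A minimal sketch of it follows; it only illustrates the principle and is not the method used by any particular audio codec.

```python
from itertools import groupby


def rle_encode(bits: str) -> list[tuple[int, str]]:
    """Collapse each run of identical symbols into a (count, symbol) pair."""
    return [(len(list(group)), symbol) for symbol, group in groupby(bits)]


def rle_decode(runs: list[tuple[int, str]]) -> str:
    """Expand (count, symbol) pairs back into the original string."""
    return "".join(symbol * count for count, symbol in runs)


data = "0" * 23
runs = rle_encode(data)
print(runs)  # [(23, '0')]
```

Decoding returns the original string exactly, which is what makes redundancy removal lossless.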

[Figure: digital audio data (linear PCM) divided into information and redundancy; label: 20 - 80%]

Compressing Audio Files

- Example of eliminating redundancy in an audio file

  - The original linear PCM file (e.g. in .WAV format): sound.wav (48kHz, 24bit), 100MB
  - "Zipped" file using lossless data compression: sound.zip, 70MB
  - "Un-zipped" file, identical to the original: sound.wav (48kHz, 24bit), 100MB
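The zip round trip can be tried with Python's standard `zlib` module. The sketch below uses a synthetic 16 bit test tone instead of a real recording (an assumption made to keep it self-contained), so the size reduction it shows is much larger than the 100MB to 70MB of the slide example; real music contains far less exact repetition.

```python
import math
import struct
import zlib


def sine_pcm16(freq_hz: float, seconds: float, sample_rate: int = 48_000) -> bytes:
    """Synthetic mono test tone as little-endian 16 bit linear PCM."""
    n = int(seconds * sample_rate)
    samples = (int(round(20000 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
               for i in range(n))
    return b"".join(struct.pack("<h", s) for s in samples)


pcm = sine_pcm16(440.0, 0.5)
packed = zlib.compress(pcm, 9)      # lossless compression
restored = zlib.decompress(packed)  # bit-identical reconstruction

print(len(pcm), "bytes ->", len(packed), "bytes")
print("identical to the original:", restored == pcm)
```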

The basic ideas of data reduction

- Digital audio contains IRRELEVANCY (inaudible information)


- Irrelevancy is subjective and strongly depends on the signal itself

[Figure: digital audio data divided into redundant data and information; the information consists of irrelevant information and relevant information (5 - 25%)]

Irrelevancy in audio signals

- Any inaudible sound event is irrelevant


- signals below the quiescent threshold
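One common closed-form approximation of the quiescent threshold (threshold in quiet) is Terhardt's formula. It is not taken from these slides and is used here only as an assumption to show how a component could be tested for irrelevancy; levels are in dB SPL.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Terhardt's approximation of the threshold in quiet, in dB SPL."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_irrelevant(freq_hz: float, level_db_spl: float) -> bool:
    """True if a tone at this level lies below the threshold in quiet."""
    return level_db_spl < threshold_in_quiet_db(freq_hz)


for freq in (100, 1000, 3300, 15000):
    print(f"{freq:>6} Hz: threshold {threshold_in_quiet_db(freq):6.1f} dB SPL")
```

A 10 dB SPL tone would count as inaudible at 100 Hz but audible at 1 kHz under this model.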

[Figure: spectrum over frequency f with labels PCM noise, inaudible signal, audible signal]

Frequency masking

- Any inaudible sound event is irrelevant


- signals below the quiescent threshold
- signals masked by a neighbouring signal
[Figure: masking threshold, amplitude A over frequency f]

Frequency masking

- Listen to this example

[Figure: 750Hz tone; NBN 900Hz and NBN 750Hz; axes t and f]

Temporal masking

- low level signals after a loud sound event will not be audible
- post-masking

[Figure: masker followed by a masked signal below the masking threshold; time axis t, 100ms]

Temporal masking

Listen to this example


- Two short bursts of noise

- The first burst is masked by the preceding music

Temporal masking

- low level signals shortly before a loud sound event will remain inaudible
- pre-masking

[Figure: masked signal below the masking threshold shortly before the masker; time axis t, 20ms]

Temporal masking

Listen to this example
- The second burst is pre-masked by the music that follows it

Data reduction encoding

- Dividing the audio band into critical bands


- 32 up to 576 sub-bands
- using filter banks and/or
modified discrete cosine transform (MDCT)
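Critical bands are often described on the Bark scale. The sketch below uses the Zwicker and Terhardt approximation for the Hz to Bark conversion (an assumption, not part of the slides) and compares it with 32 equally wide sub-bands at 48 kHz, to show that uniform sub-bands only approximate critical bands: the lowest sub-band spans several Bark, the highest only a fraction of one.

```python
import math


def hz_to_bark(freq_hz: float) -> float:
    """Zwicker/Terhardt approximation of the critical-band rate in Bark."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def uniform_subband_edges(n_bands: int = 32, sample_rate: int = 48_000):
    """(low, high) edge frequencies in Hz of equally wide sub-bands."""
    width = sample_rate / 2 / n_bands
    return [(i * width, (i + 1) * width) for i in range(n_bands)]


edges = uniform_subband_edges()
for low, high in (edges[0], edges[1], edges[-1]):
    span = hz_to_bark(high) - hz_to_bark(low)
    print(f"{low:7.0f} - {high:7.0f} Hz spans {span:.2f} Bark")
```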

[Block diagram: linear PCM -> filter bank or MDCT -> sub-band signals]

Data reduction encoding

- Fourier analysis of a time interval of the signal


- between 256 and 1024 point FFT
- causing a delay of 8 to 24ms
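The delay arises because a full block of samples must be collected before it can be analysed. The sketch below computes the block duration and shows the frequency-domain description of one block. It uses a naive DFT from the standard library instead of a real FFT routine, and the sample rates are assumptions; the 8 to 24ms on the slide depend on the sample rate actually used.

```python
import cmath
import math


def block_duration_ms(n_points: int, sample_rate: int) -> float:
    """Time needed to collect one analysis block, in milliseconds."""
    return 1000.0 * n_points / sample_rate


def dft_magnitudes(block: list[float]) -> list[float]:
    """Magnitudes of the lower half of the spectrum (naive O(N^2) DFT)."""
    n = len(block)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(block)))
            for k in range(n // 2)]


sample_rate = 48_000
n_points = 256
tone_hz = 750.0
block = [math.sin(2 * math.pi * tone_hz * i / sample_rate) for i in range(n_points)]

magnitudes = dft_magnitudes(block)
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * sample_rate / n_points

print(f"block duration: {block_duration_ms(n_points, sample_rate):.1f} ms")
print(f"strongest component: bin {peak_bin} = {peak_hz:.0f} Hz")
```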

[Block diagram: linear PCM -> filter bank or MDCT; in parallel, linear PCM -> Fast Fourier Transform -> description of a time block in the frequency domain]

Data reduction encoding

- Comparison of the signal with the psycho-acoustical model
- the psycho-acoustical model is not standardised,
so it can be improved in future developments

[Block diagram: linear PCM -> filter bank or MDCT; in parallel, linear PCM -> Fast Fourier Transform -> psycho-acoustical model -> information about the relevant critical bands and their coding requirements]

Data reduction encoding

- Distribution of the bit budget to the sub-bands


- considers the bit-rate available
- assigns to each sub-band an optimum number of bits
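One simple way to distribute a bit budget is a greedy loop: give the next bit to the sub-band whose quantisation noise is furthest above its masking threshold. The sketch below is a toy version under stated assumptions: the signal-to-mask ratios are made-up example values, each extra bit is taken to lower the noise by about 6.02 dB, and the budget is counted in bits per sample only. Real encoders also account for side information.

```python
def allocate_bits(smr_db: list[float], total_bits: int, max_bits: int = 15) -> list[int]:
    """Greedy bit allocation driven by the noise-to-mask ratio per sub-band."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio: signal-to-mask ratio minus ~6.02 dB per bit.
        nmr = [smr - 6.02 * b if b < max_bits else float("-inf")
               for smr, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] == float("-inf"):
            break  # every sub-band already has the maximum word length
        bits[worst] += 1
    return bits


smr = [20.0, 5.0, -3.0, 12.0]  # hypothetical signal-to-mask ratios in dB
print(allocate_bits(smr, total_bits=6))  # [3, 1, 0, 2]
```

The third sub-band receives no bits because its signal already lies below the masking threshold.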

[Block diagram: linear PCM -> filter bank or MDCT; in parallel, linear PCM -> Fast Fourier Transform -> psycho-acoustical model -> bit allocation -> bit budget per sub-band; the bit allocation also receives information about the total bit rate]

Data reduction encoding

- Quantisation of the sub-band samples


- every sub-band is quantised with the assigned bits
- a scale factor is extracted from the largest sample value in each sub-band
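A sketch of block quantisation with a scale factor: the block is normalised to its largest absolute value, then rounded to the word length assigned to the sub-band. This is a simplification; actual MPEG encoders pick the scale factor from a fixed table, and the function names here are hypothetical.

```python
def quantise_block(samples: list[float], n_bits: int) -> tuple[float, list[int]]:
    """Return (scale factor, integer codes) for one sub-band block (n_bits >= 2)."""
    scale = max(abs(s) for s in samples) or 1.0  # avoid division by zero on silence
    levels = 2 ** (n_bits - 1) - 1               # symmetric range -levels..+levels
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes


def dequantise_block(scale: float, codes: list[int], n_bits: int) -> list[float]:
    """Rebuild approximate sample values from scale factor and codes."""
    levels = 2 ** (n_bits - 1) - 1
    return [c / levels * scale for c in codes]


block = [0.02, -0.05, 0.013, 0.04]
scale, codes = quantise_block(block, n_bits=4)
decoded = dequantise_block(scale, codes, n_bits=4)
print(scale, codes)  # 0.05 [3, -7, 2, 6]
```

The decoded values differ from the input by at most half a quantisation step; that difference is the quantisation noise the psycho-acoustical model tries to keep below the masking threshold.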

[Block diagram: linear PCM -> filter bank or MDCT -> sub-band coding and scale factor extraction; in parallel, linear PCM -> Fast Fourier Transform -> psycho-acoustical model -> bit allocation, which controls the sub-band coding]

Data reduction encoding

- Coding and bit packing


- serialisation of the sub-band information
- entropy coding
- inserting scale factor and side information
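Entropy coding gives frequent values short code words and rare values long ones. The sketch below builds a Huffman code for a short run of quantised values as one illustration of the idea; the input values are invented, and MPEG encoders use predefined code tables instead of building a tree per block.

```python
import heapq
from collections import Counter


def huffman_code(symbols: list[int]) -> dict[int, str]:
    """Build a prefix-free code; frequent symbols get shorter code words."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


values = [0, 0, 0, 0, 0, 1, 0, -1, 0, 0, 2, 0]  # hypothetical quantised samples
code = huffman_code(values)
bitstream = "".join(code[v] for v in values)     # serialised, packed bits
print(code)
print(len(bitstream), "bits instead of", 2 * len(values), "with fixed 2 bit words")
```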

[Block diagram: linear PCM -> filter bank or MDCT -> sub-band coding and scale factor extraction -> coding and bit packing; in parallel, linear PCM -> Fast Fourier Transform -> psycho-acoustical model -> bit allocation, which controls the sub-band coding]

Data reduction standards

- MPEG: international standard by ISO


- AC2, AC3: by Dolby Inc.
- ATRAC: Sony, used for Minidisc only
- Ogg Vorbis: Open standard, patent free data reduction by Xiph.org
- WMA: Windows Media Audio, Microsoft's proprietary audio file format
- RealAudio: audio/video format especially for streaming
- G7XX: for low delay audio over ISDN lines (7kHz bandwidth)

The MPEG standards

- Moving Picture Experts Group


- set up by the ISO
- experts and interested companies
- MPEG 1, IS-11172, October 1992
- part 3: audio coding
- MPEG 2, IS-13818, November 1994
- part 3: low sample rate audio, multi-channel audio
- part 7: Advanced Audio Coding (AAC), not backward compatible with MPEG 1
- intended MPEG 3 included in MPEG 2
- MPEG 4, IS-14496, July 1999 (Version 2)
- adaptive coding for very low bit rates and multimedia applications
- introduces technologies like CELP and HVXC
- deals with text to speech (TTS) applications

The MPEG standards

- MPEG 7, IS-15938, July 2001


- multimedia objects description
- MPEG 21, IS-18034, December 2001
- multimedia framework
- strategies for content retrieval and management

MPEG 1

- Coding of moving picture and associated audio for digital storage media
at up to about 1.5Mb/s
- Part 3 standardised the audio compression formats
- Three layers were standardised
- Layer 1
- Layer 2
- Layer 3
- The three layers are downward compatible with each other

MPEG 1

- Layer 1
- low complexity of encoder and decoder
- low compression rate (factor 4)
- relatively high bit rates (192kb/s/ch)
- developed for Philips DCC
- outdated today

MPEG 1

- Layer 2
- medium complexity of encoder and decoder
- medium compression rate (factor 6)
- moderate bit rates (128kb/s/ch)
- developed for DAB
- most commonly used in the studio environment

MPEG 1

- Layer 3
- high complexity of encoder and decoder
- high compression rate (factor 12)
- low bit rates (64kb/s/ch)
- designed for signal transmission (ISDN)
- all future MPEG standards are based on Layer 3

MPEG 1

Target bit rates of Layer 1, 2 and 3

[Figure: target bit rate ranges of Layer 3, Layer 2 and Layer 1 on a shared axis]
bit rate (kb/s/ch): 32 64 96 128 160 192 224 256
data reduction factor (related to 16bit/48kHz): 24 12 8 6 5 4 3
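The data reduction factors on the axis follow from dividing the linear PCM rate per channel (16 bit x 48 kHz = 768 kb/s) by the target bit rate. A short sketch of that calculation; the values on the slide are rounded.

```python
PCM_KBPS_PER_CHANNEL = 16 * 48_000 / 1000  # 768 kb/s for 16bit/48kHz


def reduction_factor(bit_rate_kbps: float) -> float:
    """Data reduction factor relative to 16bit/48kHz linear PCM per channel."""
    return PCM_KBPS_PER_CHANNEL / bit_rate_kbps


for rate in (32, 64, 96, 128, 160, 192, 224, 256):
    print(f"{rate:3d} kb/s/ch -> factor {reduction_factor(rate):.1f}")
```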
MPEG Stereo Modes
- Mono
- Only one channel is recorded and transmitted
- If the input signal is stereo, the encoder will form the mono sum
- Stereo (dual mono)
- This is the true stereo mode
- Two fully independent audio channels (left and right)
will be encoded and transmitted
- Joint Stereo (intensity stereo, mid-side stereo)
- The encoder will eliminate additional redundancy of stereo signals
by coding similar signals in the left and right channel only once.
- Joint stereo provides more effective use of the bit budget
and will therefore reduce artifacts in the signal
- Joint stereo produces a less clear stereo image
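The mid-side variant of joint stereo can be sketched as a sum and difference transform. When left and right are similar, the side signal is close to zero and needs very few bits. The sketch shows the principle only: real codecs use their own normalisation, and intensity stereo works differently and is not reversible.

```python
def ms_encode(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """Convert left/right into mid (sum) and side (difference) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid: list[float], side: list[float]) -> tuple[list[float], list[float]]:
    """Recover left/right from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [100.0, 200.0, -50.0]
right = [100.0, 198.0, -50.0]
mid, side = ms_encode(left, right)
print(mid)   # [100.0, 199.0, -50.0]
print(side)  # [0.0, 1.0, 0.0]
```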

Data Reduction Sound Demonstration

- MPEG 1 Layer 2 encoding with different bit rates

  - 384 kb/s dual mono, compression rate 1:4
  - 256 kb/s dual mono, compression rate 1:6
  - 192 kb/s dual mono, compression rate 1:8
  - 128 kb/s dual mono, compression rate 1:12
  - 128 kb/s joint stereo, compression rate 1:12
  - 96 kb/s dual mono, compression rate 1:16
  - 96 kb/s joint stereo, compression rate 1:16
  - 64 kb/s dual mono, compression rate 1:24
  - 64 kb/s joint stereo, compression rate 1:24
  - 64 kb/s mono, compression rate 1:12

MPEG 1

- Comparison of Layer 2 and Layer 3 features

                            Layer 2    Layer 3

sub-bands                   32         576
entropy coding              no         yes
bit reservoir technology    no         yes
time delay                  24ms       approx. 100ms

MPEG 2
- Low sample rate audio
- reduced sample rates, reduced audio bandwidth
- reduction of audio bandwidth is less annoying than encoding artefacts
- the compression format for Worldspace satellite radio
- multi-channel applications
- 5+1 audio channels
- used for film, video and DVD applications (Europe)
- Advanced Audio Coding (AAC)
- not backward compatible with MPEG 1
- allows very low bit rates at improved quality
- is widely used for MP4 audio files on the internet
- the compression format for DRM

Problems of data reduction

- data-reduced audio is not identical to the original
(it only sounds like the original)
- (inaudible) loss in sound quality
- decay of quality with “generations”
- the quality decay is not transparent
- block structure of the MPEG data
- 24ms to 100ms
- editing is not possible within the block
- delay of signal
- encoding/decoding takes 100ms to 200ms
- problems in real time applications
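Because editing is only possible at block boundaries, an edit point has to be moved to the nearest frame start. The sketch below assumes a frame of 1152 samples, which at 48 kHz lasts 24ms; this matches the Layer 2 figure above, but the frame length is stated here as an assumption and other layers and sample rates differ.

```python
def frame_duration_ms(frame_samples: int = 1152, sample_rate: int = 48_000) -> float:
    """Duration of one coded frame in milliseconds."""
    return 1000.0 * frame_samples / sample_rate


def snap_to_frame(sample_index: int, frame_samples: int = 1152) -> int:
    """Move an edit point back to the start of the frame that contains it."""
    return (sample_index // frame_samples) * frame_samples


cut = 48_000  # desired cut at exactly 1 second
snapped = snap_to_frame(cut)
print(f"frame duration: {frame_duration_ms():.0f} ms")
print(f"cut moved from sample {cut} to sample {snapped}")
```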

More Problems

- Costs for data reduction


- specialised hardware or software produces extra cost
- on the receiving end a special decoder is required

Conclusions

- Data reduction produces high quality audio but it has its limitations
- Data reduction can be used
- to store signals more economically
- to transmit signals more economically
- to employ new transmission channels (e.g. ISDN)
- in the broadcasting environment for simple radio productions
- Data reduction should not be used
- if the signal is intended for later sound processing
- during the production of music, drama
or any other complex audio production
- for archiving of important sound material
- if it gives no particular advantages
