DAS9T02 - Data Reduction

Audio Data Reduction
- Why we use it
- How it works
- The current standards
- The pros and cons
│ DW-AKADEMIE │ Page 1
>
Why we use it
Data rate of a stereo bit stream:

16bit x 48k samples/s x 2 channels = 1536 kb/s (linear PCM)
• Produces high costs
- storage
- transmission
• Too much for
- digital broadcasting
- larger computer networks
- the internet
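The data rate calculation above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name `pcm_bit_rate` is made up for illustration.

```python
def pcm_bit_rate(bits_per_sample: int, sample_rate_hz: int, channels: int) -> int:
    """Return the bit rate of linear PCM in bits per second."""
    return bits_per_sample * sample_rate_hz * channels


# 16 bit x 48 000 samples/s x 2 channels, as on the slide
rate_bps = pcm_bit_rate(16, 48_000, 2)
print(rate_bps / 1000, "kb/s")
```

The same function gives the rate for other formats, e.g. `pcm_bit_rate(24, 48_000, 2)` for 24 bit stereo.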
>
Comparison of data rates
storage time        Linear PCM    MPEG        MPEG        MPEG
(stereo)            1.5Mb/s       384kb/s     256kb/s     128kb/s
reduction factor    1             4           6           12
1 hour              0.7 GB        170 MB      120 MB      60 MB
1 day               16 GB         4 GB        2.8 GB      1.4 GB
1 week              120 GB        30 GB       20 GB       10 GB
1 month             500 GB        120 GB      80 GB       40 GB
1 year              6 TB          1.5 TB      1 TB        500 GB
10 years            60 TB         15 TB       10 TB       5 TB
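The storage figures in this comparison follow from bit rate times duration. A minimal sketch, assuming decimal units (1 MB = 1 000 000 bytes) and the hypothetical helper name `storage_bytes`; the slide values are rounded, so small differences are expected.

```python
def storage_bytes(bit_rate_bps: float, seconds: float) -> float:
    """Bytes needed to store a stream of the given bit rate and duration."""
    return bit_rate_bps * seconds / 8


HOUR = 3600
rates_kbps = {"linear PCM": 1536, "MPEG 384": 384, "MPEG 256": 256, "MPEG 128": 128}

for name, kbps in rates_kbps.items():
    megabytes = storage_bytes(kbps * 1000, HOUR) / 1e6
    print(f"{name}: {megabytes:.0f} MB per hour")
```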
Capacity requirements
Audio                         linear PCM        MPEG
                              (48kHz, 24bit)    (256kb/s)
• 1 hour stereo signal        1 GB              120 MB
• HD of 500 GB                500 h             4 000 h
• low cost server 2000 GB     2 000 h           16 000 h
• large server 5 TB           5 000 h           40 000 h
• very large server 10 TB     10 000 h          80 000 h
>
Capacity requirements
Video                         CCIR 601        MPEG 2        MPEG 2
                              SDI 270Mb/s     ML 24Mb/s     DVD 4Mb/s
• 1 hour                      120 GB          11 GB         2 GB
• HD of 500 GB                4 h             45 h          250 h
• low cost server 2000 GB     16 h            180 h         1 000 h
• large server 5 TB           40 h            450 h         2 500 h
• very large server 10 TB     80 h            900 h         5 000 h
• mass storage system 1 PB    8 000 h         80 000 h      500 000 h
>
The basic ideas of data reduction
- Digital audio contains REDUNDANCY
(data that carries no information)
- e.g. 00000000000000000000000
could be expressed as "23 x 0"
- 010111
- redundancy is removed through mathematical procedures
[Diagram: digital audio data (linear PCM) divided into information and redundancy; label: 20 - 80%]
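The "23 x 0" idea is run-length encoding. The following is a minimal sketch of it, not the method used by any particular audio codec; the function names are made up for illustration.

```python
from itertools import groupby


def rle_encode(bits: str) -> list[tuple[int, str]]:
    """Collapse each run of identical symbols into a (count, symbol) pair."""
    return [(len(list(group)), symbol) for symbol, group in groupby(bits)]


def rle_decode(runs: list[tuple[int, str]]) -> str:
    """Expand (count, symbol) pairs back into the original string."""
    return "".join(symbol * count for count, symbol in runs)


zeros = "0" * 23
print(rle_encode(zeros))  # one run of 23 zeros
```

Decoding restores the input exactly, which is what makes the removal of redundancy lossless.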
>
Compressing Audio Files
- Example of eliminating redundancy in an audio file
- The original linear PCM file (e.g. in .WAV format):
sound.wav (48kHz, 24bit), 100MB
⇓
- "Zipped" file using lossless data compression:
sound.zip, 70MB
⇓
- "Un-zipped" file, identical to the original:
sound.wav (48kHz, 24bit), 100MB
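The round trip above can be tried with Python's standard `zlib` module, which implements the same family of lossless compression used in zip files. This sketch uses a short synthetic 16 bit test tone instead of a real sound.wav, so the size ratio will not match the 100MB to 70MB figures on the slide.

```python
import math
import struct
import zlib

# Synthetic stand-in for sound.wav: 0.1 s of a 440 Hz tone, 16 bit, 48 kHz, mono
samples = [int(8000 * math.sin(2 * math.pi * 440 * n / 48_000)) for n in range(4800)]
pcm = struct.pack("<%dh" % len(samples), *samples)


def round_trip(data: bytes) -> tuple[int, bytes]:
    """Compress losslessly, then decompress; return packed size and restored data."""
    packed = zlib.compress(data, 9)
    return len(packed), zlib.decompress(packed)


packed_size, restored = round_trip(pcm)
print(len(pcm), "bytes ->", packed_size, "bytes")
print("identical to the original:", restored == pcm)
```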
>
The basic ideas of data reduction
- Digital audio contains IRRELEVANCY (inaudible information)

- Irrelevancy is subjective and strongly depends on the signal itself
[Diagram: digital audio data divided into relevant information (5 - 25%), irrelevant information and redundant data]
>
Irrelevancy in audio signals
- Any inaudible sound event is irrelevant

- signals below the quiescent threshold
[Figure: level over frequency f, showing PCM noise, an inaudible signal and an audible signal]
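To make the quiescent threshold concrete, the sketch below uses a widely quoted approximation of the threshold of hearing in quiet (commonly attributed to Terhardt). The formula is an assumption brought in for illustration and does not come from these slides; the helper `is_audible` is hypothetical and ignores masking.

```python
import math


def threshold_in_quiet_db(f_hz: float) -> float:
    """Approximate threshold of hearing in quiet, in dB SPL (assumed formula)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def is_audible(level_db_spl: float, f_hz: float) -> bool:
    """A tone is treated as audible if it lies above the threshold in quiet."""
    return level_db_spl > threshold_in_quiet_db(f_hz)


for f in (100, 1000, 3300, 10_000):
    print(f, "Hz:", round(threshold_in_quiet_db(f), 1), "dB SPL")
```

In this model a 10 dB SPL tone is audible at 1000 Hz but not at 100 Hz, so at 100 Hz it would count as irrelevant.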
>
Frequency masking
- Any inaudible sound event is irrelevant

- signals below the quiescent threshold
- signals masked by a neighbouring signal
[Figure: amplitude A over frequency f, showing the masking threshold]
>
Frequency masking
- listen to this example

[Figure: audio example over time t and frequency f: 750Hz, NBN 900Hz, NBN 750Hz]
>
Temporal masking
- low level signals after a loud sound event will not be audible
- post-masking
[Figure: level over time t, showing masker, masking threshold and masked signal; time scale 100ms]
>
Temporal masking
- Listen to this example
- Two short bursts of noise
- The first burst is masked by the preceding music
Temporal masking
- low level signals shortly before a loud sound event will remain inaudible
- pre-masking
[Figure: level over time t, showing masked signal, masking threshold and masker; time scale 20ms]
>
Temporal masking
- Listen to this example
- The second burst is pre-masked by the music that follows
>
Data reduction encoding
- Dividing the audio band into critical bands

- 32 to 576 sub-bands
- using filter banks and/or
modified discrete cosine transform (MDCT)
[Block diagram: linear PCM → filter bank or MDCT → sub-band signals]
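As a rough illustration of what the sub-band counts mean, the sketch below computes the width of one sub-band, assuming the audio band up to half the sample rate is split into equal-width bands. That uniform split is an assumption of this sketch; critical bands of the ear are not equally wide.

```python
def sub_band_width_hz(sample_rate_hz: float, n_sub_bands: int) -> float:
    """Width of one sub-band when the band 0..fs/2 is divided uniformly."""
    return (sample_rate_hz / 2) / n_sub_bands


for n in (32, 576):
    print(n, "sub-bands at 48 kHz:", round(sub_band_width_hz(48_000, n), 2), "Hz each")
```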
>
- Fourier analysis of a time interval of the signal
- 256 to 1024 point FFT
- causing a delay of 8 to 24ms
[Block diagram: linear PCM feeds the filter bank or MDCT; in parallel, a Fast Fourier Transform gives a description of a time block in the frequency domain]
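The sketch below shows what "description of a time block in the frequency domain" means for a single 256 sample block. It uses a plain discrete Fourier transform written out with the standard library so that it stays self-contained; a real encoder would use an FFT, which gives the same result faster. The 750 Hz test tone and the 48 kHz sample rate are assumptions chosen for illustration.

```python
import cmath
import math

FS = 48_000  # sample rate in Hz (assumed)
N = 256      # block length in samples


def dft_magnitudes(block: list[float]) -> list[float]:
    """Magnitude spectrum of one block, for bins 0 .. N/2 - 1."""
    n = len(block)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(block)))
        for k in range(n // 2)
    ]


block = [math.sin(2 * math.pi * 750 * i / FS) for i in range(N)]
mags = dft_magnitudes(block)
peak_bin = max(range(len(mags)), key=mags.__getitem__)

print("block duration:", round(1000 * N / FS, 2), "ms")
print("strongest component:", peak_bin * FS / N, "Hz")
```

The encoder has to collect a whole block before it can analyse it, which is one source of the coding delay.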
- Comparison of the signal with the psycho-acoustical model
- the psycho-acoustic model is not standardised;
it can be improved in future developments
[Block diagram: linear PCM feeds the filter bank or MDCT; in parallel, Fast Fourier Transform → psycho-acoustical model → information about the relevant critical bands and their coding requirements]
- Distribution of the bit budget to the sub-bands
- considers the bit rate available
- assigns to each sub-band an optimum number of bits
[Block diagram: linear PCM feeds the filter bank or MDCT; in parallel, Fast Fourier Transform → psycho-acoustical model → bit allocation, which uses information about the total bit rate and outputs the bit budget per sub-band]
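One common way to distribute a bit budget is a greedy loop: give the next bit to the sub-band whose quantisation noise is currently most audible. The sketch below assumes the psycho-acoustic model has already produced a signal-to-mask ratio (SMR, in dB) per sub-band and that every extra bit lowers the noise by about 6.02 dB. The function name, the example SMR values and the 15 bit cap are all assumptions for illustration, not the allocation procedure of any specific encoder.

```python
def allocate_bits(smr_db: list[float], bit_budget: int, max_bits: int = 15) -> list[int]:
    """Greedily assign bits to the sub-band with the highest noise-to-mask ratio."""
    bits = [0] * len(smr_db)
    for _ in range(bit_budget):
        # Noise-to-mask ratio after the bits assigned so far (~6.02 dB per bit)
        nmr = [
            smr - 6.02 * b if b < max_bits else float("-inf")
            for smr, b in zip(smr_db, bits)
        ]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] == float("-inf"):
            break  # every sub-band already has the maximum number of bits
        bits[worst] += 1
    return bits


# Hypothetical SMR values for four sub-bands; the third is fully masked
print(allocate_bits([20.0, 5.0, -3.0, 12.0], bit_budget=8))
```

In this example the masked sub-band receives no bits at all, which is where the data reduction comes from.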
- Quantisation of the sub-band samples
- every sub-band is quantised with the assigned bits
- a scale factor is extracted from the largest sample of each sub-band
[Block diagram: linear PCM → filter bank or MDCT → scale factor extraction → sub-band coding; in parallel, Fast Fourier Transform → psycho-acoustical model → bit allocation, which controls the sub-band coding]
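The following sketch shows the idea of scale factor extraction and quantisation for one block of sub-band samples. It is a simplification: the scale factor is taken directly as the largest absolute sample, whereas real encoders pick it from a fixed table. Function names and sample values are made up for illustration, and `n_bits` is assumed to be at least 2.

```python
def quantise_block(samples: list[float], n_bits: int) -> tuple[float, list[int]]:
    """Normalise a block by its largest sample and quantise to n_bits (n_bits >= 2)."""
    scale = max(abs(s) for s in samples) or 1.0  # avoid division by zero for silence
    levels = 2 ** (n_bits - 1) - 1               # symmetric range -levels .. +levels
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes


def dequantise_block(scale: float, codes: list[int], n_bits: int) -> list[float]:
    """Decoder side: rebuild approximate sample values from codes and scale factor."""
    levels = 2 ** (n_bits - 1) - 1
    return [c / levels * scale for c in codes]


scale, codes = quantise_block([0.02, -0.04, 0.10, -0.08], n_bits=4)
print("scale factor:", scale, "codes:", codes)
print("decoded:", [round(v, 3) for v in dequantise_block(scale, codes, 4)])
```

The decoded values differ slightly from the input. That difference is the quantisation noise the bit allocation tries to keep below the masking threshold.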
- Coding and bit packing
- serialisation of the sub-band information
- entropy coding
- inserting scale factor and side information
[Block diagram: linear PCM → filter bank or MDCT → scale factor extraction → sub-band coding → coding and bit packing; in parallel, Fast Fourier Transform → psycho-acoustical model → bit allocation, which controls the sub-band coding]
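Bit packing means writing values of different bit widths back to back into a byte stream. The sketch below shows only this serialisation step, under the assumption that all values have already been mapped to unsigned integers; it does not reproduce the MPEG frame format and does not include entropy coding. The function name is hypothetical.

```python
def pack_bits(fields: list[tuple[int, int]]) -> bytes:
    """Pack (value, width_in_bits) pairs MSB first, zero-padded to a whole byte."""
    acc = 0
    n_bits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit into {width} bits")
        acc = (acc << width) | value
        n_bits += width
    pad = (-n_bits) % 8  # zero bits needed to reach a byte boundary
    return (acc << pad).to_bytes((n_bits + pad) // 8, "big")


# A 3 bit value, a 1 bit flag and a 4 bit value fit into exactly one byte
print(pack_bits([(5, 3), (1, 1), (10, 4)]).hex())
```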
>
Data reduction standard
- MPEG: international standard by ISO

- AC2, AC3: by Dolby Laboratories
- ATRAC: Sony, used for Minidisc only
- Ogg Vorbis: open standard, patent free data reduction by Xiph.org
- WMA: Windows Media Audio, proprietary audio file format by Microsoft
- RealAudio: audio/video format especially for streaming
- G.7xx: ITU-T standards for low delay audio over ISDN lines (7kHz bandwidth)
>
The MPEG standards
- Moving Picture Experts Group

- set up by the ISO
- experts and interested companies
- MPEG 1, IS-11172, October 1992
- part 3: audio coding
- MPEG 2, IS-13818, November 1994
- part 3: low sample rate audio, multi-channel audio
- part 7: Advanced Audio Coding (AAC), non-backward compatible to MPEG 1
- the originally intended MPEG 3 was included in MPEG 2
- MPEG 4, IS-14496, July 1999 (Version 2)
- adaptive coding for very low bit rates and multimedia applications
- introduces technologies like CELP and HVXC
- deals with text to speech (TTS) applications
>
The MPEG standards
- MPEG 7, IS-15938, July 2001

- multimedia objects description
- MPEG 21, IS-18034, December 2001
- multimedia framework
- strategies for content retrieval and management
>
MPEG 1
- Coding of moving picture and associated audio for digital storage media
at up to about 1.5Mb/s
- Part 3 standardised the audio compression formats
- Three layers were standardised
- Layer 1
- Layer 2
- Layer 3
- The three layers are downward compatible with each other
>
MPEG 1
- Layer 1
- low complexity of encoder and decoder
- low compression rate (factor 4)
- relatively high bit rates (192kb/s/ch)
- developed for Philips DCC
- outdated today
>
MPEG 1
- Layer 2
- medium complexity of encoder and decoder
- medium compression rate (factor 6)
- moderate bit rates (128kb/s/ch)
- developed for DAB
- most commonly used in the studio environment
>
MPEG 1
- Layer 3
- high complexity of encoder and decoder
- high compression rate (factor 12)
- low bit rates (64kb/s/ch)
- designed for signal transmission (ISDN)
- all future MPEG standards are based on Layer 3
>
MPEG 1
Target bit rates of Layer 1, 2 and 3
[Chart: target bit-rate ranges of Layer 3, Layer 2 and Layer 1 plotted against the axes below]
bit rate (kb/s/ch)
32 64 96 128 160 192 224 256
24 12 8 6 5 4 3
data reduction factor (related to 16bit/48kHz)
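The data reduction factors on this axis follow from dividing the linear PCM rate per channel (16 bit x 48 kHz = 768 kb/s) by the coded bit rate per channel. A short sketch of that arithmetic, with a hypothetical function name:

```python
PCM_KBPS_PER_CHANNEL = 16 * 48  # 16 bit x 48 kHz = 768 kb/s per channel


def reduction_factor(bit_rate_kbps_per_channel: float) -> float:
    """Data reduction factor relative to 16bit/48kHz linear PCM."""
    return PCM_KBPS_PER_CHANNEL / bit_rate_kbps_per_channel


for rate in (32, 64, 96, 128, 160, 192, 224, 256):
    print(rate, "kb/s/ch -> factor", round(reduction_factor(rate), 1))
```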
>
MPEG Stereo Modes
- Mono
- Only one channel is recorded and transmitted
- If the input signal is stereo, the encoder will form the mono sum
- Stereo (dual mono)
- This is the true stereo mode
- Two fully independent audio channels (left and right)
will be encoded and transmitted
- Joint Stereo (intensity stereo, mid-side stereo)
- The encoder will eliminate additional redundancy of stereo signals
by coding similar signals in the left and right channel only once.
- Joint stereo provides more effective use of the bit budget
and will therefore reduce artifacts in the signal
- Joint stereo produces a less clear stereo image
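Mid-side stereo, one of the joint stereo techniques named above, can be sketched as follows. The sketch only shows the channel transform, not the coding of the resulting signals; function names and sample values are made up for illustration.

```python
def ms_encode(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """Convert left/right into mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid: list[float], side: list[float]) -> tuple[list[float], list[float]]:
    """Rebuild left/right from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -0.75]
right = [0.5, 0.25, -0.5]
mid, side = ms_encode(left, right)
print("mid:", mid)
print("side:", side)
```

When both channels carry similar signals, the side signal stays close to zero and needs very few bits, which frees bit budget for the mid signal.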
Data Reduction Sound Demonstration
- MPEG 1 Layer 2 encoding with different bit rates
- 384 kb/s dual mono, compression rate 1:4
- 128 kb/s joint stereo, compression rate 1:12
- 64 kb/s mono, compression rate 1:12
MPEG 1
- Comparison of Layer 2 and Layer 3 features
                            Layer 2    Layer 3
sub-bands                   32         576
entropy coding              no         yes
bit reservoir technology    no         yes
time delay                  24ms       approx. 100ms
>
MPEG 2
- Low sample rate audio
- reduced sample rates, reduced audio bandwidth
- reduction of audio bandwidth is less annoying than encoding artefacts
- the compression format for Worldspace satellite radio
- multi-channel applications
- 5+1 audio channels
- used for film, video and DVD application (Europe)
- Advanced Audio Coding (AAC)
- non-backward compatible to MPEG 1
- allows very low bit rates at improved quality
- is widely used for audio files on the internet (MP3 itself is MPEG 1 Layer 3, not AAC)
- the compression format for DRM (Digital Radio Mondiale)
>
Problems of data reduction
- data reduced audio is not identical to the original
(it only sounds like the original)
- (inaudible) loss in sound quality
- decay of quality with "generations" (repeated encoding and decoding)
- the quality decay is not transparent
- block structure of the MPEG data
- 24ms to 100ms
- editing is not possible within the block
- delay of signal
- encoding/decoding requires time 100ms to 200ms
- problems in real time applications
>
More Problems
- Costs for data reduction

- specialised hardware or software produces extra cost
- on the receiving end a special decoder is required
>
Conclusions
- Data reduction produces high quality audio but it has its limitations
- Data reduction can be used
- to store signals more economically
- to transmit signals more economically
- to employ new transmission channels (e.g. ISDN)
- in the broadcasting environment for simple radio productions
- Data reduction should not be used
- if the signal is intended for later sound processing
- during the production of music, drama
or any other complex audio production
- for archiving of important sound material
- if it gives no particular advantages