
Audio Compression

Techniques
Lecture 8

Prepared by
Razia Nisar Noorani

1
Introduction
 Digital Audio Compression
 Removal of redundant or otherwise irrelevant
information from audio signal
 Audio compression algorithms are often referred to as
“audio encoders”
 Applications
 Reduces required storage space
 Reduces required transmission bandwidth

2
Audio Compression
 Audio signal – overview
 Sampling rate (# of samples per second)
 Bit rate (# of bits per second). Typically, an
uncompressed stereo 16-bit 44.1 kHz signal has a
bit rate of about 1.4 Mbps (44,100 × 16 × 2 = 1,411,200 bits per second)
 Number of channels (mono / stereo / multichannel)
 Reduction by lowering those values or by data
compression / encoding
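
The bit rate figure above follows directly from the three parameters on this slide. A minimal sketch of the arithmetic (the function name `pcm_bit_rate` is only illustrative):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Return the uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# CD-quality audio: 44.1 kHz, 16 bits per sample, stereo
cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)              # bits per second
print(cd_rate / 1_000_000)  # megabits per second
```

Halving any one of the three values halves the bit rate, which is the "reduction by lowering those values" option; data compression aims to reduce the rate without doing so.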

3
Audio Data Compression
 Redundant information
 Implicit
in the remaining information
 Ex. oversampled audio signal
 oversampling is the process of sampling a signal with a
sampling frequency significantly higher than twice the
bandwidth or highest frequency of the signal being sampled
 Irrelevant information
 Perceptually insignificant
 Cannot be recovered from remaining information

4
Audio Data Compression
 Lossless Audio Compression
 Removes redundant data
 Resulting signal is same as original – perfect
reconstruction
 Lossy Audio Encoding
 Removes irrelevant data
 Resulting signal is similar to original

5
Audio Data Compression
 Audio vs. Speech Compression
Techniques
 Speech Compression uses a human vocal
tract model to compress signals
 Audio Compression does not use this
technique due to larger variety of possible
signal variations

6
Generic Audio Encoder
 Psychoacoustic Model
 Psychoacoustics – study of how sounds are
perceived by humans
 Uses perceptual coding
 eliminate information from audio signal that is
inaudible to the ear
 Detects conditions under which different audio
signal components mask each other

7
Psychoacoustic Model
 Signal Masking
 Threshold cut-off
 Spectral (Frequency / Simultaneous) Masking
 Temporal Masking
 Threshold cut-off and spectral masking
occur in frequency domain, temporal
masking occurs in time domain

8
Signal Masking
 Threshold cut-off
 Hearing threshold
level – a function of
frequency
 Any frequency
components below the
threshold will not be
perceived by human
ear
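
The slide does not give a formula for the hearing threshold. As an illustration only, the sketch below uses a commonly cited approximation of the threshold in quiet (attributed to Terhardt), which is an assumption here and not necessarily the curve used in the lecture. The function names are hypothetical.

```python
import math


def threshold_in_quiet_db(frequency_hz):
    """Approximate hearing threshold in dB SPL (Terhardt-style formula, assumed)."""
    khz = frequency_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def is_audible(frequency_hz, level_db_spl):
    """A component is treated as audible only if it lies above the threshold."""
    return level_db_spl > threshold_in_quiet_db(frequency_hz)


for f in (100, 1000, 3300, 15000):
    print(f, round(threshold_in_quiet_db(f), 1))
```

Under this approximation the ear is most sensitive around 3 to 4 kHz, so a quiet 10 dB SPL tone may be audible near 3.3 kHz but fall below the threshold at 100 Hz.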

9
Signal Masking
 Spectral Masking
 A frequency
component can be
partly or fully masked
by another component
that is close to it in
frequency
 This shifts the hearing
threshold

10
Signal Masking
 Temporal Masking
 A quieter sound can
be masked by a louder
sound if they are
temporally close
 Sounds that occur
shortly before (pre-masking)
or shortly after (post-masking)
the louder sound can be
masked

11
Spectral Analysis
 Derives a frequency-domain representation of a
time-domain signal, using a device or an
algorithm
 Tasks of Spectral Analysis
 To derive masking thresholds to determine which
signal components can be eliminated
 To generate a representation of the signal to which
masking thresholds can be applied
 Spectral Analysis is done through transforms or
filter banks
12
Spectral Analysis
 Transforms
 Fast Fourier Transform (FFT)
 Discrete Cosine Transform (DCT) - similar to the
Fourier transform, but uses only cosine terms and yields real-valued coefficients
 Modified Discrete Cosine Transform (MDCT)
[used by MPEG-1 Layer-III, MPEG-2 AAC,
Dolby AC-3] – overlapped and windowed
version of DCT
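
To make the transform step concrete, here is a minimal sketch of an orthonormal DCT-II and its inverse in plain Python. It is a direct O(N²) evaluation for illustration, not the fast algorithm or the MDCT that real encoders use, and the function names are hypothetical.

```python
import math


def _scale(k, n):
    # Orthonormal scaling so that the inverse uses the same factors.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ii(block):
    """Transform a block of time samples into DCT-II coefficients."""
    n = len(block)
    return [
        _scale(k, n) * sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                           for i, x in enumerate(block))
        for k in range(n)
    ]


def idct_ii(coeffs):
    """Reconstruct the time samples from DCT-II coefficients."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi / n * (i + 0.5) * k)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]


# A block that matches the k = 2 cosine exactly ends up in a single coefficient.
block = [math.cos(math.pi / 8 * (i + 0.5) * 2) for i in range(8)]
print([round(c, 6) for c in dct_ii(block)])
```

The point of the example is energy compaction: a signal that resembles one of the cosine terms is described by very few non-zero coefficients, which is what makes the later quantization step effective.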

13
Spectral Analysis
 Filter Banks
 a filter bank is an array of band-pass filters that
separates the input signal into multiple
components, each one carrying a single
frequency subband of the original signal
 Time sample blocks are passed through a set of
bandpass filters
 Masking thresholds are applied to resulting frequency
subband signals
 Poly-phase and wavelet banks are most popular filter
structures

14
Filter Bank Structures
 Polyphase Filter Bank
[used in all of the MPEG-1 encoders]
 Signal is separated into subbands, the widths
of which are equal over the entire frequency
range
 The resulting subband signals are
downsampled to create shorter signals (which
are later reconstructed during decoding
process)
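
The idea of splitting into equal-width subbands, downsampling, and reconstructing at the decoder can be sketched with the simplest possible case: a two-band Haar filter pair. This is an assumption chosen for brevity, not the 32-band polyphase filter bank of MPEG-1, and the function names are hypothetical.

```python
import math

_SQRT2 = math.sqrt(2.0)


def analyze_two_band(samples):
    """Split an even-length signal into low and high subbands, each half as long."""
    low = [(samples[i] + samples[i + 1]) / _SQRT2
           for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) / _SQRT2
            for i in range(0, len(samples), 2)]
    return low, high


def synthesize_two_band(low, high):
    """Rebuild the full-rate signal from the two downsampled subbands."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / _SQRT2)
        out.append((l - h) / _SQRT2)
    return out


signal = [0.0, 0.2, 0.4, 0.6, 0.5, 0.3, 0.1, -0.1]
low, high = analyze_two_band(signal)
print(len(signal), len(low), len(high))
print(synthesize_two_band(low, high))
```

Each subband holds half as many samples as the input, so the total amount of data is unchanged by the split; the saving comes later, when each subband is quantized according to its masking threshold.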

15
Filter Bank Structures
 Wavelet Filter Bank
[used by Enhanced Perceptual Audio
Coder (EPAC) by Lucent]
 Unlike the polyphase filter bank, the subbands are
not of equal width (narrower for lower frequencies,
wider for higher frequencies)
 This allows for better time resolution at high
frequencies (ex. short attacks), but at the expense
of frequency resolution there

16
Noise Allocation
 System Task: derive and apply shifted hearing
threshold to the input signal
 Anything below the threshold doesn’t need to be
transmitted
 Any noise below the threshold is irrelevant
 Frequency component quantization
 Tradeoff between space and noise
 Encoder saves on space by using just enough bits for
each frequency component to keep noise under the
threshold - this is known as noise allocation
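
A minimal sketch of the "just enough bits" idea, under simplifying assumptions that are not from the slides: a uniform quantizer over the range [-1, 1], quantization noise power approximated as step²/12, and levels expressed in dB relative to a power of 1. The function names are hypothetical.

```python
import math


def quantization_noise_db(bits, full_scale=1.0):
    """Approximate noise level of a uniform quantizer with the given word length."""
    step = 2.0 * full_scale / (2 ** bits)
    return 10.0 * math.log10(step ** 2 / 12.0)


def bits_needed(threshold_db, max_bits=16):
    """Smallest word length whose quantization noise stays under the threshold."""
    for bits in range(1, max_bits + 1):
        if quantization_noise_db(bits) <= threshold_db:
            return bits
    return max_bits  # threshold too strict for the allowed word length


# A strongly masked component tolerates more noise and therefore gets fewer bits.
print(bits_needed(-20.0))
print(bits_needed(-40.0))
```

Each extra bit lowers the noise by about 6 dB in this model, so raising the masking threshold of a component by 6 dB saves roughly one bit per sample for that component.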

17
Noise Allocation
 Pre-echo
 In case a single audio block contains silence followed
by a loud attack, pre-echo error occurs - there will be
audible noise in the silent part of the block after
decoding
 This is mitigated by monitoring the audio data at the
encoding stage and splitting the audio into shorter
blocks where pre-echo is likely
 This does not completely eliminate pre-echo, but can
make it short enough to be masked by the attack
(temporal masking)
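
One simple way to picture the monitoring step is an energy comparison between consecutive segments of a block. The sketch below is a hypothetical detector, not the method of any particular encoder; the segment count, the attack ratio of 10, and the function names are all assumptions.

```python
def energy(samples):
    """Sum of squared sample values."""
    return sum(x * x for x in samples)


def choose_block_type(block, segments=4, attack_ratio=10.0, floor=1e-9):
    """Return "short" if a sudden energy jump suggests a pre-echo risk."""
    size = len(block) // segments
    energies = [energy(block[i * size:(i + 1) * size]) for i in range(segments)]
    for previous, current in zip(energies, energies[1:]):
        # The floor keeps the comparison meaningful when the previous segment is silent.
        if current > attack_ratio * max(previous, floor):
            return "short"
    return "long"


silence_then_attack = [0.0] * 96 + [0.9] * 32
steady_tone = [0.5] * 128
print(choose_block_type(silence_then_attack))
print(choose_block_type(steady_tone))
```

With short blocks, the quantization noise caused by the attack is confined to a much shorter time span before the attack, which is what allows temporal masking to hide it.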

18
Additional Encoding Techniques
 Other encoding techniques are
available (as alternatives or in combination)
 Predictive Coding
 Coupling / Delta Encoding
 Huffman Encoding

19
Additional Encoding Techniques
 Predictive Coding
 Often used in speech and image compression
 Estimates the expected value for each sample based
on previous sample values
 Transmits/stores the difference between the predicted
and actual value
 The decoder generates the same estimate for each
sample and then adds the stored difference to recover
the sample
 Used for additional compression in MPEG-2 AAC
(Advanced Audio Coding)
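
The predict, store the difference, and reconstruct cycle can be sketched with the simplest predictor, which assumes each sample equals the previous one. Real codecs use more elaborate predictors; this choice and the function names are assumptions for illustration.

```python
def encode_residuals(samples):
    """Store the difference between each sample and its prediction."""
    residuals = []
    prediction = 0  # nothing is known before the first sample
    for sample in samples:
        residuals.append(sample - prediction)
        prediction = sample  # predict that the next sample repeats this one
    return residuals


def decode_residuals(residuals):
    """Rebuild the samples by adding each stored difference to the prediction."""
    samples = []
    prediction = 0
    for residual in residuals:
        sample = prediction + residual
        samples.append(sample)
        prediction = sample
    return samples


samples = [100, 102, 105, 107, 106]
residuals = encode_residuals(samples)
print(residuals)                    # small values after the first sample
print(decode_residuals(residuals))  # identical to the input
```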
20
Additional Encoding Techniques
 Coupling / Delta encoding
 Used in cases where audio signal consists of two or
more channels (stereo or surround sound)
 Similarities between channels are used for
compression
 The sum and the difference of the two channels are
derived; the difference is usually close to zero
and therefore requires less space to encode
 The sum/difference transform itself is lossless
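
A minimal sketch of the sum/difference step for integer stereo samples, showing that the original channels can be recovered exactly. The function names are hypothetical.

```python
def to_sum_difference(left, right):
    """Replace left/right with their per-sample sum and difference."""
    sums = [l + r for l, r in zip(left, right)]
    diffs = [l - r for l, r in zip(left, right)]
    return sums, diffs


def from_sum_difference(sums, diffs):
    """Recover left/right; sum and difference always share parity, so // is exact."""
    left = [(s + d) // 2 for s, d in zip(sums, diffs)]
    right = [(s - d) // 2 for s, d in zip(sums, diffs)]
    return left, right


left = [1000, 1010, -500]
right = [998, 1011, -502]
sums, diffs = to_sum_difference(left, right)
print(diffs)  # small values when the channels are similar
print(from_sum_difference(sums, diffs) == (left, right))
```

The transform alone does not shrink the data; the gain appears when the near-zero difference channel is passed to a later stage such as entropy coding.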

21
Additional Encoding Techniques
 Huffman Coding
 Information-theory-based technique
 An element that occurs often in the
signal is represented by a shorter code, and the
mapping between codes and values is stored in a look-up table
 Implemented using look-up tables in the encoder and in the
decoder
 Provides substantial lossless compression, at the cost of
additional computation in the encoder and the
decoder
 Used by MPEG-1 Layer III and MPEG-2 AAC
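
The look-up table idea can be sketched by building a Huffman code from symbol counts. This version builds the table from the data itself, which is an assumption made for illustration; audio codecs typically rely on predefined tables. The function names are hypothetical.

```python
import heapq
from collections import Counter


def build_huffman_codes(symbols):
    """Return a dict mapping each symbol to a bit string; frequent symbols get short codes."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial code table for that subtree).
    heap = [(count, index, {symbol: ""})
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_index = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)
        count_b, _, codes_b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes_a.items()}
        merged.update({s: "1" + c for s, c in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, next_index, merged))
        next_index += 1
    return heap[0][2]


def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    lookup = {code: symbol for symbol, code in codes.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # codes are prefix-free, so the first match is correct
            symbols.append(lookup[current])
            current = ""
    return symbols


values = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]
codes = build_huffman_codes(values)
bits = encode(values, codes)
print(codes)
print(len(bits), "bits instead of", 2 * len(values), "with a fixed 2-bit code")
```

The example input is dominated by zeros, which is the typical situation after quantization or difference coding, and is why the technique combines well with the earlier stages.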

22
Encoding - Final Stages
 Audio data packed into frames
 Frames stored or transmitted

23
Questions

24
