Audio Compressrion1 PDF

Audio Compression
Techniques
Lecture 8
Prepared by
Razia Nisar Noorani
1
Introduction
 Digital Audio Compression
 Removal of redundant or otherwise irrelevant
information from audio signal
 Audio compression algorithms are often referred to as
“audio encoders”
 Applications
 Reduces required storage space
 Reduces required transmission bandwidth
2
Audio Compression
 Audio signal – overview
 Sampling rate (# of samples per second)
 Bit rate (# of bits per second). Typically, an
uncompressed stereo 16-bit 44.1 kHz signal has a
bit rate of about 1.4 Mbit/s (44,100 × 16 × 2 = 1,411,200 bits per second)
 Number of channels (mono / stereo / multichannel)
 Reduction by lowering those values or by data
compression / encoding
3
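
The bit rate on this slide is the product of the three quantities listed. A minimal sketch of that arithmetic (the function name `pcm_bit_rate` is made up for illustration):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd_rate = pcm_bit_rate(44_100, 16, 2)
print(cd_rate)      # 1411200 bits per second, about 1.4 Mbit/s
print(cd_rate / 8)  # 176400.0 bytes per second
```

Halving any one factor (for example, going from stereo to mono) halves the bit rate, which is the "lowering those values" option from the slide.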
Audio Data Compression
 Redundant information
 Implicit in the remaining information
 Example: an oversampled audio signal
 Oversampling is the process of sampling a signal at a
rate significantly higher than twice the
bandwidth or highest frequency of the signal being sampled
 Irrelevant information
 Perceptually insignificant
 Cannot be recovered from remaining information
4
 Lossless Audio Compression
 Removes redundant data
 Resulting signal is identical to the original (perfect
reconstruction)
 Lossy Audio Encoding
 Removes irrelevant data
 Resulting signal is similar to original
5
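
The difference between the two approaches can be shown on a toy signal. This is only a sketch: general-purpose `zlib` stands in for a lossless audio coder, and dropping low-order bits stands in for a lossy one. Neither is what a real audio codec does, and the test tone is hypothetical.

```python
import math
import struct
import zlib

# Hypothetical test signal: 1000 samples of a 440 Hz tone at 8 kHz, 16-bit integers.
samples = [round(10000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(1000)]
raw = struct.pack("<1000h", *samples)

# Lossless: compress, decompress, and get exactly the same samples back.
packed = zlib.compress(raw)
restored = list(struct.unpack("<1000h", zlib.decompress(packed)))

# Lossy: discard the 8 least significant bits of every sample.
lossy = [(s >> 8) << 8 for s in samples]
max_error = max(abs(a - b) for a, b in zip(samples, lossy))

print(restored == samples)  # True: perfect reconstruction
print(lossy == samples)     # False: similar to the original, not identical
```

The lossy version here discards bits blindly. A perceptual coder instead chooses what to discard based on what the ear cannot hear.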
 Audio vs. Speech Compression
Techniques
 Speech Compression uses a human vocal
tract model to compress signals
 Audio Compression does not use this
technique due to larger variety of possible
signal variations
6
Generic Audio Encoder
 Psychoacoustic Model
 Psychoacoustics – study of how sounds are
perceived by humans
 Uses perceptual coding
 Eliminates information from the audio signal that is
inaudible to the ear
 Detects conditions under which different audio
signal components mask each other
7
Psychoacoustic Model
 Signal Masking
 Threshold cut-off
 Spectral (Frequency / Simultaneous) Masking
 Temporal Masking
 Threshold cut-off and spectral masking
occur in frequency domain, temporal
masking occurs in time domain
8
Signal Masking
 Threshold cut-off
 Hearing threshold
level – a function of
frequency
 Any frequency
components below the
threshold will not be
perceived by human
ear
9
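
The slide does not give a formula for the hearing threshold. One widely quoted approximation of the threshold in quiet is Terhardt's formula, used here as an assumption to show what "a function of frequency" looks like in practice:

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in dB SPL (Terhardt's approximation)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible(freq_hz, level_db):
    """True if a component at this level lies above the threshold in quiet."""
    return level_db > threshold_in_quiet_db(freq_hz)

for f in (100, 1000, 3300, 16000):
    print(f, round(threshold_in_quiet_db(f), 1))
```

The curve is lowest around 3 to 4 kHz, where hearing is most sensitive, and rises steeply at both ends of the spectrum. A 10 dB component is inaudible at 100 Hz but audible at 3.3 kHz under this approximation.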
Signal Masking
 Spectral Masking
 A frequency
component can be
partly or fully masked
by another component
that is close to it in
frequency
 This shifts the hearing
threshold
10
Signal Masking
 Temporal Masking
 A quieter sound can
be masked by a louder
sound if they are
temporally close
 Quieter sounds that occur
shortly before or shortly
after the louder sound
can be masked
11
Spectral Analysis
 A spectral analyzer is a device or algorithm that
produces a frequency domain representation of a
time domain signal.
 Tasks of Spectral Analysis
 To derive masking thresholds to determine which
signal components can be eliminated
 To generate a representation of the signal to which
masking thresholds can be applied
 Spectral Analysis is done through transforms or
filter banks
12
Spectral Analysis
 Transforms
 Fast Fourier Transform (FFT)
 Discrete Cosine Transform (DCT) - similar to the
Fourier transform but uses only cosine basis functions
 Modified Discrete Cosine Transform (MDCT)
[used by MPEG-1 Layer-III, MPEG-2 AAC,
Dolby AC-3] – overlapped and windowed
version of DCT
13
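
To make "overlapped and windowed" concrete, here is a minimal pure-Python MDCT sketch. It is a direct O(N²) evaluation of the textbook definition with a sine window, chosen for readability. It is not the fast algorithm or the window used by any particular codec, and all function names are illustrative.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block):
    """2N time samples -> N coefficients."""
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """N coefficients -> 2N time samples (still containing aliasing)."""
    N = len(coeffs)
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def mdct_round_trip(signal, N):
    """Analyse and resynthesise a signal whose length is a multiple of N."""
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - N, N):      # blocks overlap by 50%
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        rebuilt = imdct(mdct(block))
        for n in range(2 * N):
            out[start + n] += rebuilt[n] * w[n]     # window again, overlap-add
    return out[N:-N]

signal = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
rebuilt = mdct_round_trip(signal, 4)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # tiny rounding error only
```

Each block of 2N samples yields only N coefficients, so despite the 50% overlap the transform produces one coefficient per input sample. The aliasing in each inverse-transformed block cancels when neighbouring blocks are added together.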
Spectral Analysis
 Filter Banks
 A filter bank is an array of band-pass filters that
separates the input signal into multiple
components, each one carrying a single
frequency subband of the original signal
 Time sample blocks are passed through a set of
bandpass filters
 Masking thresholds are applied to resulting frequency
subband signals
 Polyphase and wavelet banks are the most popular filter
structures
14
Filter Bank Structures
 Polyphase Filter Bank
[used in all of the MPEG-1 encoders]
 Signal is separated into subbands, the widths
of which are equal over the entire frequency
range
 The resulting subband signals are
downsampled to create shorter signals (which
are later reconstructed during the decoding
process)
15
Filter Bank Structures
 Wavelet Filter Bank
[used by Enhanced Perceptual Audio
Coder (EPAC) by Lucent]
 Unlike the polyphase filter bank, the subbands are
not of equal width (narrower at lower frequencies,
wider at higher frequencies)
 This allows for better time resolution at high
frequencies (e.g. short attacks), but at the expense
of frequency resolution
16
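
The unequal subband widths can be seen in the simplest possible wavelet filter bank, built from Haar filters. This is an illustrative sketch only. It is not the filter set used by EPAC, and the function names are made up.

```python
import math

def haar_split(x):
    """One two-band split: returns (low band, high band), each half as long."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low, high):
    """Inverse of haar_split."""
    s = math.sqrt(2.0)
    x = []
    for l, h in zip(low, high):
        x += [(l + h) / s, (l - h) / s]
    return x

def wavelet_analysis(x, levels):
    """Repeatedly split the low band; returns [high_1, high_2, ..., low_final]."""
    bands = []
    for _ in range(levels):
        x, high = haar_split(x)
        bands.append(high)
    bands.append(x)
    return bands

def wavelet_synthesis(bands):
    x = bands[-1]
    for high in reversed(bands[:-1]):
        x = haar_merge(x, high)
    return x

signal = [float(n % 5) for n in range(16)]
bands = wavelet_analysis(signal, 3)
print([len(b) for b in bands])  # [8, 4, 2, 2]
```

The highest band keeps 8 of the 16 samples: it covers the widest frequency range and has the finest time resolution. Each lower band is half as wide and half as densely sampled.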
Noise Allocation
 System Task: derive and apply shifted hearing
threshold to the input signal
 Anything below the threshold doesn’t need to be
transmitted
 Any noise below the threshold is irrelevant
 Frequency component quantization
 Tradeoff between space and noise
 Encoder saves on space by using just enough bits for
each frequency component to keep noise under the
threshold - this is known as noise allocation
17
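
A rough sketch of the "just enough bits" idea. It relies on the common rule of thumb that each extra quantizer bit lowers the noise by about 6 dB. That figure and the per-band levels below are assumptions for illustration, not values from the lecture.

```python
import math

DB_PER_BIT = 6.02  # assumed rule of thumb: ~6 dB less quantization noise per bit

def bits_needed(signal_db, mask_db):
    """Fewest bits that keep quantization noise under the masking threshold."""
    signal_to_mask = signal_db - mask_db
    if signal_to_mask <= 0:
        return 0  # component is below the threshold: nothing to transmit
    return math.ceil(signal_to_mask / DB_PER_BIT)

# Hypothetical (signal level, masking threshold) pairs in dB for three subbands.
bands = [(60, 70), (60, 30), (90, 30)]
allocation = [bits_needed(s, m) for s, m in bands]
print(allocation)  # [0, 5, 10]
```

The first band is fully masked and costs nothing. The other two receive only as many bits as their distance above the threshold requires.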
Noise Allocation
 Pre-echo
 In case a single audio block contains silence followed
by a loud attack, pre-echo error occurs - there will be
audible noise in the silent part of the block after
decoding
 This is avoided by monitoring the audio data at the
encoding stage and splitting the audio into shorter
blocks where pre-echo is likely
 This does not completely eliminate pre-echo, but can
make it short enough to be masked by the attack
(temporal masking)
18
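
One way to picture the monitoring step is to compare the energy of each part of a block with the energy that came before it. The detector below is a hypothetical sketch: the number of parts, the ratio, and the function names are made up, and real encoders use more elaborate criteria.

```python
def energies(block, n_parts):
    """Energy of each of n_parts equal sub-blocks (length must divide evenly)."""
    size = len(block) // n_parts
    return [sum(s * s for s in block[i * size:(i + 1) * size])
            for i in range(n_parts)]

def has_attack(block, n_parts=8, ratio=10.0, floor=1e-9):
    """True if some sub-block is much louder than the average of those before it."""
    e = energies(block, n_parts)
    for i in range(1, n_parts):
        before = sum(e[:i]) / i
        if e[i] > ratio * (before + floor):
            return True
    return False

def split_block(block, n_parts=8):
    """Keep a steady block whole; cut a block containing an attack into short ones."""
    if not has_attack(block, n_parts):
        return [block]
    size = len(block) // n_parts
    return [block[i * size:(i + 1) * size] for i in range(n_parts)]

steady = [0.5] * 64
attack = [0.0] * 48 + [1.0] * 16
print(len(split_block(steady)), len(split_block(attack)))  # 1 8
```

With short blocks, quantization noise from the attack can spread back by at most one short block, which is the sense in which pre-echo is shortened rather than removed.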
Additional Encoding Techniques
 Other encoding techniques are
available (alternative or in combination)
 Predictive Coding
 Coupling / Delta Encoding
 Huffman Encoding
19
 Predictive Coding
 Often used in speech and image compression
 Estimates the expected value for each sample based
on previous sample values
 Transmits/stores the difference between the predicted
and actual value
 The decoder generates the same estimate for each
sample and then corrects it by the difference stored
for that sample
 Used for additional compression in MPEG-2 AAC
(Advanced Audio Coding)
20
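
A minimal sketch of the encode/decode loop described on this slide. The predictor here is the simplest one possible ("the next sample equals the previous one"). That is an assumption for illustration and not the predictor used in AAC.

```python
def predict(history):
    """Hypothetical first-order predictor: expect the previous sample again."""
    return history[-1] if history else 0

def encode(samples):
    """Store only the difference between each sample and its prediction."""
    residuals, history = [], []
    for s in samples:
        residuals.append(s - predict(history))
        history.append(s)
    return residuals

def decode(residuals):
    """Rebuild each sample from the same prediction plus the stored difference."""
    samples = []
    for r in residuals:
        samples.append(predict(samples) + r)
    return samples

samples = [100, 102, 105, 107, 106, 104]
residuals = encode(samples)
print(residuals)                     # [100, 2, 3, 2, -1, -2]
print(decode(residuals) == samples)  # True
```

After the first value the residuals are much smaller than the samples themselves, so they can be stored in fewer bits.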
 Coupling / Delta encoding
 Used in cases where audio signal consists of two or
more channels (stereo or surround sound)
 Similarities between channels are used for
compression
 A sum and difference between two channels are
derived; difference is usually some value close to zero
and therefore requires less space to encode
 This is a lossless encoding process
21
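
The sum and difference step can be sketched for integer samples as follows (function names and sample values are illustrative):

```python
def couple(left, right):
    """Replace left/right with their sum and difference."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff

def decouple(total, diff):
    """Recover left/right exactly; total + diff = 2 * left is always even."""
    left = [(t + d) // 2 for t, d in zip(total, diff)]
    right = [(t - d) // 2 for t, d in zip(total, diff)]
    return left, right

left = [1000, 1010, 990, -500]
right = [998, 1011, 990, -503]
total, diff = couple(left, right)
print(diff)                                    # [2, -1, 0, 3]
print(decouple(total, diff) == (left, right))  # True
```

When the two channels are similar, the difference values stay close to zero and need few bits, while the sum still carries the full signal.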
 Huffman Coding
 Information-theory-based technique
 A value that occurs often in the signal is
represented by a shorter code, and the mapping
between codes and values is stored in a look-up table
 Implemented using look-up tables in the encoder and
in the decoder
 Provides additional lossless compression, at the
cost of extra computation in the encoder and
decoder
 Used by MPEG-1 Layer III and MPEG-2 AAC
22
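
A compact sketch of how such a look-up table can be built and used. It constructs the table from the data itself. Audio codecs typically ship predefined tables instead, so treat this as an illustration of the principle only.

```python
import heapq
from collections import Counter

def huffman_table(symbols):
    """Build a {symbol: bit string} table; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # two least frequent groups
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, table):
    return "".join(table[s] for s in symbols)

def decode(bits, table):
    inverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:            # codes are prefix-free
            out.append(inverse[current])
            current = ""
    return out

data = [0, 0, 0, 0, 1, 1, 2, -1]          # e.g. quantized values, mostly zero
table = huffman_table(data)
bits = encode(data, table)
print(len(bits))                          # 14 bits, versus 16 with fixed 2-bit codes
print(decode(bits, table) == data)        # True
```

The most frequent value gets a 1-bit code and the two rarest get 3-bit codes, so the gain grows as the value distribution becomes more skewed.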
Encoding - Final Stages
 Audio data packed into frames
 Frames stored or transmitted
23
Questions
24
