
MPEG-4 AUDIO STANDARD

1. ng B Vit
2. Ng t
3. Bi nh Phc

11DT3
11DT3
11DT2

BROAD OUTLINE

INTRODUCTION
FUNDAMENTALS
MPEG-4 GENERAL AUDIO CODING
PERFORMANCE MEASURES
EVALUATION
CONCLUSION

INTRODUCTION

A new kind of audio coding standard
Its coding tools can be grouped into two basic kinds
General audio (GA) coding
Synthetic audio coding
We focus on general audio coding
GA coding: taking PCM audio streams and efficiently encoding
them for transmission and storage

FUNDAMENTALS

Psychoacoustics

Frequency masking
Threshold of hearing
Critical bands
Bark units
Temporal masking

MPEG Audio

MPEG Layers
MPEG Audio strategy
MPEG Audio compression algorithm
Bit-allocation

PSYCHOACOUSTICS

Range of human hearing: 20 Hz to 20 kHz

Voice: 500 Hz to 4 kHz

Range of hearing depends upon frequency


Fletcher-Munson equal-loudness curves
Describe relationship between perceived loudness for a given
stimulus sound volume as a function of frequency

FLETCHER-MUNSON EQUAL-LOUDNESS
CURVES

FREQUENCY MASKING

A lower tone can effectively mask a higher tone
A higher tone does not mask a lower tone well: tones can mask
lower-frequency sounds, but not as effectively as they mask
higher-frequency ones
The greater the power in the masking tone, the broader the
range of frequencies it can mask
If two tones are widely separated in frequency, little masking
occurs

THRESHOLD OF HEARING


Threshold(f) = 3.64 (f/1000)^-0.8 - 6.5 e^(-0.6 ((f/1000) - 3.3)^2) + 10^-3 (f/1000)^4  [dB]
The origin is at the frequency of 2 kHz, since Threshold(f) = 0 at f = 2 kHz
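As a quick sanity check, the quiet-threshold curve (Terhardt's approximation, as used in the MPEG psychoacoustic models) can be evaluated directly; a minimal sketch, with f in Hz and the result in dB:

```python
import math

def threshold_db(f_hz):
    # Terhardt's approximation of the threshold of hearing in quiet;
    # f is converted to kHz inside the formula.
    f = f_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The curve passes close to 0 dB near 2 kHz, where the ear is most
# sensitive, and rises steeply toward both ends of the audible range.
```

Frequency components below this curve need not be transmitted at all, since they are inaudible even in silence.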

FREQUENCY MASKING CURVES


CRITICAL BANDS

Human hearing range naturally divides into critical bands


Human auditory system cannot resolve sounds better than
within about one critical band when other sounds are present
Critical bandwidth corresponds to the smallest frequency
difference between two partials such that each can still be heard
separately
Critical bandwidth
Nearly constant, somewhat less than 100 Hz, for f < 500 Hz
For f >= 500 Hz, increases roughly linearly with frequency
The audio frequency range for hearing can be partitioned into about
24 critical bands (25 are typically used for coding applications)


BARK UNITS

The range of frequencies affected by masking is broader for
higher frequencies
It is useful to define a new frequency unit
In terms of this new unit, each of the masking curves has about
the same width
The new unit defined is called the Bark, named after Heinrich
Barkhausen (1881-1956)
One Bark unit corresponds to the width of one critical band, for
any masking frequency

BARK UNITS

The conversion between a frequency f and its corresponding
critical band number b, expressed in Bark units:

b = 13 arctan(0.76 f) + 3.5 arctan((f/7.5)^2)

Another formula:

b = 26.81 f / (1.96 + f) - 0.53

where f is in kHz, b is in Barks
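The conversion can be checked numerically; a small sketch of the arctangent mapping (input in Hz, converted to kHz internally):

```python
import math

def hz_to_bark(f_hz):
    # Zwicker & Terhardt's arctangent mapping from frequency to
    # critical-band number; f is converted to kHz first.
    f = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f) + 3.5 * math.atan((f / 7.5) ** 2)

# The audible range up to 20 kHz spans roughly 24-25 Barks,
# matching the ~24 critical bands mentioned earlier.
```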

BARK UNITS

The inverse equation gives the frequency in kHz corresponding
to a particular Bark value b:

f = 1.96 (b + 0.53) / (26.28 - b)

The critical bandwidth (df) for a given center frequency f can
also be approximated by

df = 25 + 75 (1 + 1.4 f^2)^0.69

where f is in kHz and df is in Hz
The idea of the Bark unit is to define a more perceptually
uniform unit of frequency, in that every critical band's width is
roughly equal in terms of Barks
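Both relations can be sketched in a few lines (f in kHz, df in Hz):

```python
def bark_to_khz(b):
    # Inverse mapping: center frequency (kHz) for Bark value b.
    return 1.96 * (b + 0.53) / (26.28 - b)

def critical_bandwidth_hz(f_khz):
    # Approximate critical bandwidth at center frequency f (kHz).
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

# Near 100 Hz the bandwidth is roughly 100 Hz; at high center
# frequencies it grows with frequency, as described earlier.
```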

TEMPORAL MASKING

Any loud tone causes the hearing receptors in the inner ear to
become saturated, and they require time to recover

It may take as much as 500 ms for us to discern a quiet
test tone after a 60 dB masking tone has been played

TEMPORAL MASKING

The closer the frequency to the masking tone, and the closer in time to when the
masking tone is stopped, the greater the likelihood that a test tone cannot be heard

TEMPORAL MASKING

The phenomenon of saturation also depends upon how long the
masking tone is applied

Solid curve: masking tone played for 200 ms
Dashed curve: masking tone played for 100 ms

TEMPORAL MASKING

Post-masking: a signal is able to mask sounds that occur
just after it ends
Pre-masking: a signal can mask sounds played just
before the stronger signal begins
Pre-masking has a shorter effective interval (2-5 ms) than
post-masking (usually 50-200 ms)

MPEG audio compression takes advantage of these
considerations, in essence constructing a large,
multidimensional lookup table
It uses this table to transmit, using few bits, frequency
components that are masked by frequency masking, temporal
masking, or both

MPEG AUDIO

MPEG Audio proceeds by first applying a filter bank to the input,
to break the input into its frequency components
In parallel, it applies a psychoacoustic model to the data, and
this model is used in a bit-allocation block
Then the number of bits allocated is used to quantize the
information from the filter bank
The overall result is that quantization provides the
compression, and bits are allocated where they are most needed
to lower the quantization noise below an audible level

MPEG LAYERS

Three downward-compatible layers of audio compression
Each offers more complexity in the psychoacoustic model
applied and correspondingly better compression for a given
level of audio quality
Layer 1 quality can be quite good, provided a comparatively high
bit-rate is available
Layer 2 has more complexity and was proposed for use in digital
audio broadcasting
Layer 3 is the most complex and was originally aimed at audio
transmission over ISDN lines
Each of the layers uses a different frequency transform

MPEG AUDIO STRATEGY

The encoder employs a bank of filters that first analyze the
frequency components of the audio signal
A psychoacoustic model estimates the masking
threshold level
In the quantization and coding stages, the encoder balances the
masking behavior and the available number of bits by discarding
inaudible frequencies and scaling quantization according to the
sound level left over, above the masking levels

MPEG AUDIO COMPRESSION ALGORITHM

[Block diagram of the MPEG audio encoder; ancillary data input is optional]

MPEG AUDIO COMPRESSION ALGORITHM


SBS: Sub-band samples


Header contains
synchronization code (twelve 1s - 111111111111)
sampling rate used
bit-rate
stereo information

Ancillary data: e.g. multilingual data and surround-sound data


SBS format: quantized scaling factor and code-words
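Pulling those header fields out of a frame can be sketched as below, assuming the MPEG-1 bit layout (12-bit sync code, then ID, layer, protection, bit-rate index, sampling-rate index, padding, private, and mode bits):

```python
def parse_header(frame_bytes):
    # Toy parser for the 32-bit MPEG-1 audio frame header.
    w = int.from_bytes(frame_bytes[:4], "big")
    if w >> 20 != 0xFFF:                   # twelve 1s: 111111111111
        raise ValueError("sync code not found")
    return {
        "layer": 4 - ((w >> 17) & 0x3),    # '11'=Layer 1 ... '01'=Layer 3
        "bitrate_index": (w >> 12) & 0xF,  # index into the bit-rate table
        "sampling_rate_index": (w >> 10) & 0x3,
        "mode": (w >> 6) & 0x3,            # stereo information
    }
```

The bit-rate and sampling-rate fields are table indices, not values; the decoder looks them up in tables defined by the standard.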

MPEG AUDIO COMPRESSION ALGORITHM

Filter bank: divides the input into 32 frequency sub-bands


Determine the scaling factor for each sub-band
Pass the scaling factor along with sub-band samples (SBS) to
the bit-allocation block
Determine the number of code bits to quantize the sub-band to
minimize the audibility of quantization noise
Psychoacoustic model
decides whether each frequency sub-band is tone-like or
noise-like
based on that decision and the scaling factor, it calculates the
masking threshold for each band and compares it with the
threshold of hearing
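One common way to make the tone-like vs noise-like decision (the approach taken in MPEG psychoacoustic model 2) is the spectral flatness measure; a sketch, where the 0.3 cut-off is an illustrative choice, not from the standard:

```python
import math

def is_tone_like(band_power):
    # Spectral flatness: geometric mean / arithmetic mean of the band's
    # power spectrum. Near 1 = flat (noise-like); near 0 = peaky (tone-like).
    gm = math.exp(sum(math.log(p) for p in band_power) / len(band_power))
    am = sum(band_power) / len(band_power)
    return gm / am < 0.3   # illustrative threshold
```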

BIT-ALLOCATION

The bit-allocation is not part of the standard, and it can therefore
be done in many possible ways
The goal is to ensure that all the quantization noise is below the
masking threshold
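A minimal greedy allocator illustrating that goal: give the next bit to the sub-band whose quantization noise is currently furthest above its masking threshold, assuming each extra bit lowers the noise floor by about 6 dB (the standard quantization rule of thumb). Everything else here is an illustrative sketch, not the normative procedure.

```python
def allocate_bits(smr_db, total_bits, max_bits=15):
    # smr_db[i]: signal-to-mask ratio of sub-band i in dB.
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio with the current allocation: quantization
        # noise is still audible wherever this is positive.
        nmr = [s - 6.02 * b for s, b in zip(smr_db, bits)]
        i = max(range(len(smr_db)), key=lambda k: nmr[k])
        if bits[i] >= max_bits:
            break
        bits[i] += 1
    return bits
```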


MPEG-4 GENERAL AUDIO CODING

Built around the coder kernel provided by MPEG-2
Advanced Audio Coding (AAC)
With some additional coding tools and coder configurations
The perceptual noise substitution (PNS) tool and the long-term
prediction (LTP) tool are available to enhance the coding
performance for noise-like and very tonal signals, respectively
A special coder kernel (TwinVQ) is provided to cover extremely
low bit-rates
A flexible bit-rate-scalable coding system is defined, including a
variety of possible coder configurations


LONG TERM PREDICTION

Long-term prediction (LTP), newly introduced in MPEG-4, is an
efficient tool for reducing the redundancy of a signal between
successive coding frames
The LTP tool provides optimal coding gain for stationary harmonic
signals, as well as some gain for non-harmonic tonal signals
Compared with the rather complex MPEG-2 AAC predictor tool,
the LTP tool shows a saving of roughly one-half in both
computational complexity and memory requirements


PERCEPTUAL NOISE SUBSTITUTION (PNS)

A feature newly introduced into MPEG-4 GA
Aims at further optimizing the bit-rate efficiency of AAC
at lower bit-rates
The technique of PNS is based on the observation that one
noise-like signal sounds much like another
The additional decoder complexity associated with the PNS coding
tool is very low in terms of both computational and memory
requirements
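The PNS idea in miniature: for a band the model judges noise-like, transmit only its energy; the decoder substitutes locally generated noise scaled to that energy. A sketch under the simplifying assumption that bands are plain lists of spectral coefficients:

```python
import random

def pns_encode(band):
    # Transmit one number (mean power) instead of every coefficient.
    return sum(c * c for c in band) / len(band)

def pns_decode(energy, n, seed=0):
    # Regenerate noise locally and scale it to the transmitted energy.
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    p = sum(c * c for c in noise) / n
    g = (energy / p) ** 0.5 if p else 0.0
    return [g * c for c in noise]
```

The regenerated coefficients differ from the originals, but by construction they carry the same energy, which is what matters perceptually for noise-like bands.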


TWIN VQ

Increases coding efficiency for coding of musical signals at
very low bit-rates
The basic idea is to replace the conventional encoding of scale
factors and spectral data used in MPEG-4 AAC by an interleaved
vector quantization applied to a normalized spectrum
The input signal vector (spectral coefficients) is interleaved into
sub-vectors. These sub-vectors are quantized using vector
quantizers. TwinVQ can achieve higher coding efficiency at the
cost of always introducing a minimum amount of loss in
audio quality
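The interleaving step can be sketched directly; the nearest-neighbour quantizer and any codebook used with it are illustrative, not the standard's:

```python
def interleave(spectrum, n_sub):
    # Sub-vector j takes coefficients j, j + n_sub, j + 2*n_sub, ...
    # so every sub-vector samples the whole (normalized) spectrum
    # rather than one frequency region.
    return [spectrum[j::n_sub] for j in range(n_sub)]

def vq_index(vector, codebook):
    # Nearest-neighbour vector quantization: index of the closest codeword.
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(codebook[i], vector)))
```

For example, `interleave(list(range(6)), 2)` gives `[[0, 2, 4], [1, 3, 5]]`: each sub-vector spans the full spectrum, which spreads quantization error evenly across frequency.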


MPEG-4 SCALABLE GA CODING

Enables the transmission and decoding of a bit-stream with a
bit-rate that can be adapted to dynamically varying requirements
Offers significant advantages for transmitting content over
channels with variable channel capacity, or connections for
which the available channel capacity is unknown at the time of
encoding

MPEG-4 SCALABLE GA CODING

Base layer: transmits the most relevant components of the
signal at a basic level of quality
Enhancement layers: enhance the coding precision delivered by
the preceding layers
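A two-layer sketch of that split, with illustrative step sizes (not from the standard): the base layer carries a coarse quantization, the enhancement layer the quantized residual.

```python
def scalable_encode(samples, base_step=0.25, enh_step=0.05):
    base = [round(s / base_step) for s in samples]           # base layer
    resid = [s - q * base_step for s, q in zip(samples, base)]
    enh = [round(r / enh_step) for r in resid]               # enhancement layer
    return base, enh

def scalable_decode(base, enh=None, base_step=0.25, enh_step=0.05):
    # Decoding the base layer alone gives basic quality; adding the
    # enhancement layer refines the precision.
    out = [q * base_step for q in base]
    if enh is not None:
        out = [o + e * enh_step for o, e in zip(out, enh)]
    return out
```

If the channel narrows, the enhancement layer can simply be dropped and the stream remains decodable at the base quality.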

MPEG-4 SCALABLE AUDIO CODER

The base layer coder operates at a lower sampling frequency than
the enhancement layer coder

CONCLUSION

Based in part on MPEG-2 AAC, in part on conventional speech
coding technology, and in part on new methods, the MPEG-4
General Audio coder provides a rich set of tools and features to
deliver both enhanced coding performance and provisions for
various types of scalability. MPEG-4 GA coding defines the
current state of the art in perceptual audio coding
