An Overview of Perceptual Audio Coding and MPEG AAC
Introduction
Audio coding (audio compression) algorithms produce a compact digital representation of high-fidelity (wideband) audio signals for efficient transmission or storage. The central objective in audio coding is to represent the signal with the minimum number of bits while achieving transparent signal reproduction, i.e. generating output audio that cannot be distinguished from the original input even by a listener with "golden ears". The Moving Picture Experts Group (MPEG) audio compression algorithm is an International Organization for Standardization (ISO) standard for high-fidelity audio compression.
MPEG audio compression standards are lossy audio coding standards. They compress audio by reducing perceptual and statistical redundancies. The basic task of a perceptual audio coding system is to compress the digital audio data in a way that
- the compression is as high as possible, and
- the reconstructed (decoded) audio sounds exactly like (or as close as possible to) the original audio before compression
Psychoacoustic Principles
High-precision engineering models for high-fidelity audio currently do not exist, so audio coding algorithms rely on generalized receiver models to optimize coding efficiency. In the case of audio, the receiver is ultimately the human ear, and sound perception is affected by its masking properties. Perceptual audio coders achieve compression by exploiting the fact that irrelevant signal information is not detectable even by a well-trained or sensitive listener.
Irrelevant signal information is identified during signal analysis by incorporating several psychoacoustic principles into the coder, including absolute hearing thresholds, critical band frequency analysis, simultaneous masking, the spread of masking along the basilar membrane, and temporal masking. Combining all of these yields, for a given audio frame, a quantitative estimate of the fundamental limit of transparent audio signal compression: the perceptual entropy.
Perceptual entropy denotes the minimum number of bits that must be allocated to a given audio frame to represent it in a perceptually lossless way.
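The absolute threshold of hearing in quiet, for a pure tone of frequency f in Hz, is well approximated by
$$T_q(f) = 3.64 (f/1000)^{-0.8} - 6.5 e^{-0.6 (f/1000 - 3.3)^2} + 10^{-3} (f/1000)^4 \quad \text{(dB SPL)}$$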
When applied to signal compression, the absolute threshold of hearing can be interpreted as the maximum allowable energy level for coding distortions introduced in the frequency domain. The coder therefore tries to keep the noise introduced during quantization below this threshold, so that the quantization noise does not become audible.
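As a minimal sketch of how such a threshold could be evaluated, the snippet below assumes the widely quoted three-term approximation T_q(f) = 3.64 (f/1000)^-0.8 - 6.5 exp(-0.6 (f/1000 - 3.3)^2) + 10^-3 (f/1000)^4 in dB SPL. The function names and the simple audibility check are illustrative assumptions, not part of any MPEG standard.

```python
import math


def absolute_threshold_db_spl(f_hz: float) -> float:
    """Approximate threshold in quiet (dB SPL) for a pure tone at f_hz (f_hz > 0).

    Assumes the common three-term approximation; real coders may use tables.
    """
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def noise_is_inaudible(noise_level_db_spl: float, f_hz: float) -> bool:
    """Hypothetical helper: True if noise at f_hz lies below the quiet threshold."""
    return noise_level_db_spl < absolute_threshold_db_spl(f_hz)


if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000):
        print(f, "Hz ->", round(absolute_threshold_db_spl(f), 2), "dB SPL")
```

The ear is most sensitive around 3 to 4 kHz, where this approximation dips below 0 dB SPL, and much less sensitive at low frequencies, so the same noise level can be inaudible at 100 Hz and audible at 3.3 kHz.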
However, the detection threshold for spectrally complex quantization noise is a modified version of the absolute threshold, with its shape determined by the stimuli present at any given time. Since stimuli are in general time-varying, the detection threshold is also a time-varying function of the input signal. A spreading function helps to determine the modified detection threshold in the presence of the stimuli in a given audio frame.
Critical Bands
The human ear can be viewed as a discrete set of band-pass filters covering the entire audible range up to 20 kHz. The inner ear (the cochlea) contains frequency-sensitive positions: a tone entering the cochlea travels along it until it reaches the position at which it resonates, so the cochlea works like a spectrum analyzer. The critical bandwidth is a function of frequency that quantifies the cochlear filter passbands. One Bark corresponds to the width of one critical band.
As the center frequency increases, the critical bandwidth also increases. Spectral analysis of audio content is performed using critical bands.
The critical bandwidth at center frequency f (in Hz) is given by
$$BW_c(f) = 25 + 75 \left[1 + 1.4 (f/1000)^2\right]^{0.69} \quad \text{(Hz)}$$
To convert a frequency in Hz to Bark:
$$Z(f) = 13 \arctan(0.00076 f) + 3.5 \arctan\left[(f/7500)^2\right] \quad \text{(Bark)}$$
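The two critical-band formulas can be evaluated directly. The sketch below assumes the standard forms with f in Hz, namely BW_c(f) = 25 + 75 [1 + 1.4 (f/1000)^2]^0.69 and Z(f) = 13 arctan(0.00076 f) + 3.5 arctan[(f/7500)^2]; the function names are illustrative.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate Z(f) in Bark for a frequency in Hz."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


def critical_bandwidth_hz(f_hz: float) -> float:
    """Critical bandwidth BW_c(f) in Hz around center frequency f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 10000):
        print(f, "Hz:", round(hz_to_bark(f), 2), "Bark,",
              round(critical_bandwidth_hz(f), 1), "Hz wide")
```

At low frequencies the critical bandwidth stays near 100 Hz, and it grows steadily with center frequency, which is why a coder can afford coarser frequency resolution at the top of the spectrum.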
Masking
Masking refers to a process where one sound is rendered inaudible because of the presence of another sound.
Simultaneous Masking (Frequency domain)
The relative shapes of the masker and maskee magnitude spectra determine the extent of masking.
Depending on the behavior of the masker and the maskee, there are the following cases:
- Noise Masking Tone (NMT)
- Tone Masking Noise (TMN)
- Noise Masking Noise (NMN)
Masking power is asymmetric between noise and tonal maskers: significantly greater masking power is associated with noise maskers than with tonal maskers.
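The asymmetry can be illustrated with rule-of-thumb threshold offsets often quoted in the perceptual coding literature. This is a rough sketch under stated assumptions: a tonal masker of energy E_T in Bark band B is assumed to mask noise up to E_T - 14.5 - B dB, and a noise masker of energy E_N is assumed to mask a tone up to E_N - K dB with K of a few dB. The function names and the default K = 4 dB are hypothetical choices.

```python
def tone_masking_noise_threshold_db(tone_energy_db: float, bark_band: float) -> float:
    """Assumed rule of thumb: noise below this level is masked by a tonal masker."""
    return tone_energy_db - 14.5 - bark_band


def noise_masking_tone_threshold_db(noise_energy_db: float, k_db: float = 4.0) -> float:
    """Assumed rule of thumb: a tone below this level is masked by a noise masker."""
    return noise_energy_db - k_db


if __name__ == "__main__":
    level_db, band = 70.0, 10.0
    print("tonal masker hides noise below",
          tone_masking_noise_threshold_db(level_db, band), "dB")
    print("noise masker hides a tone below",
          noise_masking_tone_threshold_db(level_db), "dB")
```

With both maskers at the same level, the noise masker yields the much higher masked threshold, which is the asymmetry described above.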
Spread of Masking
A masker centered within one critical band has a predictable effect on the detection thresholds in other critical bands. This effect, known as the spread of masking, is often modeled in coding applications by an approximately triangular spreading function.
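A triangular spreading function can be sketched on the Bark axis as two straight lines meeting at the masker. The slopes used below (25 dB per Bark below the masker, 10 dB per Bark above it) are commonly quoted illustrative values and are an assumption here, as are the function names and the 25-band layout.

```python
def triangular_spread_db(dz_bark: float,
                         lower_slope: float = 25.0,
                         upper_slope: float = 10.0) -> float:
    """Attenuation in dB (<= 0) at distance dz = z_maskee - z_masker in Bark.

    Slopes are assumed values in dB per Bark, not taken from a standard.
    """
    if dz_bark < 0:
        return lower_slope * dz_bark   # steep roll-off toward lower bands
    return -upper_slope * dz_bark      # shallower roll-off toward higher bands


def spread_masker(masker_level_db: float, masker_band: int, n_bands: int = 25) -> list:
    """Level contributed by a single masker to every critical band."""
    return [masker_level_db + triangular_spread_db(band - masker_band)
            for band in range(n_bands)]


if __name__ == "__main__":
    contribution = spread_masker(60.0, masker_band=10)
    print([round(v, 1) for v in contribution[7:14]])
```

Because the upper slope is shallower, a masker affects the bands above it more strongly than the bands below it.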
MPEG-1 Layer 3 (MP3) adds a two-stage (hybrid) filterbank, higher frequency resolution and improved Huffman coding to the basic perceptual coder principle.
AAC was promoted as the successor to MP3 for audio coding at medium to high bitrates. AAC follows the same basic coding paradigm as Layer 3 (high frequency resolution filterbank, non-uniform quantization, Huffman coding, iteration loop structure using analysis-by-synthesis), but improves on Layer 3 in many details and uses new coding tools for improved quality at low bitrates. Its popularity is currently maintained by its being the default codec of iTunes, the media player that serves the iPod, the most popular digital audio player on the market. Furthermore, the iTunes Music Store, whose sales account for 85% of the market for legal online downloads, sells AAC-encoded songs (encapsulated with FairPlay Digital Rights Management).
Both mid/side coding and intensity coding are more flexible in AAC, so they can be applied more often to reduce the bitrate. An optional backward prediction, computed line by line, achieves better coding efficiency, especially for very tone-like signals. This feature is only available in the rarely used Main profile.
Improved Huffman coding: in AAC, coding by quadruples of frequency lines is applied more often. In addition, the assignment of Huffman code tables to coder partitions can be much more flexible.
AAC and HE-AAC are far better than MP3 at very low bitrates, but at medium to higher bitrates the two formats are more comparable.
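The idea behind mid/side coding can be sketched as a simple sum/difference transform of the left and right channels. This is a simplified illustration that operates on plain lists of samples; an actual AAC encoder applies the decision to spectral coefficients band by band, which is not modeled here, and the function names are hypothetical.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    left = [0.5, -0.25, 1.0]
    right = [0.5, -0.25, 0.75]
    mid, side = ms_encode(left, right)
    print("mid: ", mid)
    print("side:", side)   # near zero when the channels are similar
    print(ms_decode(mid, side) == (left, right))
```

When the two channels are strongly correlated, the side signal is close to zero and needs far fewer bits than a second full channel, which is where the bitrate saving comes from.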
Modular encoding
AAC takes a modular approach to encoding. Depending on the complexity of the bitstream to be encoded, the desired performance and the acceptable output, implementers may create profiles that define which of a specific set of tools they want to use for a particular application. The standard offers four default profiles:
- Low Complexity (LC): the simplest and most widely used and supported
- Main Profile (MAIN): like the LC profile, with the addition of backward prediction
- Scalable Sampling Rate (SSR), also written Sample-Rate Scalable (SRS); MPEG-4 AAC-SSR
- Long Term Prediction (LTP): added in the MPEG-4 standard; an improvement of the MAIN profile using a forward predictor with lower computational complexity
Depending on the AAC profile and the MP3 encoder, 96 kbit/s AAC can give nearly the same or better perceptual quality as 128 kbit/s MP3.
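To put the 96 kbit/s versus 128 kbit/s comparison in concrete terms, the sketch below computes the size of a constant-bitrate audio stream. The 4-minute track length is a hypothetical example, and container overhead and metadata are ignored.

```python
def stream_size_bytes(bitrate_kbit_s: float, duration_s: float) -> float:
    """Size of a constant-bitrate stream in bytes (1 kbit = 1000 bits)."""
    return bitrate_kbit_s * 1000 * duration_s / 8


if __name__ == "__main__":
    duration_s = 4 * 60  # hypothetical 4-minute track
    mp3 = stream_size_bytes(128, duration_s)
    aac = stream_size_bytes(96, duration_s)
    print("MP3 at 128 kbit/s:", mp3 / 1e6, "MB")
    print("AAC at  96 kbit/s:", aac / 1e6, "MB")
    print("saving:", round(100 * (1 - aac / mp3)), "%")
```

If the perceptual quality really is comparable at these two rates, the AAC file is a quarter smaller for the same listening experience.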
MPEG-HE AAC
HE-AAC is the low bit rate codec in the AAC family and is a combination of the AAC LC (Advanced Audio Coding Low Complexity) audio coder and the SBR (Spectral Band Replication) bandwidth expansion tool. This combination achieves good stereo quality at bit rates as low as 32 to 48 kbit/s. HE-AAC is also known as aacPlus and can be used in multichannel operation.
MPEG-4 HE-AAC v2
Combined with parametric stereo, the HE-AAC codec provides good audio quality starting at bit rates of around 16 to 24 kbit/s for stereo content. HE-AAC v2 is also known as aacPlus v2.
Rough work
- Explain basic psychoacoustic principles: absolute threshold of hearing, critical bands, phenomenon of masking (simultaneous, masking asymmetry, spread of masking, non-simultaneous), perceptual entropy
- MPEG audio codec family: mp3, mp2 AAC, mp4 AAC, advanced AAC plus version 1, advanced AAC plus version 2 (mention features present/absent in each)
- Limitations of mp3
- What is different in AAC?
- Features in AAC: explain each feature in detail (mp2, mp4)