An Overview of Perceptual Audio Coding and MPEG AAC

Introduction
• Audio coding (audio compression) algorithms are used to obtain a compact digital representation of high-fidelity (wideband) audio signals for efficient transmission or storage.
• The central objective in audio coding is to represent the signal with the minimum number of bits while achieving transparent signal reproduction, i.e. generating output audio that cannot be distinguished from the original input even by a listener with "golden ears".
• The Moving Picture Experts Group (MPEG) audio compression algorithm is an International Organization for Standardization (ISO) standard for high-fidelity audio compression.

Introduction (continued)
• MPEG audio compression standards are lossy audio coding standards. They compress audio by reducing perceptual and statistical redundancies.
• The basic task of a perceptual audio coding system is to compress the digital audio data in a way that
- the compression is as high as possible, and
- the reconstructed (decoded) audio sounds exactly like (or as close as possible to) the original audio before compression

Audio Coding Techniques
• Parametric Coding
• Waveform Coding
- Time Domain: PCM, DPCM, ADPCM, etc.
- Frequency Domain: Transform Coding, Subband Coding
• Hybrid Coding

Perceptual Audio Coding Basics
• Human hearing is limited to frequencies below ~20 kHz in most cases
• Human hearing is insensitive to quiet frequency components that accompany other, stronger frequency components
• Stereo audio streams contain largely redundant information
• MPEG audio compression takes advantage of these facts to reduce the extent and detail of mostly inaudible frequency ranges

Generic Perceptual Audio Coding Architecture

Psychoacoustic Principles
• High-precision engineering models for high-fidelity audio currently do not exist. So, audio coding algorithms rely upon generalized receiver models to optimize coding efficiency.
• In the case of audio, the receiver is ultimately the human ear, and sound perception is affected by its masking properties.
• Perceptual audio coders achieve compression by exploiting the fact that "irrelevant" signal information is not detectable by even a well-trained or sensitive listener.

• Irrelevant signal information is identified during signal analysis by incorporating into the coder several psychoacoustic principles, including absolute hearing thresholds, critical band frequency analysis, simultaneous masking, the spread of masking along the basilar membrane, and temporal masking.
• By combining all these, a quantitative estimate of the fundamental limit of transparent audio signal compression, i.e. the Perceptual Entropy, is determined for a given audio frame.

• Perceptual entropy denotes the minimum number of bits that should be allocated to a given audio frame to represent 'perceptually lossless' audio.

Absolute Threshold of Hearing
• The absolute threshold of hearing characterizes the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment. It can be expressed with a non-linear function (f in Hz):

$$T_q(f) = 3.64\left(\frac{f}{1000}\right)^{-0.8} - 6.5\,e^{-0.6\left(f/1000 - 3.3\right)^{2}} + 10^{-3}\left(\frac{f}{1000}\right)^{4} \quad \text{(dB SPL)}$$
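The threshold-in-quiet formula on this slide can be evaluated directly. The following is a minimal sketch, assuming the frequency is given in Hz and the result is in dB SPL; the function name `absolute_threshold_db_spl` is chosen here for illustration only.

```python
import math


def absolute_threshold_db_spl(f_hz: float) -> float:
    """Approximate threshold in quiet (dB SPL) for a pure tone at f_hz."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    k = f_hz / 1000.0  # frequency in kHz, as used in the formula
    return (
        3.64 * k ** -0.8
        - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
        + 1e-3 * k ** 4
    )


# Print the threshold at a few frequencies across the audible range.
for f in (100, 1000, 3300, 10000, 16000):
    print(f"{f:>6} Hz: {absolute_threshold_db_spl(f):6.1f} dB SPL")
```

The curve is lowest around 3 to 4 kHz, where the ear is most sensitive, and rises steeply towards both ends of the audible range.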


• When applied to signal compression, the absolute threshold can be interpreted as a maximum allowable energy level for coding distortions introduced in the frequency domain.
• Using this information, the coder tries to keep the noise levels introduced during quantization below this threshold.
• As a result, the quantization noise does not become audible.

However...
• The detection threshold for spectrally complex quantization noise is a modified version of the absolute threshold, with its shape determined by the stimuli present at any given time.
• Since stimuli are in general time-varying, the detection threshold is also a time-varying function of the input signal.
• A spreading function helps to determine the modified detection threshold of hearing in the presence of stimuli in a given audio frame.


Critical Bands
• The human ear can be viewed as a discrete set of band-pass filters that covers the entire 20 kHz frequency range. (It works as a spectrum analyzer.)
• The inner ear, called the "cochlea", contains frequency-sensitive positions. Whenever a tone enters the cochlea, it travels until it reaches the position where it resonates.
• The "critical bandwidth" is a function of frequency that quantifies the cochlear filter pass bands. (Unit: Bark; one Bark corresponds to the width of one critical band.)

• As the center frequency increases, the Bark-width also increases. The Bark-width at center frequency 'f' (in Hz) is given as:

$$BW_c(f) = 25 + 75\left[1 + 1.4\left(\frac{f}{1000}\right)^{2}\right]^{0.69} \quad \text{(Hz)}$$

To convert frequency in 'Hz' to 'Bark':

$$Z(f) = 13\arctan(0.00076\,f) + 3.5\arctan\left[\left(\frac{f}{7500}\right)^{2}\right] \quad \text{(Bark)}$$

• Spectral analysis of audio content is performed using critical bands.
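The two critical-band formulas can be tried out numerically. This is a small sketch, assuming frequencies in Hz and using f/1000 in the bandwidth term (the usual form of this approximation); the function names are illustrative, not part of any standard API.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Map a frequency in Hz to the critical-band rate Z(f) in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth BWc(f) in Hz around center frequency f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


# Show how both quantities grow with the center frequency.
for f in (100, 500, 1000, 4000, 10000, 20000):
    print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark, "
          f"bandwidth ~ {critical_bandwidth_hz(f):7.1f} Hz")
```

At low frequencies the bandwidth stays close to 100 Hz, and it widens quickly above roughly 500 Hz, which is why a perceptual coder needs finer frequency resolution at the low end of the spectrum.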

Figure: Idealized critical band filter bank

Masking
Masking refers to a process where one sound is rendered inaudible because of the presence of another sound.
• Simultaneous Masking (Frequency domain): relative shapes of the masker and maskee magnitude spectra determine the extent of masking
• Non-simultaneous Masking (Time domain): phase relationships between masker and maskee determine the masking outcome

Depending on the behavior of the masker and maskee, there are the following cases:
• Noise Masking Tone (NMT)
• Tone Masking Noise (TMN)
• Noise Masking Noise (NMN)

Figure: Noise Masking Tone and Tone Masking Noise
• We can see the asymmetry of masking power between noise and tonal maskers. Significantly greater masking power is associated with noise maskers than with tonal maskers.

Difference between SMR, NMR and SNR
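The relationship between the three ratios can be written down in a few lines. This is a sketch under stated assumptions: all levels are in dB within one critical band, SMR is the signal (masker) level minus the masking threshold, SNR is the signal level minus the quantization noise level, and NMR is the noise level minus the masking threshold. The rule of thumb of about 6.02 dB of SNR per quantizer bit is an assumption added here, not something stated on the slide.

```python
import math


def smr_db(signal_db: float, mask_db: float) -> float:
    """Signal-to-mask ratio: how far the signal sits above the masking threshold."""
    return signal_db - mask_db


def snr_db(signal_db: float, noise_db: float) -> float:
    """Signal-to-noise ratio of the quantizer in this band."""
    return signal_db - noise_db


def nmr_db(noise_db: float, mask_db: float) -> float:
    """Noise-to-mask ratio: negative means the noise is below the threshold."""
    return noise_db - mask_db


def bits_for_transparency(smr: float, db_per_bit: float = 6.02) -> int:
    """Rough bit count so that SNR >= SMR (assumes ~6.02 dB per bit)."""
    return max(0, math.ceil(smr / db_per_bit))


# Hypothetical levels for one band: signal 70 dB, threshold 50 dB, noise 44 dB.
signal, mask, noise = 70.0, 50.0, 44.0
print("SMR:", smr_db(signal, mask))
print("SNR:", snr_db(signal, noise))
print("NMR:", nmr_db(noise, mask), "(equals SMR - SNR)")
print("bits needed:", bits_for_transparency(smr_db(signal, mask)))
```

With these definitions NMR = SMR - SNR, so the quantization noise stays inaudible in a band as long as the coder spends enough bits there to keep the SNR above the SMR.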

Spread of Masking
• A masker centered within one critical band has some predictable effect on detection thresholds in other critical bands.
• This effect, also known as the spread of masking, is often modeled in coding applications by an approximately triangular spreading function.
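A triangular spreading function is easy to sketch on the Bark scale. The slopes used below (+25 dB per Bark below the masker and -10 dB per Bark above it) are commonly quoted values and are an assumption here, since the slide gives no numbers. The function names and the 25-band layout are likewise illustrative.

```python
def spreading_db(dz_bark: float, lower_slope: float = 25.0,
                 upper_slope: float = 10.0) -> float:
    """Attenuation (dB, <= 0) at a distance dz_bark from the masker.

    dz_bark < 0 means a band below the masker, dz_bark > 0 a band above it.
    """
    if dz_bark < 0:
        return lower_slope * dz_bark   # steep fall-off towards lower bands
    return -upper_slope * dz_bark      # shallower fall-off towards higher bands


def spread_masker(masker_level_db: float, masker_band: int,
                  num_bands: int = 25) -> list:
    """Spread one masker across all critical bands (one band = 1 Bark)."""
    return [masker_level_db + spreading_db(b - masker_band)
            for b in range(num_bands)]


# A hypothetical 70 dB masker in band 10, spread across 25 Bark bands.
levels = spread_masker(70.0, masker_band=10)
for band in range(7, 14):
    print(f"band {band:2d}: {levels[band]:5.1f} dB")
```

The result is the spread excitation of the masker, not yet a masking threshold. A real psychoacoustic model would additionally subtract a masker-dependent offset and combine the contributions of all maskers with the absolute threshold.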

Non-simultaneous Masking (Temporal Masking)

MPEG Audio Codec Family
• MPEG-1 (ISO/IEC 11172-3) Layer 2 (mp2)
• MPEG-1 Layer 3 (mp3)
• MPEG-2 (ISO/IEC 13818-7) AAC
• MPEG-4 (ISO/IEC 14496-3) AAC
• MPEG-4 HE AAC
• MPEG-4 HE AAC v2

MP3 Compression Flow Chart

Figure labels: QMF filter bank, MDCT filter bank
• Layer 3 uses a 2-stage filter bank (a QMF filter bank followed by an MDCT filter bank) and adds more frequency resolution and improved Huffman coding to the basic perceptual coder principle.

Bit rates available:
• In MPEG-1 Layer 3 the available bit rates are 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, and the available sampling frequencies are 32, 44.1 and 48 kHz. 44.1 kHz is almost always used (it coincides with the sampling rate of compact discs), and 128 kbit/s has become the de facto "good enough" standard, although 192 kbit/s is becoming increasingly popular over peer-to-peer file sharing networks.
• MPEG-2 and [the non-official] MPEG-2.5 include some additional bit rates: 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160 kbit/s, while providing lower sampling frequencies (8, 11.025, 12, 16, 22.05 and 24 kHz).
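To put these bit rates in perspective, the arithmetic below compares them with uncompressed CD audio and estimates the size of one MP3 frame. It is a sketch that assumes 16-bit stereo PCM at 44.1 kHz as the reference, and the MPEG-1 Layer 3 conventions of 1152 samples per frame and a frame length of 144 × bitrate / sample rate bytes (plus one optional padding byte). The function names are illustrative.

```python
def pcm_bitrate(sample_rate_hz: int = 44100, bits_per_sample: int = 16,
                channels: int = 2) -> int:
    """Bit rate of uncompressed PCM audio in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(coded_bitrate_bps: int) -> float:
    """Ratio of CD-quality PCM bit rate to the coded bit rate."""
    return pcm_bitrate() / coded_bitrate_bps


def mp3_frame_bytes(bitrate_bps: int, sample_rate_hz: int,
                    padding: bool = False) -> int:
    """Length in bytes of one MPEG-1 Layer 3 frame (1152 samples)."""
    return 144 * bitrate_bps // sample_rate_hz + (1 if padding else 0)


print("CD PCM bit rate:", pcm_bitrate(), "bit/s")
for kbps in (128, 192, 320):
    print(f"{kbps} kbit/s: ratio {compression_ratio(kbps * 1000):.2f}:1, "
          f"frame {mp3_frame_bytes(kbps * 1000, 44100)} bytes")
print(f"frame duration at 44.1 kHz: {1152 / 44100 * 1000:.2f} ms")
```

At the "good enough" rate of 128 kbit/s this works out to roughly an 11:1 reduction relative to CD audio.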

Design limitations of MP3
There are several limitations inherent to the MP3 format that cannot be overcome by using a better encoder. Newer audio compression formats such as Vorbis and AAC no longer have these limitations. In technical terms, MP3 is limited in the following ways:
• Bitrate is limited to a maximum of 320 kbit/s
• Time resolution can be too low for highly transient signals, causing some smearing of percussive sounds
• Frequency resolution is limited by the small long block window size, decreasing coding efficiency
• No scale factor band for frequencies above 15.5/15.8 kHz
• Joint stereo is done on a frame-to-frame basis
• Encoder/decoder overall delay is not defined, which means lack of official provision for gapless playback. However, some encoders such as LAME can attach additional metadata that will allow players that are aware of it to deliver gapless playback.
• Nevertheless, a well-tuned MP3 encoder can perform competitively even with these restrictions.

Advanced Audio Coding (AAC)
• It is a standardized, lossy digital audio compression scheme. It was developed with the cooperation and contributions of companies mainly including Dolby, Fraunhofer (FhG), AT&T, Sony and Nokia, and was officially declared an international standard by the Moving Picture Experts Group in April of 1997.
• Not backward compatible with other MPEG audio standards (like mp3)

• AAC was promoted as the successor to MP3 for audio coding at medium to high bitrates.
• Its popularity is currently maintained by it being the default codec of iTunes, the media player which powers the iPod, the most popular digital audio player on the market.
• Furthermore, the iTunes Music Store, whose sales account for 85% of the market for legal online downloads, sells AAC-encoded songs (encapsulated with FairPlay Digital Rights Management).
• AAC follows the same basic coding paradigm as Layer-3 (high frequency resolution filterbank, non-uniform quantization, Huffman coding, iteration loop structure using analysis-by-synthesis), but improves on Layer-3 in a lot of details and uses new coding tools for improved quality at low bit-rates.

AAC's improvements over MP3
• Sample frequencies from 8 kHz to 96 kHz (official MP3: 16 kHz to 48 kHz)
• Up to 48 channels
• Higher efficiency and simpler filterbank (hybrid → pure MDCT)
• Higher coding efficiency for stationary signals (blocksize: 576 → 1024 samples)
• Higher coding efficiency for transient signals (blocksize: 192 → 128 samples)
• Can use the Kaiser-Bessel derived window function to eliminate spectral leakage at the expense of widening the main lobe
• Much better handling of frequencies above 16 kHz
• More flexible joint stereo (separate for every scale band)
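The block-size changes listed above translate directly into frequency and time resolution. The sketch below assumes a 44.1 kHz sampling rate and treats the block size as the number of spectral lines produced per block, so the spacing between lines is (fs / 2) / lines and a block spans lines / fs seconds of new audio. The helper names are illustrative.

```python
def line_spacing_hz(sample_rate_hz: float, lines: int) -> float:
    """Spacing between adjacent spectral lines for a block of `lines` lines."""
    return (sample_rate_hz / 2.0) / lines


def block_duration_ms(sample_rate_hz: float, lines: int) -> float:
    """Duration of new audio covered by one block, in milliseconds."""
    return 1000.0 * lines / sample_rate_hz


FS = 44100.0
blocks = {
    "MP3 long": 576, "AAC long": 1024,    # used for stationary signals
    "MP3 short": 192, "AAC short": 128,   # used for transient signals
}
for name, lines in blocks.items():
    print(f"{name:9s}: {line_spacing_hz(FS, lines):6.2f} Hz per line, "
          f"{block_duration_ms(FS, lines):5.2f} ms per block")
```

The longer AAC long block gives finer frequency resolution for stationary signals, while the shorter AAC short block gives finer time resolution for transients, which is the trade-off behind the two block-size bullets.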

• Both the mid/side coding and the intensity coding are more flexible, allowing them to be applied more frequently to reduce the bitrate.
• Improved Huffman Coding: In AAC, coding by quadruples of frequency lines is applied more often. In addition, the assignment of Huffman code tables to coder partitions can be much more flexible.
• An optional backward prediction, computed line by line, achieves better coding efficiency especially for very tone-like signals. This feature is only available within the rarely used main profile.
• AAC and HE-AAC are far better than MP3 at very low bitrates, but at medium to higher bitrates the two formats are more comparable.
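The idea behind mid/side coding can be shown with a few samples. This is a simplified sketch: it works on plain sample lists, whereas AAC applies the decision per scale factor band on spectral coefficients, and the function names are made up for illustration.

```python
def ms_encode(left, right):
    """Convert left/right channels into mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Reconstruct left/right channels from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(channel):
    """Sum of squared samples, a rough measure of how costly a channel is."""
    return sum(x * x for x in channel)


# Two nearly identical channels, as is typical for stereo material.
left = [0.5, 0.25, -0.75, 0.5]
right = [0.5, 0.0, -0.5, 0.5]
mid, side = ms_encode(left, right)
print("side energy:", energy(side), "vs right energy:", energy(right))
print("round trip ok:", ms_decode(mid, side) == (left, right))
```

When the two channels are similar, the side signal carries very little energy and is cheap to code, which is why being able to switch mid/side coding on more flexibly helps reduce the bitrate.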

Modular encoding
AAC takes a modular approach to encoding. Depending on the complexity of the bitstream to be encoded, the desired performance and the acceptable output, implementers may create profiles to define which of a specific set of tools they want to use for a particular application. The standard offers four default profiles:
• Low Complexity (LC) - the simplest and most widely used and supported.
• Main Profile (MAIN) - like the LC profile, with the addition of backwards prediction.
• Sample-Rate Scalable (SRS), a.k.a. Scalable Sample Rate (MPEG-4 AAC-SSR).
• Long Term Prediction (LTP), added in the MPEG-4 standard - an improvement of the MAIN profile using a forward predictor with lower computational complexity.
• Depending on the AAC profile and the MP3 encoder, 96 kbit/s AAC can give nearly the same or better perceptual quality as 128 kbit/s MP3.

MPEG-2 AAC Flowchart

MPEG AAC Family

Extensions and Improvements
Some extensions have been added to the original AAC standard:
• MPEG-4 Scalable To Lossless (SLS).
• Perceptual Noise Substitution (PNS).
• Long Term Predictor (LTP) - added in MPEG-4 Part 3.
• High Efficiency AAC (HE-AAC), a.k.a. aacPlus v1 or AAC+ - the combination of SBR (Spectral Band Replication) and AAC, used for low bitrates.
• HE-AAC v2, a.k.a. aacPlus v2 - the combination of Parametric Stereo (PS) and HE-AAC.

MPEG AAC Performance
• MPEG AAC provides excellent audio quality. Reaching perceptually transparent quality at only 64 kbit/s per channel, it fulfills the requirements for broadcast quality as defined by the European Broadcasting Union.
• With sampling rates ranging from 8 kHz up to 96 kHz and above, with bit rates up to 256 kbit/s, and with support for up to 48 channels, MPEG AAC is one of the most flexible audio codecs. Of course, the standard also supports mono, stereo, and all common multi-channel configurations (e.g. 5.1 or 7.1).
• The low computational demands make AAC the ideal codec for any low bit rate high-quality audio application.

MPEG-HE AAC
• HE-AAC is the low bit rate codec in the AAC family and is a combination of the AAC LC (Advanced Audio Coding Low Complexity) audio coder and the SBR (Spectral Band Replication) bandwidth expansion tool.
• This combination achieves good stereo quality already at bit rates of 32 to 48 kbit/s. HE-AAC is also known as aacPlus and can be used in multichannel operations.

MPEG-4 HE-AAC v2
• Combined with parametric stereo, the HE-AAC codec provides good audio quality starting at bit rates around 16 to 24 kbit/s for stereo content.
• HE-AAC v2 is also known as aacPlus v2.

Rough work...
• Explain basic psychoacoustic principles: Absolute threshold of hearing, Critical bands, Phenomenon of masking (Simultaneous, Non-simultaneous), Masking asymmetry, Spread of masking, Perceptual Entropy
• MPEG audio codec family: mp3, mp2 AAC, mp4 AAC, advanced AAC plus version 1, advanced AAC plus version 2 (mention features present/absent in each)
• Limitations of mp3
• What is different in AAC?
• Features in AAC
• Explain each feature in detail (mp2, mp4)