Advanced Audio Coding

‡ Audio coding (audio compression) algorithms are used to obtain a compact digital representation of high-fidelity (wideband) audio signals for efficient transmission or storage.
‡ The central objective in audio coding is to represent the signal with the minimum number of bits while achieving transparent signal reproduction.
‡ The Moving Picture Experts Group (MPEG) audio compression algorithm is an International Organization for Standardization (ISO) standard for high-fidelity audio compression.

Advanced Audio Coding (AAC)
‡ AAC is a standardized, lossy digital audio compression scheme.
‡ It was developed with the cooperation and contributions of companies including Dolby, AT&T, Fraunhofer (FhG), Sony and Nokia, and was officially declared an international standard by the Moving Picture Experts Group in April 1997.
‡ It is not backward compatible with other MPEG audio standards (such as MP3).

‡ AAC was promoted as the successor to MP3 for audio coding at medium to high bitrates.
‡ Its popularity is currently maintained by its being the default codec of iTunes, the media player that powers the iPod, the most popular digital audio player on the market.
‡ Furthermore, the iTunes Music Store, whose sales account for 85% of the market for legal online downloads, sells AAC-encoded songs.
‡ AAC follows the same basic coding paradigm as Layer 3 (high frequency resolution filter bank, non-uniform quantization, Huffman coding, iteration loop structure using analysis-by-synthesis), but improves on Layer 3 in many details and uses new coding tools for improved quality at low bit rates.

AAC's improvements over MP3
‡ Sample frequencies from 8 kHz to 96 kHz (official MP3: 16 kHz to 48 kHz)
‡ Up to 48 channels
‡ Higher efficiency and simpler filterbank (pure MDCT instead of MP3's hybrid filterbank)
‡ Higher coding efficiency for stationary signals (blocksize: 1024 samples instead of MP3's 576)
‡ Higher coding efficiency for transient signals (blocksize: 128 samples instead of MP3's 192)
‡ Can use a Kaiser-Bessel derived window function to reduce spectral leakage at the expense of widening the main lobe
‡ Much better handling of frequencies above 16 kHz
‡ More flexible joint stereo (separate for every scale band)
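The Kaiser-Bessel derived (KBD) window named in the list above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not code from any AAC implementation: the function names, the window length of 16 and the shape parameter alpha = 4 are assumptions chosen for the example. The sketch builds the window from the running sum of a Kaiser window, which is what makes the squared halves add up to one (the condition an MDCT window needs for overlap-add).

```python
import math

def bessel_i0(x):
    """Zeroth-order modified Bessel function, summed from its power series."""
    total, term, k = 1.0, 1.0, 1
    while term > 1e-16 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total

def kaiser(length, alpha):
    """Symmetric Kaiser window; alpha is an illustrative shape parameter."""
    beta = math.pi * alpha
    denom = bessel_i0(beta)
    out = []
    for n in range(length):
        r = 2.0 * n / (length - 1) - 1.0
        out.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / denom)
    return out

def kbd_window(n, alpha):
    """Kaiser-Bessel derived window of even length n."""
    half = n // 2
    k = kaiser(half + 1, alpha)
    total = sum(k)
    rising, running = [], 0.0
    for j in range(half):
        running += k[j]                      # cumulative sum of the Kaiser window
        rising.append(math.sqrt(running / total))
    return rising + rising[::-1]             # mirror to get the falling half

w = kbd_window(16, 4.0)
# Squared samples half a window apart should sum to 1 (up to rounding).
worst = max(abs(w[i] ** 2 + w[i + 8] ** 2 - 1.0) for i in range(8))
print(worst < 1e-9)
```

A larger alpha trades a wider main lobe for lower side lobes, which is the trade-off the bullet describes.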

‡ Improved Huffman coding: in AAC, coding by quadruples of frequency lines is applied more often. In addition, the assignment of Huffman code tables to coder partitions can be much more flexible.
‡ Both the mid/side coding and the intensity coding are more flexible, so they can be applied more frequently to reduce the bit rate.
‡ An optional backward prediction, computed line by line, achieves better coding efficiency, especially for very tone-like signals. This feature is only available within the rarely used main profile.
‡ AAC and HE-AAC are far better than MP3 at very low bitrates, but at medium to higher bitrates the two formats are more comparable.
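As a rough illustration of per-band mid/side coding, the sketch below converts a left/right band to mid/side and back, and decides band by band whether mid/side looks worthwhile. The decision rule (compare the energy of the quieter channel in each representation) is a hypothetical stand-in chosen for brevity; a real encoder would base the choice on its psychoacoustic model and bit demand. All function names are assumptions made for this example.

```python
def energy(values):
    return sum(v * v for v in values)

def ms_encode_band(left, right):
    """Left/right spectral lines of one band -> mid/side."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def ms_decode_band(mid, side):
    """Inverse transform: mid/side -> left/right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def choose_ms_per_band(left_bands, right_bands):
    """One flag per band: True where M/S leaves a quieter second channel.

    Hypothetical criterion for illustration only.
    """
    flags = []
    for left, right in zip(left_bands, right_bands):
        mid, side = ms_encode_band(left, right)
        lr_cost = min(energy(left), energy(right))
        ms_cost = min(energy(mid), energy(side))
        flags.append(ms_cost < lr_cost)
    return flags

left_bands = [[1.0, 2.0], [1.0, 0.0]]    # band 0: nearly identical channels
right_bands = [[1.0, 2.1], [0.0, 0.0]]   # band 1: signal in the left channel only
print(choose_ms_per_band(left_bands, right_bands))
```

Making the decision separately for every band is what lets the encoder use mid/side only where the two channels are actually similar.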

MPEG Audio Codec Family
‡ MPEG-1 (ISO/IEC 11172-3) Layer 2 (mp2)
‡ MPEG-1 Layer 3 (mp3)
‡ MPEG-2 (ISO/IEC 13818-7) AAC
‡ MPEG-4 (ISO/IEC 14496-3) AAC
‡ MPEG-4 HE-AAC
‡ MPEG-4 HE-AAC v2

Evolution from MPEG-2 AAC LC to MPEG-4 AAC LC to HE-AAC v2
‡ MPEG-2 (ISO/IEC 13818-7) AAC
‡ MPEG-4 (ISO/IEC 14496-3) AAC
‡ MPEG-4 HE-AAC
‡ MPEG-4 HE-AAC v2

MPEG layers
‡ Layer 1: DCT-type filter with equal frequency spread per band. The psychoacoustic model only uses frequency masking.
‡ Layer 2 (Musicam or MUSICAM): same filter bank as Layer 1. The psychoacoustic model uses a little bit of temporal masking.
‡ Layer 3 (MP3): Layer 1 filter bank followed by an MDCT per band to obtain a nonuniform frequency division similar to critical bands. The psychoacoustic model includes temporal masking effects and takes stereo redundancy into account, and the coder uses Huffman coding.
‡ At the time of MPEG-1 audio development (finalized 1992), Layer 3 was considered too complex to be practically useful.
‡ But today, Layer 3 is the most widely deployed audio coding method (known as MP3), because it provides good quality at an acceptable bit rate. It is also because the code for Layer 3 is distributed freely.

Basic idea for the coding technique
‡ Decompose a signal into separate frequency bands by using a filter bank.
‡ Analyze the signal energy in the different bands and determine the total masking threshold of each band caused by signals in other bands or at other times.
‡ Quantize the samples in the different bands with an accuracy determined by the masking level:
1) Any signal below the masking level does not need to be coded.
2) Signals above the masking level are quantized with a quantization step size chosen according to the masking level, and bits are assigned across bands so that each additional bit provides the maximum reduction in perceived distortion.
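The bit-assignment rule above can be sketched as a greedy loop. This is a simplified illustration, not the AAC rate loop: it assumes each extra bit lowers the quantization noise in a band by about 6.02 dB, treats an uncoded band as having noise equal to its signal level, and uses made-up band levels. The function and parameter names are hypothetical.

```python
def allocate_bits(signal_db, mask_db, bit_budget, db_per_bit=6.02):
    """Greedy allocation: give each bit to the band with the worst
    noise-to-mask ratio (NMR), stopping once all noise is masked."""
    bits = [0] * len(signal_db)
    # With zero bits the band is not coded, so the error equals the signal level.
    nmr = [s - m for s, m in zip(signal_db, mask_db)]
    for _ in range(bit_budget):
        worst = max(range(len(nmr)), key=lambda b: nmr[b])
        if nmr[worst] <= 0.0:
            break                       # all remaining noise is below the mask
        bits[worst] += 1
        nmr[worst] -= db_per_bit        # assumed gain of one extra bit
    return bits

signal_db = [60.0, 40.0, 20.0, 50.0]    # illustrative band levels
mask_db = [30.0, 45.0, 10.0, 44.0]      # illustrative masking thresholds
print(allocate_bits(signal_db, mask_db, bit_budget=10))  # → [5, 0, 2, 1]
```

The second band lies below its masking threshold and receives no bits, which corresponds to rule 1; the remaining bits go wherever the noise is most audible, which corresponds to rule 2.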

Audio Coding Basics
‡ Human hearing is limited to frequencies below about 20 kHz in most cases.
‡ Human hearing is insensitive to quiet frequency components that accompany other, stronger frequency components.
‡ Stereo audio streams contain largely redundant information.
‡ MPEG audio compression takes advantage of these facts to reduce the extent and detail of mostly inaudible frequency ranges.

Masking
‡ What is masking? Masking refers to a process where one sound is rendered inaudible because of the presence of another sound.

Basic structure of audio encoder and decoder: how does it work?

Advanced Audio Coding
‡ AAC belongs to the perceptually oriented traditional waveform codecs: it reproduces the waveform of the original input audio signal with a minimum amount of data while considering all the psychoacoustic principles to minimize the audibility of coding effects.
‡ Other well-known representatives of this class of codecs are 1) MPEG-1/2 Layer 2 and MPEG-1/2 Layer 3 (mp3) and 2) Dolby AC-3.
‡ The main tools for AAC:
1) Modified Discrete Cosine Transform (MDCT) filter bank using window switching: transforming the signal into a spectral representation is the key to applying psychoacoustic principles.
2) Stereo processing: increases the compression efficiency for stereo signals by reducing redundancy in the audio content.
3) Temporal Noise Shaping (TNS): a tool that shapes the quantization noise in the time domain by running a prediction across frequency on the spectral data. This avoids undesirable effects caused by the relatively coarse time resolution of the MDCT filter bank.
4) Quantization and coding: improves compression efficiency by using tools to quantize and code the spectrum, which is then packed according to the AAC bit stream syntax.
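To make the MDCT filter bank more concrete, the sketch below runs a direct (slow) MDCT and inverse MDCT over 50% overlapping frames and checks that windowed overlap-add returns the input. It is a toy under stated assumptions: a sine window, blocks of 8 coefficients instead of AAC's 1024 or 128, no window switching, no quantization, and function names invented for this example.

```python
import math

def sine_window(length):
    """Sine window; squared samples half a window apart sum to 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    """2N windowed time samples -> N spectral coefficients."""
    n = len(frame) // 2
    return [sum(frame[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples, scaled for windowed overlap-add."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]

def analysis_synthesis(signal, n):
    """Window, transform, inverse transform, window again and overlap-add."""
    window = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        frame = [signal[start + t] * window[t] for t in range(2 * n)]
        rebuilt = imdct(mdct(frame))
        for t in range(2 * n):
            out[start + t] += rebuilt[t] * window[t]
    return out

signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(64)]
out = analysis_synthesis(signal, 8)
# Samples covered by two overlapping frames are recovered (first and last half-frame are not).
print(max(abs(out[i] - signal[i]) for i in range(8, 56)) < 1e-9)
```

Each frame of 2N samples yields only N coefficients, yet the aliasing introduced by one frame cancels against its neighbours, so the transform stays critically sampled. A codec would quantize the coefficients between `mdct` and `imdct`.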

AAC Spectral Band Replication (SBR)
‡ SBR is a bandwidth extension technology for the purpose of improved compression. Instead of transmitting the upper part of the spectrum with AAC, SBR regenerates it from the lower part with the help of some low-bit-rate guidance data.
‡ To generate the high-frequency spectrum, SBR uses a QMF (Quadrature Mirror Filter) filter bank analysis/synthesis system.
‡ The main tools for regenerating the missing high-frequency components are:
1) High Frequency Reconstruction:
Transposer or Generator -- generates the upper part of the spectrum by copying and shifting the lower part of the transmitted spectrum.
Constructor -- addition of missing sinusoids.
2) Envelope Adjustment: the upper spectrum generated by the transposer needs to be shaped subsequently with respect to frequency and time.
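The two SBR steps, copying the low band upward and then shaping it to a transmitted envelope, can be sketched on plain lists of subband magnitudes. This is a heavily simplified illustration: real SBR works on complex QMF subband samples and also adds noise and sinusoids, none of which is modelled here. The function name, the grouping of bands and all numbers are assumptions made for the example.

```python
import math

def sbr_reconstruct(low_band, envelope, bands_per_group):
    """Regenerate high-band magnitudes from the low band.

    low_band:        decoded magnitudes of the transmitted lower subbands
    envelope:        target energy per group of high subbands (the guidance data)
    bands_per_group: number of high subbands sharing one envelope value
    """
    n_high = len(envelope) * bands_per_group
    # Transposer: copy the low band upward, wrapping if the high band is wider.
    patch = [low_band[i % len(low_band)] for i in range(n_high)]
    high_band = []
    for g, target_energy in enumerate(envelope):
        group = patch[g * bands_per_group:(g + 1) * bands_per_group]
        energy = sum(v * v for v in group)
        # Envelope adjustment: scale the group so its energy matches the target.
        gain = math.sqrt(target_energy / energy) if energy > 0.0 else 0.0
        high_band.extend(v * gain for v in group)
    return high_band

low = [4.0, 2.0, 1.0, 1.0]          # illustrative decoded low-band magnitudes
env = [2.0, 0.5]                    # illustrative envelope energies
high = sbr_reconstruct(low, env, bands_per_group=2)
print([round(v, 4) for v in high])
```

Only the few envelope values need to be transmitted for the upper spectrum, which is where the bit-rate saving comes from.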

AAC
‡ Parametric Stereo (PS): joint coding of stereo audio. Just a mono downmix is transmitted, along with a small data stream describing how to upmix it in the decoder.
‡ Noiseless coding: achieves a further reduction of the required data rate by reducing redundancy in the representation of the transmitted data. This is done by a lossless packing of the quantized spectral data, exploiting statistical dependencies and other properties.
‡ Predictive coding:
Forward prediction: the correlation between subsequent input samples is exploited by quantizing and coding the prediction error based on the unquantized input samples.
Backward prediction: the more widely used alternative, in which the prediction is based on previously quantized values.
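Prediction from previously quantized values can be illustrated with a first-order predictive coder. The sketch is a generic textbook-style example, not the AAC predictor: the predictor coefficient of 0.9, the uniform quantizer and the function names are all assumptions. Because the encoder predicts from the same reconstructed values the decoder will have, both stay in step and the reconstruction error never exceeds half a quantizer step.

```python
def encode_backward(samples, step, coeff=0.9):
    """Quantize the prediction error; predict from previously quantized values."""
    indices, prev_rec = [], 0.0
    for x in samples:
        pred = coeff * prev_rec               # prediction the decoder can repeat
        q = round((x - pred) / step)          # quantized prediction error
        indices.append(q)
        prev_rec = pred + q * step            # what the decoder will reconstruct
    return indices

def decode_backward(indices, step, coeff=0.9):
    """Rebuild the samples using the same predictor state as the encoder."""
    out, prev_rec = [], 0.0
    for q in indices:
        prev_rec = coeff * prev_rec + q * step
        out.append(prev_rec)
    return out

samples = [0.0, 0.8, 1.5, 2.1, 2.4, 2.5, 2.3, 1.9]   # illustrative input
step = 0.25
codes = encode_backward(samples, step)
rebuilt = decode_backward(codes, step)
print(max(abs(a - b) for a, b in zip(samples, rebuilt)) <= step / 2 + 1e-9)
```

No predictor parameters are transmitted in this scheme, since the decoder derives the prediction from data it has already decoded.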

Diagram for the predictive analysis

AAC Compression

Extensions and Improvements
Some extensions have been added to the original AAC standard:
‡ Perceptual Noise Substitution (PNS), added in MPEG-4 Part 3.
‡ Long Term Predictor (LTP).
‡ MPEG-4 Scalable To Lossless (SLS).
‡ High Efficiency AAC (HE-AAC), a.k.a. aacPlus v1 or AAC+: the combination of SBR (Spectral Band Replication) and AAC, used for low bitrates.
‡ HE-AAC v2, a.k.a. aacPlus v2: the combination of Parametric Stereo (PS) and HE-AAC.

Architecture of HE-AAC

Block Diagram of MPEG-2 AAC

MPEG-4 HE-AAC
‡ HE-AAC is the low bit rate codec in the AAC family and is a combination of the AAC LC (Advanced Audio Coding Low Complexity) audio coder and the SBR (Spectral Band Replication) bandwidth expansion tool.
‡ This combination achieves good stereo quality already at bit rates of 32 to 48 kbit/s.
‡ HE-AAC is also known as AAC Plus and can be used in multichannel operations.

Diagram for the HE-AAC v2

MPEG-4 HE-AAC v2
‡ Combined with parametric stereo, the HE-AAC codec provides good audio quality starting at bit rates around 16 to 24 kbit/s for stereo content.
‡ HE-AAC v2 is also known as AAC Plus v2.

