
2010 International Conference on Advances in Computer Engineering

Different Speed Optimizations for MPEG AAC Encoder
Ms. Vishakha N. Gor
ECE Dept., SVNIT
Ichchanath, Surat, India
vishakhagor@gmail.com

Mr. Jignesh N. Sarvaiya
Assistant Professor, ECE Dept., SVNIT
Ichchanath, Surat, India
jns@eced.svnit.ac.in

Abstract- The MPEG Advanced Audio Coding (AAC) algorithm is a modern audio coding algorithm that is equipped with a number of coding tools and delivers unsurpassed audio quality at rates at or below 64 kbps/channel. The AAC encoder defined by the ISO/IEC standard is very slow, and real-time encoding with it has not been possible on any platform. In this paper we therefore carry out algorithm-level, platform-independent optimizations of the most time-consuming blocks of the MPEG AAC encoder. These optimizations are relevant to real-time implementation of the MPEG AAC encoder. Experimental results show that the optimized encoder reduces the computational complexity and improves the speed.

Keywords- Advanced Audio Coding (AAC), filter bank, MDCT, psychoacoustic model.

I. INTRODUCTION
The block diagram of the AAC encoder is shown in Fig. 1. The details of the AAC standard and its earlier evolution are available in [4].

Figure 1. Block diagram of MPEG AAC Encoder [1]

The MPEG audio coder defined in standard ISO/IEC 13818-7 [4] requires a large amount of computation for the psychoacoustic model [1], the MDCT [3] and the quantization module. This complexity makes real-time implementation difficult. The first half of this paper covers platform-independent optimization of the algorithm. The second part discusses platform-dependent optimization on the TMS320C6713 DSP [8] kit, and finally the optimized and un-optimized MPEG AAC encoders are compared on the DSP kit.

II. PLATFORM INDEPENDENT OPTIMIZATIONS
This section covers the optimisation of the psychoacoustic model, the MDCT and the iterative loops [4].

A. Psychoacoustic Model optimisation
The main challenge in implementing the psychoacoustic model on a DSP processor is the calculation of complex functions. To implement these functions efficiently, the idea is to express all of them using only logarithms and the corresponding power functions. The tonality indices are therefore calculated from the Spectral Flatness Measure (SFM) [2] instead of the unpredictability measure defined in [4]. All other steps are the same as defined in standard ISO/IEC 13818-7 [4]. The SFM [4] on which the tonality index is based is given by

$$ SFM = 10 \log_{10}\left(\frac{G_m}{A_m}\right) \tag{1} $$

where $G_m$ and $A_m$ are the geometric and arithmetic means of the spectral lines. For each partition b, denoting the absolute value of an FFT line as P, Eq. (1) can be rewritten in a simpler form using logarithms as follows:

$$ SFM(b) = 10 \log_{10} \frac{\sqrt[b]{P(1)\,P(2)\cdots P(b)}}{\left[\dfrac{P(1)+P(2)+\cdots+P(b)}{b}\right]} \tag{2} $$

$$ = 10\left\{ \frac{\log_{10}P(1)+\log_{10}P(2)+\cdots+\log_{10}P(b)}{b} - \left[\log_{10}\big(P(1)+P(2)+\cdots+P(b)\big) - \log_{10} b\right] \right\} \tag{3} $$

The tonality index is given by the following equation, taken from [2]:

$$ tb(b) = \frac{SFM(b)}{60} \tag{4} $$

Fig. 2 compares the tonality index obtained with the standard approach and with the SFM.

[Plot: "Tonality Index calculation defined in standard (solid) and using SFM (dotted)"; x-axis: Sub-Band Number (0 to 60); y-axis: 0 to 1]
Figure 2. Tonality index calculation comparison
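The logarithmic form of the SFM can be illustrated with a short Python sketch. It is only a minimal illustration, not the authors' DSP code: the function names are hypothetical, the magnitudes are assumed to be strictly positive, and Eq. (4) is applied exactly as printed. Since the SFM in dB is never positive, a practical implementation may need a sign convention and clamping of the index to the range 0 to 1, which the text does not spell out.

```python
import math


def sfm_db(magnitudes):
    """SFM in dB using the log form of Eq. (3); magnitudes must be > 0."""
    b = len(magnitudes)
    # Mean of the logs equals the log of the geometric mean.
    mean_log = sum(math.log10(p) for p in magnitudes) / b
    # Log of the arithmetic mean: log10(sum) - log10(b).
    log_arith_mean = math.log10(sum(magnitudes)) - math.log10(b)
    return 10.0 * (mean_log - log_arith_mean)


def sfm_db_direct(magnitudes):
    """SFM in dB from the b-th root and the mean, as in Eq. (2)."""
    b = len(magnitudes)
    geometric = math.prod(magnitudes) ** (1.0 / b)
    arithmetic = sum(magnitudes) / b
    return 10.0 * math.log10(geometric / arithmetic)


def tonality_index(magnitudes):
    """Eq. (4) as printed: tb(b) = SFM(b) / 60 (sign and clamping not handled)."""
    return sfm_db(magnitudes) / 60.0
```

Both functions return the same value for a given partition; the log form avoids the b-th root and the division, which is the point of the rewrite from Eq. (2) to Eq. (3).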

978-0-7695-4058-0/10 $26.00 © 2010 IEEE
DOI 10.1109/ACE.2010.49
B. Optimisation in MDCT filter
The analytical expression for the MDCT [1] is given by

$$ X_{ik} = 2 \sum_{n=0}^{N-1} Z_{in} \cos\left(\frac{2\pi}{N}\,(n+n_0)\left(k+\frac{1}{2}\right)\right) \tag{5} $$

where $0 \le k < N/2$.
Although direct application of the MDCT formula would require $O(N^2)$ [6] operations, the same result can be computed with only $O(N \log N)$ [6] complexity by recursively factorizing the computation, as in the Fast Fourier Transform (FFT). One alternative is to compute the MDCT via an FFT combined with O(N) pre- and post-processing steps.
Step 1: "Pre-twiddle" the input samples by (complex) multiplication with the factor $e^{-j\pi n/N}$.
Step 2: Perform an FFT on the pre-twiddled data.
Step 3: "Post-twiddle" the FFT result with the factor $e^{-j2\pi n_0 (k+1/2)/N}$.
This approach speeds up the calculation of the MDCT spectrum.

C. Time reduction inside the iteration loops
Fig. 3 shows the evolution of the CPU time needed by the iteration loops to converge for the pop and castanet signals.

[Plots: "CPU time required by iteration loop for castanet signal" and "CPU time required by iteration loop for pop signal"; x-axis: Frame Number; y-axis: CPU Time (ms), 0 to 100]
Figure 3. Evolution of CPU time required by iterative loops for castanet signal (up) and pop signal (down)

Note that there is a large variance between the values from the two files, and even from frame to frame within the same file. This behaviour is not suitable for real-time implementation, so this section describes how to reduce time and complexity within the iteration loops. According to the ISO/IEC 13818-7 standard [4], a frame is quantized inside the inner loop using a non-uniform quantizer that maps each spectral component $x_i$ to x_quantized(i).
The value of $x^{3/4}$ can be calculated before entering the loops; once inside the inner loop, these previously calculated values are multiplied by 2 raised to the power of an exponent. This exponent is the only value that changes inside the iteration loops and is easily calculated by the DSP. If the time measured for the $x^{3/4}$ calculations is $T_0$, the total saving is $(N-1) \cdot T_0$ per frame, where N is the number of inner iterations.
In addition, the non-uniform quantizer can be seen as the cascade of a compressor of the form $C(x) = x^{3/4}$ and a uniform quantizer with step

$$ \Delta = 2^{\frac{3}{16}(gl - scf)} \tag{6} $$

To strongly reduce the number of operations, a uniform quantizer is considered. For the uniform quantizer, the estimated noise power is calculated as follows:

$$ q^2 = \frac{\Delta^2}{12} \tag{7} $$

Using this equation, the estimated SNR for each scalefactor band can be written as a function of the original coefficients, the bandwidth of each scalefactor band and the step $\Delta$:

$$ SNR(scfb) = \frac{\dfrac{\sum_{n \in scfb} \left[ x(n)^{3/4} \right]^2}{BW(scfb)}}{\Delta^2 / 12} \tag{8} $$

Comparing Fig. 3 and Fig. 4, it is clear that the mean CPU time has been reduced by a factor of 2-3 for the castanet signal and that the variance of the values through the file is strongly reduced.

[Plots: "CPU time required by iteration loop for castanet signal" and "CPU time required by iteration loop for pop signal"; x-axis: Frame Number; y-axis: CPU Time (ms), 0 to 100]
Figure 4. CPU time required by optimized iterative loop for castanet signal (up) and pop signal (down)

III. TARGET DEPENDENT OPTIMIZATIONS
To minimize computation by taking advantage of the DSP's characteristics, the TMS320C67x digital signal processor library (DSPLIB) [10] is used. It includes C-callable, assembly-optimized general-purpose signal-processing routines. These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical. We have used DSPLIB functions for autocorrelation, convolution and FFT calculation. Further optimization includes the use of memory sections for efficient use of DSP memory.
The TMS320C6713 has 256 Kbytes of internal RAM (IRAM) that is shared between program and data space. Externally, it can address up to 16 Mbytes of synchronous dynamic RAM (SDRAM) for program and data. The overall memory map of the entire application matters for

optimizing the code. The execution time of an instruction that accesses external memory is longer. Therefore, we have placed in internal memory the processes that involve a large number of operations, such as filters, transforms, convolution and mathematical expressions. All other code is stored in SDRAM. The spreading function table, the input and output buffer blocks and the Huffman codebook tables are also stored in external memory because they require large blocks of memory.

IV. RESULTS
The MPEG-2 AAC encoder defined in [4] and the optimized AAC encoder proposed in this paper were implemented on the TMS320C6713 DSK kit. To evaluate the performance of the coder, six different audio files were chosen. Figs. 5 to 10 compare the time taken to encode the given files into AAC using the standard algorithm and the optimized algorithm at different bit rates on the DSP kit.
From the figures it can be deduced that, with the optimizations suggested here, the execution time and complexity of the MPEG AAC encoder are reduced.

V. CONCLUSIONS
We have presented in this paper an optimized implementation of the MPEG AAC encoder. To meet the challenge of real-time implementation of the MPEG AAC encoder, we have presented speed optimizations of the most computationally intensive blocks, namely the psychoacoustic model, the MDCT and the iterative loops. The later effort to further improve the speed focused on target-dependent optimisation. The execution-time results show that the algorithmic optimizations, combined with well-planned memory organisation and the DSP's parallel processing, improve the speed by a factor of 2-4.

REFERENCES
[1] Andreas Spanias, Ted Painter, Venkatraman Atti, “Audio Signal Processing and Coding”, A John Wiley & Sons, Inc., Publication.
[2] Marina Bosi, Richard E. Goldberg, “Introduction to digital audio coding and standards”, Kluwer Academic Publishers, 2003.
[3] James D. Johnston, Schuyler R. Quackenbush, Grant A. Davidson, Karlheinz Brandenburg, and Jurgen Herre, “MPEG Audio Coding”, AT&T Laboratories – Research, Florham Park, NJ.
[4] ISO/IEC JTC1/SC29/WG11, Coding of moving pictures and audio – MPEG-2 Advanced Audio Coding, ISO/IEC 13818-7 International standard, 1997.
[5] Jürgen Herre, “Temporal Noise Shaping, Quantization And Coding Methods In Perceptual Audio Coding”, Fraunhofer Institute for Integrated Circuits FhG-IIS A, Erlangen, Germany.
[6] Britanak, V.; Rao, K.R, “An efficient implementation of the forward and inverse MDCT in MPEG audio coding”, IEEE Trans. on Signal Processing, Vol. 8, Issue 2, pp. 48 – 51, Feb. 2001.
[7] Chen J., Tai H. M., “MPEG-2 AAC coder on a fixed-point DSP”, Consumer Electronics, pp. 24 – 25, 1999.
[8] S.-H. Park, Y.-S. Seo, S.-W. Kim, Fast algorithm on MPEG/Audio subband filtering, AES preprint at the 99th Convention, 1995 October.
[9] TMS320C6713 DSK Technical Reference.
[10] SPRU657, “TMS320C67x DSP Library Programmer’s Reference Guide”.
[11] SPRU187B, “TMS320C6x Optimizing C Compiler User’s Guide”.
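As a supplementary illustration of the three-step FFT route of Section II.B, the following Python sketch compares the direct evaluation of Eq. (5) with the pre-twiddle, FFT, post-twiddle sequence. It is a minimal sketch under stated assumptions rather than the encoder's actual code: the function names are hypothetical, the phase offset is assumed to be n0 = (N/2 + 1)/2 (not stated in the text), the input is real-valued, and a plain N-point radix-2 FFT is used for clarity, whereas faster reduced-size schemes such as those discussed in [6] are not shown.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def mdct_direct(z):
    """Direct O(N^2) evaluation of Eq. (5) for one block of N real samples."""
    n_len = len(z)
    n0 = (n_len / 2 + 1) / 2  # assumed phase offset
    return [
        2.0 * sum(z[n] * math.cos(2 * math.pi / n_len * (n + n0) * (k + 0.5))
                  for n in range(n_len))
        for k in range(n_len // 2)
    ]


def mdct_fft(z):
    """Same coefficients via pre-twiddle, N-point FFT and post-twiddle."""
    n_len = len(z)
    n0 = (n_len / 2 + 1) / 2  # assumed phase offset
    # Step 1: pre-twiddle with exp(-j*pi*n/N).
    pre = [z[n] * cmath.exp(-1j * math.pi * n / n_len) for n in range(n_len)]
    # Step 2: FFT of the pre-twiddled data.
    spec = fft(pre)
    # Step 3: post-twiddle with exp(-j*2*pi*n0*(k+1/2)/N); keep the real part.
    return [
        2.0 * (cmath.exp(-2j * math.pi * n0 * (k + 0.5) / n_len) * spec[k]).real
        for k in range(n_len // 2)
    ]
```

The two functions agree because the pre-twiddle supplies the half-sample shift in k and the post-twiddle supplies the n0 phase term, so the real part of the twiddled FFT output reproduces the cosine sum of Eq. (5).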

[Plots: time elapsed (in minutes) versus bit rate (64, 96, 128) for the unoptimized and optimized encoders, one graph per audio file]
Figure 5. Hip hop Timing comparison graph
Figure 6. Pop Timing comparison graph
Figure 7. Instrumental Timing comparison graph
Figure 8. Castanet Timing comparison graph
Figure 9. Trance Timing comparison graph
Figure 10. Rap Timing comparison graph
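As a supplementary illustration of the iteration-loop optimisation of Section II.C, the following Python sketch computes the 3/4 power once per frame and leaves only a power-of-two scaling inside the loop, together with the estimated SNR of Eqs. (7) and (8). It is a hedged sketch, not the authors' implementation: all function names are hypothetical, the rounding offset 0.4054 and the reference form in `quantize_direct` are assumptions that do not appear in the text, and BW(scfb) is assumed to be the number of coefficients in the band.

```python
import math


def compress(spectrum):
    """Compute |x|^(3/4) once per frame, before entering the iteration loops."""
    return [abs(x) ** 0.75 for x in spectrum]


def step_size(gl, scf):
    """Quantizer step of Eq. (6): 2^((3/16) * (gl - scf))."""
    return 2.0 ** (3.0 * (gl - scf) / 16.0)


def quantize_band(x_pows, gl, scf, offset=0.4054):
    """Inner-loop work: only the power-of-two scale depends on gl and scf."""
    inv_step = 1.0 / step_size(gl, scf)
    return [int(p * inv_step + offset) for p in x_pows]


def quantize_direct(x, gl, scf, offset=0.4054):
    """Assumed reference form that recomputes the 3/4 power on every call."""
    return int((abs(x) * 2.0 ** (-(gl - scf) / 4.0)) ** 0.75 + offset)


def estimated_snr(x_pows, gl, scf):
    """Eq. (8): mean compressed signal power over the noise power of Eq. (7)."""
    signal_power = sum(p * p for p in x_pows) / len(x_pows)
    noise_power = step_size(gl, scf) ** 2 / 12.0
    return signal_power / noise_power
```

In a loop over candidate values of `gl`, `compress` is called once and `quantize_band` many times, which corresponds to the saving of (N − 1) · T0 per frame described in the text.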
