Figure 3. Evolution of CPU time required by iterative loops for castanet signal (up) and pop signal (down). Axes: CPU time (ms) versus frame number.
Figure 4. CPU time required by optimized iterative loop for castanet signal (up) and pop signal (down). Axes: CPU time (ms) versus frame number.
Note that the CPU time varies widely between the two files, and even from frame to frame
within the same file. This behaviour is not appropriate for a real-time implementation,
so this section describes how the time and complexity of the iteration loops are reduced.
According to the ISO/IEC 13818-7 standard [4], a frame is quantized inside the inner loop
using a non-uniform quantizer that maps each spectral component x_i to x_quantized(i).
The value of x^(3/4) can be calculated before entering the loops; once inside the inner
loop, these previously calculated values are multiplied by 2 raised to the power of an
exponent. This exponent is the only value that changes inside the iteration loops, and it
is easily calculated by the DSP. If the time measured for the x^(3/4) calculations is T0,
the total saving is (N − 1) × T0 per frame, where N is the number of loop iterations.
III. TARGET DEPENDENT OPTIMIZATIONS
To reduce computation by taking advantage of the DSP's characteristics, the TMS320C67x
DSP Library (DSPLIB) [10] is used. It includes C-callable, assembly-optimized,
general-purpose signal-processing routines. These routines are typically used in
computationally intensive real-time applications where optimal execution speed is
critical. We have used DSPLIB functions for the autocorrelation, convolution and FFT
calculations. A further optimization is the use of memory sections for efficient use of
DSP memory.
The TMS320C6713 has 256 Kbytes of internal RAM (IRAM) that is shared between program and
data space. Externally, it can address up to 16 Mbytes of synchronous dynamic RAM (SDRAM)
for program and data. The overall memory map of the entire application matters for
optimizing the code. An instruction that accesses external memory takes longer to execute.
Therefore, we have used internal memory for processes that involve a large number of
operations, such as filters, transforms, convolution and mathematical expressions. All
other code is stored in SDRAM. The spreading function table, the input and output buffer
blocks, and the Huffman codebook table are also stored in external memory because they
require large blocks of memory.
IV. RESULTS
The MPEG-2 AAC encoder defined in [4] and the optimized AAC encoder proposed in this paper
were implemented on the TMS320C6713 DSK. To evaluate the performance of the coder, six
different audio files were chosen. Figs. 5 to 10 compare the time taken to encode these
files into AAC with the standard algorithm and with the optimized algorithm at different
bit rates on the DSP kit.
From the figures it can be seen that the optimizations suggested here reduce the execution
time and complexity of the MPEG AAC encoder.
V. CONCLUSIONS
We have presented in this paper an optimized implementation of the MPEG AAC encoder. To
answer the challenge of real-time implementation of the MPEG AAC encoder, we have
presented speed optimizations for the most computationally intensive blocks, namely the
psychoacoustic model, the MDCT and the iterative loops. The later effort to further
improve the speed focused on target-dependent optimization. The execution-time results
show that, with the algorithmic optimizations combined with well-planned memory
organization and the parallel processing of the DSP, speed can be improved by a factor
of 2 to 4.
REFERENCES
[1] Andreas Spanias, Ted Painter, Venkatraman Atti, “Audio Signal Processing and Coding”, A John Wiley & Sons, Inc., Publication.
[2] Marina Bosi, Richard E. Goldberg, “Introduction to digital audio coding and standards”, Kluwer Academic Publishers, 2003.
[3] James D. Johnston, Schuyler R. Quackenbush, Grant A. Davidson, Karlheinz Brandenburg, and Jurgen Herre, “MPEG Audio Coding”, AT&T Laboratories – Research, Florham Park, NJ.
[4] ISO/IEC JTC1/SC29/WG11, Coding of moving pictures and audio – MPEG-2 Advanced Audio Coding, ISO/IEC 13818-7 International standard, 1997.
[5] Jürgen Herre, “Temporal Noise Shaping, Quantization And Coding Methods In Perceptual Audio Coding”, Fraunhofer Institute for Integrated Circuits FhG-IIS A, Erlangen, Germany.
[6] Britanak, V.; Rao, K.R, “An efficient implementation of the forward and inverse MDCT in MPEG audio coding”, IEEE Trans. on Signal Processing, Vol. 8, Issue 2, pp. 48-51, Feb. 2001.
[7] Chen J., Tai H. M., “MPEG-2 AAC coder on a fixed-point DSP”, Consumer Electronics, pp. 24-25, 1999.
[8] S.-H. Park, Y.-S. Seo, S.-W. Kim, “Fast algorithm on MPEG/Audio subband filtering”, AES preprint at the 99th Convention, October 1995.
[9] TMS320C6713 DSK Technical Reference.
[10] SPRU657, “TMS320C67x DSP Library Programmer’s Reference Guide”.
[11] SPRU187B, “TMS320C6x Optimizing C Compiler User’s Guide”.
Figures 5 to 10 each plot time elapsed (in minutes) against bit rate (64, 96, 128) for the unoptimized and optimized encoders.
Figure 5. Hip hop timing comparison graph
Figure 6. Pop timing comparison graph
Figure 7. Instrumental timing comparison graph
Figure 8. Castanet timing comparison graph
Figure 9. Trance timing comparison graph
Figure 10. Rap timing comparison graph
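The memory organization of Section III could be expressed with the TI compiler's DATA_SECTION pragma, which assigns a symbol to a named section that a linker command file then maps to a memory region. The C sketch below only illustrates the idea: the buffer names, sizes and section names (.iram_data, .sdram_data) are hypothetical and not taken from the paper, and the pragmas are active only if the compiler defines __TI_COMPILER_VERSION__, which is assumed here.

```c
#include <assert.h>
#include <stddef.h>

/* Internal RAM size stated in the text for the TMS320C6713. */
#define IRAM_BYTES (256u * 1024u)

#if defined(__TI_COMPILER_VERSION__)
/* Hypothetical section names; a linker command file must map them to IRAM or SDRAM. */
#pragma DATA_SECTION(mdct_input, ".iram_data")
#pragma DATA_SECTION(mdct_spectrum, ".iram_data")
#pragma DATA_SECTION(spreading_table, ".sdram_data")
#pragma DATA_SECTION(io_buffer, ".sdram_data")
#pragma DATA_SECTION(huffman_table, ".sdram_data")
#endif

/* Heavily processed working buffers: intended for internal memory. */
float mdct_input[2048];
float mdct_spectrum[1024];

/* Large tables and I/O buffers: intended for external SDRAM. */
float spreading_table[64 * 64];
short io_buffer[2 * 8192];
unsigned short huffman_table[4096];

/* Bytes of data that this sketch asks to place in internal RAM. */
size_t iram_data_bytes(void)
{
    return sizeof mdct_input + sizeof mdct_spectrum;
}

/* Returns 1 if the internal-RAM data fits within the given budget, else 0. */
int iram_budget_ok(size_t budget_bytes)
{
    assert(budget_bytes > 0);
    return iram_data_bytes() <= budget_bytes;
}
```

Because the IRAM is shared between program and data, a real budget check would also have to account for the code placed in internal memory. The helper here counts data only.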