
Enactment and Optimization of 2400 & 1200 bps MELPe Based on ARM-7 and ARM-9

C.S. Mohammed Shaul Hammed Ali, C.S. Asfia Anjum
MS (VLSI & ESD), B.Tech ECE
JNTU Hyderabad, Global College of Engineering
hammedcs@gmail.com
Abstract: This paper introduces the basic principles of MELPe (enhanced Mixed-Excitation Linear Prediction), an enhanced version of MELP. Compiler optimization and code optimization methods are applied on the basis of the ARM1176JZF-S kernel. The encoding time of the optimized algorithm drops from 110.75 ms per frame to 52.5 ms per frame, and the decoding time drops from 14.88 ms per frame to 10.73 ms per frame; encoding efficiency is improved by 52.6% and decoding efficiency by 27.89%. The algorithm implementation platform is built on the ARM architecture and the Linux OS, and the MELPe algorithm is ported to an ARM board with the S3C6410 processor. PESQ and MOS values indicate that speech quality is preserved. Testing results show that the ARM1176JZF-S and the MELPe algorithm can meet the real-time requirements of speech communication.

Keywords: MELPe; ARM kernel; optimization; implementation

I. INTRODUCTION
Speech coding is a procedure to represent a
digitized speech signal using as few bits as possible,
maintaining at the same time a reasonable level of speech
quality. A less common name with the same meaning is speech compression. Speech coding has matured to the
point where it now constitutes an important application
area of signal processing. Due to the increasing demand
for speech communication, speech coding technology has
received growing levels of interest from the research,
standardization, and business communities. Advances in
microelectronics and the vast availability of low-cost
programmable processors and dedicated chips have
enabled rapid technology transfer from research to
product development; this encourages the research
community to investigate alternative schemes for speech
coding, with the objectives of overcoming deficiencies
and limitations.
The standardization community pursues the
establishment of standard speech coding methods for
various applications that will be widely accepted and
implemented by the industry. The business communities
capitalize on the ever-increasing demand and
opportunities in the consumer, corporate, and network
environments for speech processing products. Speech
coding is performed using numerous steps or operations
specified as an algorithm. An algorithm is any well-defined computational procedure that takes some value,
or set of values, as input and produces some value, or set
of values, as output. An algorithm is thus a sequence of
computational steps that transform the input into the
output. Many signal processing problems including
speech coding can be formulated as a well-specified
computational problem [11]; hence, a particular coding
scheme can be defined as an algorithm. In general, an
algorithm is specified with a set of instructions, providing
the computational steps needed to perform a task. Given these instructions, a computer or processor can execute them to complete the coding task. The instructions can also be translated into the structure of a digital circuit, carrying out the computation directly at the hardware level. The purpose of this paper is to explain the theoretical issues and implementation techniques related to the field of speech coding [14]. The discussion focuses on some of the well-established and widely used speech coding standards. By studying the most successful standards and understanding their principles, performance, and limitations, it is possible to apply a particular technique to a given situation according to the underlying constraints, with the ultimate goal of developing next-generation algorithms that improve on all aspects.

The MELPe algorithm on an ARM embedded system gives good voice quality, but its time complexity is high, and the space and time requirements for a program running on an embedded system are strict. Therefore, in order to meet the needs of real-time speech communication, it is necessary to optimize the embedded application, improving efficiency and decreasing encoding time in actual applications. Compiler optimization and code optimization are frequently used optimization methods. Compiler optimization uses parallel programming technology to extract semantic information from the source program by dependence analysis, performs processor-independent optimizations automatically, and generates high-quality code through software pipelining, data planning and loop restructuring. Code optimization uses linear assembly to reduce the number of instruction cycles of the algorithm, and uses inline functions to reduce the frequency of function calls.
II. OVERVIEW OF MELPE ALGORITHM
The MELPe vocoder derives from LPC and adopts mixed excitation, aperiodic pulses, adaptive spectral enhancement, a pulse shaping filter and residual harmonic processing [3]. Mixed excitation reduces the buzzy quality of synthetic speech in the LPC vocoder, especially for broadband sound sources. Aperiodic pulses effectively remove tonal distortion from the synthetic speech. Adaptive spectral enhancement increases the naturalness of synthesized speech by improving the match between the formants of the synthetic speech waveform and those of the natural band-pass speech waveform. The pulse shaping filter is used for post-processing of the synthetic speech; it attenuates periodic components in some bands and reduces the peak-to-valley ratio near the pitch harmonics, thereby suppressing the buzzy quality of the synthetic speech and making it sound more natural. Residual harmonic processing improves the naturalness and articulation of synthetic speech and increases its robustness to noise. The MELPe algorithm groups three consecutive frames into one super-frame with a length of 67.5 ms and 540 sampling points. Each frame of the super-frame is called a sub-frame; it lasts 22.5 ms and contains 180 points at a sampling rate of 8000 Hz [4].
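To make this framing concrete, the short, self-contained sketch below simply encodes the numbers above as constants; the identifier names are illustrative and are not taken from the MELPe reference code.

```c
/* Illustrative MELPe framing constants (values taken from the text above);
   the names are hypothetical, not those of the reference implementation. */
#include <stdint.h>
#include <stdio.h>

#define SAMPLE_RATE_HZ      8000
#define SUBFRAME_SAMPLES     180                /* 22.5 ms at 8 kHz        */
#define SUBFRAMES_PER_SF       3                /* three sub-frames        */
#define SUPERFRAME_SAMPLES  (SUBFRAME_SAMPLES * SUBFRAMES_PER_SF)   /* 540 */

int main(void)
{
    int16_t superframe[SUPERFRAME_SAMPLES];     /* one 67.5 ms super-frame */
    printf("sub-frame:   %d samples = %.1f ms\n", SUBFRAME_SAMPLES,
           1000.0 * SUBFRAME_SAMPLES / SAMPLE_RATE_HZ);
    printf("super-frame: %d samples = %.1f ms\n", SUPERFRAME_SAMPLES,
           1000.0 * SUPERFRAME_SAMPLES / SAMPLE_RATE_HZ);
    (void)superframe;
    return 0;
}
```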
III. MELPE STRUCTURE

3.1 MELPe Encoder Structure

Compared with the MELP encoder, the MELPe encoder adds a noise preprocessing module, which effectively suppresses background noise and maintains good voice quality even in noisy environments. The cost is increased algorithm complexity: the noise preprocessing computation accounts for about 31.5% of the whole encoder. The encoder structure is shown in Fig. 1 below.
[Encoder blocks in Fig. 1: speech input → noise pre-processor → pitch period calculation, sub-band sound intensity analysis, LPC analysis, periodicity judgment, Fourier amplitudes calculation → quantization and encoding]
Fig.1. MELPe encoder block diagram

3.2 MELPe Decoder Structure


Compared with the MELP decoder, the MELPe decoder adds a post-processing filter and a harmonic synthesis module. In harmonic synthesis, the cut-off frequency is first calculated from the band-pass voicing strengths [12]; the phase and amplitude at every point are then synthesized from the pitch period and the Fourier amplitudes. The time-domain excitation, i.e., the mixed excitation source, is obtained by applying an IDFT to the frequency-domain parameters. Post-filtering further improves the quality of the synthesized speech. The decoder structure is shown in Fig. 2 below [5].
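The harmonic synthesis step described above amounts to an inverse DFT over the harmonic amplitudes and phases. The following self-contained sketch illustrates only that operation; it is a simplified stand-in, not the MELPe reference routine, and the function and variable names are hypothetical.

```c
/* Simplified harmonic synthesis by inverse DFT: build a time-domain
   excitation segment from harmonic amplitudes and phases.
   Illustrative only; not the MELPe reference routine. */
#include <math.h>
#include <stdio.h>

#define PI 3.14159265358979323846

/* Synthesize n samples from nh harmonics of a pitch period of T0 samples. */
static void harmonic_synthesis(const double *amp, const double *phase,
                               int nh, double T0, double *out, int n)
{
    for (int t = 0; t < n; t++) {
        double s = 0.0;
        for (int k = 1; k <= nh; k++)
            s += amp[k - 1] * cos(2.0 * PI * k * t / T0 + phase[k - 1]);
        out[t] = s;
    }
}

int main(void)
{
    double amp[3]   = {1.0, 0.5, 0.25};   /* decoded Fourier amplitudes */
    double phase[3] = {0.0, 0.0, 0.0};    /* synthesized phases         */
    double seg[180];                      /* one 22.5 ms frame at 8 kHz */
    harmonic_synthesis(amp, phase, 3, 57.0, seg, 180);  /* T0 = 57 samples */
    printf("first samples: %.3f %.3f %.3f\n", seg[0], seg[1], seg[2]);
    return 0;
}
```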

[Decoder blocks in Fig. 2: sub-band sound intensity → cut-off frequency calculation; pitch periodicity and Fourier amplitudes → harmonic synthesis (IDFT); adaptive spectral enhancement → LPC synthesis → gain correction → pulse shaping filter → speech output]
Fig.2. MELPe decoder block diagram

IV. SYSTEM PLATFORM
4.1 System Hardware Structure
The experimental platform is built on the ARM system architecture. Encoding and decoding are executed by the Samsung S3C6410 processor, which is based on the ARM1176JZF-S core and combines 3D performance with low power in a cost-effective package. This 32-bit ARM11 RISC microprocessor with a 64-bit AXI bus delivers up to 667 MHz of processing performance [6]. The hardware structure on which the MELPe algorithm runs is shown in Fig. 3. The processor can integrate functions such as navigation, camera and PDA applications into one device and provides a large number of external interfaces, so implementing the MELPe algorithm on it is significant for extended applications.

Fig.3. System hardware block diagram

4.2 System Software Structure


The experimental platform runs on the Linux operating system. With its high modularity, wide hardware support, open source code and mature network file system, Linux makes it possible to build an interactive and stable communication system conveniently and effectively. Moreover, the algorithm can be realized on different hardware platforms using Linux cross-compilation technology [7].
4.3 System Implementation Structure
Speech is input through the audio interface and converted into a digital signal by the A/D converter, and then encoded by the S3C6410. The encoded speech parameters are output through the RS232 or network interface to the terminal user. The device that receives these parameters decodes them and converts them back into an analog signal through the D/A converter. The implementation flow is shown in Fig. 4 below; a minimal encoder-side sketch follows the figure.
[Implementation flow blocks in Fig. 4: speech input → A/D converter → encoder → transmission → decoder → D/A converter → speech output]

Fig.4. System implementation block diagram
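As referenced above, here is a minimal sketch of the encoder side of the Fig. 4 flow under simplifying assumptions: the codec call is a placeholder stub, file I/O stands in for the audio interface and the RS232/network output, and the file names and byte counts are illustrative only.

```c
/* Minimal sketch of the encoder-side flow in Fig. 4: read 8 kHz, 16-bit
   mono PCM super-frames, "encode" them, and write the parameter bytes to
   an output stream. melpe_encode_superframe is a placeholder stub, not
   the real codec API; file names are illustrative. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SUPERFRAME_SAMPLES 540   /* 67.5 ms at 8 kHz                       */
#define SF_BYTES            21   /* 2400 bps * 0.0675 s = 162 bits ~ 21 B  */

static void melpe_encode_superframe(const int16_t *pcm, uint8_t *bits)
{
    (void)pcm;
    memset(bits, 0, SF_BYTES);   /* placeholder: real encoder packs params */
}

int main(void)
{
    FILE *in  = fopen("speech.pcm", "rb");   /* stands in for A/D input     */
    FILE *out = fopen("params.bin", "wb");   /* stands in for RS232/network */
    if (!in || !out) return 1;

    int16_t pcm[SUPERFRAME_SAMPLES];
    uint8_t bits[SF_BYTES];
    while (fread(pcm, sizeof pcm[0], SUPERFRAME_SAMPLES, in) == SUPERFRAME_SAMPLES) {
        melpe_encode_superframe(pcm, bits);
        fwrite(bits, 1, SF_BYTES, out);
    }
    fclose(in);
    fclose(out);
    return 0;
}
```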

V. OPTIMIZATION METHOD
5.1 Compiler Optimization
The GCC compiler is widely used, and in the embedded field many compilers are built on top of cross GCC. Its main optimization levels are: -O0, the default, which performs no optimization; -O1, the most basic level, where the compiler tries to generate faster and smaller code without spending too much compilation time; -O2, the recommended level, which improves code performance without excessively increasing code size or compilation time; -O3, the highest level of optimization; and -Os, which optimizes for code size.
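As a concrete illustration, the kind of fixed-point inner-product loop that dominates the profile in Table II benefits directly from these levels. The function below is only a stand-in for such a loop; the compile commands in the comment assume a GNU cross toolchain with an arm-linux-gnueabi- prefix, which depends on the toolchain actually installed.

```c
/* Stand-in for a fixed-point inner-product loop of the kind profiled in
   Table II. Example compile commands (toolchain prefix is an assumption):
 *   arm-linux-gnueabi-gcc -O0 inner.c -o inner    (no optimization)
 *   arm-linux-gnueabi-gcc -O2 inner.c -o inner    (recommended level)
 */
#include <stdint.h>
#include <stdio.h>

static int64_t inner_product(const int16_t *x, const int16_t *y, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * y[i];   /* 16x16 -> 32-bit multiply-accumulate */
    return acc;
}

int main(void)
{
    int16_t a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};
    printf("%lld\n", (long long)inner_product(a, b, 4));   /* prints 70 */
    return 0;
}
```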
5.2 Code Optimization
5.2.1 Assembly language
Embedded systems often contain a few key routines that determine the performance of the whole system. Optimizing these routines can reduce the power consumption and clock frequency required for real-time operation of the system. In addition, inline assembly needs no separate assembler step, so program speed can be improved and the required memory space reduced [8]. Since the algorithm program in this paper contains many loops and conditional statements, rewriting these statements in assembly language can effectively improve program efficiency.
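A small, hedged example of this technique is shown below: a saturating 32-bit addition, a basic operation that MELPe-style fixed-point code uses constantly, written with GCC inline assembly. It assumes compilation in ARM state on a core with the DSP extensions (the ARM1176JZF-S qualifies); a portable C fallback is included so the file also builds elsewhere.

```c
/* Saturating 32-bit add via GCC inline assembly on ARM (QADD instruction,
   available on ARMv5TE/ARMv6 cores such as the ARM1176JZF-S in ARM state).
   Illustrative sketch, not code from the MELPe sources. */
#include <stdint.h>
#include <stdio.h>

static inline int32_t sat_add32(int32_t a, int32_t b)
{
#if defined(__arm__) && !defined(__thumb__)
    int32_t r;
    __asm__("qadd %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
#else
    /* Portable fallback so the example also builds on non-ARM hosts. */
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
#endif
}

int main(void)
{
    printf("%ld\n", (long)sat_add32(2000000000, 2000000000)); /* saturates */
    return 0;
}
```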
5.2.2 Inline function
When a normal function is invoked, passing parameters and saving the current program state incur a certain overhead. The inline function is an effective way to avoid this. In addition, if constants appear in an inline function, the compiler can use them to simplify the code during compilation.
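For example, the fixed-point multiply-accumulate L_mac that dominates Table II is a natural candidate for inlining. The sketch below is a simplified illustration of this idea, not the reference implementation of L_mac; saturation handling is reduced to the essentials.

```c
/* Simplified fixed-point multiply-accumulate written as an inline function,
   in the spirit of the L_mac basic op profiled in Table II. Illustrative
   only; not the reference implementation. */
#include <stdint.h>

static inline int32_t saturate32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* acc + 2*a*b with saturation (Q15 inputs, Q31 accumulator). Because the
   function is inline, the compiler can expand it at each call site and
   avoid the call/return and parameter-passing overhead described above. */
static inline int32_t L_mac_sketch(int32_t acc, int16_t a, int16_t b)
{
    int64_t prod = 2 * (int64_t)a * (int64_t)b;
    return saturate32((int64_t)acc + saturate32(prod));
}

int main(void)
{
    /* 2 * 16384 * 16384 = 0x20000000, no saturation needed here. */
    return (L_mac_sketch(0, 16384, 16384) == 0x20000000) ? 0 : 1;
}
```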
VI. SPEECH QUALITY EVALUATION METHOD
6.1 Subjective Evaluation Algorithm
MOS (Mean Opinion Score) uses five grades, from 5 to 1, to evaluate speech quality, with 5 being excellent and 1 being worst. Each listener on the jury chooses one grade as their evaluation of the speech, and the average score over all subjects is the MOS value.
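Written as a formula, this simply restates the averaging described above, with N listeners giving scores s_i:

\[ \mathrm{MOS} = \frac{1}{N}\sum_{i=1}^{N} s_i, \qquad s_i \in \{1, 2, 3, 4, 5\}. \]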

6.2 Objective Evaluation Algorithm
An objective speech quality evaluation algorithm scores speech by comparing the processed signal with the original signal under some mathematical criterion, and its performance is mainly determined by its agreement with subjective measures [9, 10]. PESQ (Perceptual Evaluation of Speech Quality) is a perception-domain objective assessment algorithm based on a model of human hearing. The main idea of the PESQ algorithm is as follows: first, the processed and reference speech signals are level-adjusted, filtered, calibrated and equalized in the time domain, and transformed into an auditory representation; second, distortion parameters are extracted; finally, the PESQ score is obtained by aggregating these parameters over the time and frequency domains.
Experiments show that the higher the PESQ score, the better the voice quality; a PESQ score below 2 indicates poor speech quality and low intelligibility. The correlation between PESQ and the subjective method (MOS) is lower in speech enhancement applications than in the speech coding field.

VII. EXPERIMENT AND RESULTS
7.1 Test Condition
First, 3, 10 and 25 seconds of speech were sampled through the ARM A/D interface at 8000 Hz, 16 bits, mono, as the original input data for the experiments. Then, taking the encoding process as an example (it occupies most of the processing time), the time consumption of MELPe and the call counts of its internal functions were measured on the ARM platform. Finally, the original speech was compared with the decoded speech through time-domain waveforms and spectrograms.
7.2 Test Result
The time consumption and function call counts of the MELPe algorithm are shown in Table II. According to the results, the coding times after optimization are much smaller than before optimization: the efficiency of the algorithm is greatly improved by optimizing the code. The encode and decode times and frame counts are shown in Table III. Taking 3 seconds of speech as an example, the encoding time is 2100 ms and the decoding time is 440 ms, so encoding plus decoding takes 2.54 s, which is less than the speech duration; that is, the algorithm can support real-time speech communication. It can also be seen that a few major functions account for most of the computational complexity; these functions could be optimized further. The results in Table IV show that the MOS and PESQ values drop only slightly, which means that speech quality after decoding is not obviously degraded.
TABLE II. CODING TIME AND FUNCTION CALL NUMBER

Function name   Calls (before)   Time (s, before)   Calls (after)   Time (s, after)
L_mac           11366587         0.784              11137456        0.228
L_shl           2279668          0.596              2279252         0.297
L40_mac         2981453          0.71               2970126         0.304
L_v_inner       29719            0.223              29707           0.137
L_shr           1137259          0.124              1137047         0.068
sub             2361902          0.136              2359644         0.068
L_add           1898749          0.068              1897854         0.015
cfft            98               0.221              98              0.076

Fig.: Comparison of instruction cycles and time incurred on different hardware

TABLE III. ENCODE AND DECODE TIME

Speech Time (s)   Encode Frames   Encode Time (s)   Decode Frames   Decode Time (s)
3                 40              2.10              41              0.44
10                164             8.13              165             1.72
25                365             18.64             366             3.88
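As a quick check of the real-time claim, the ratios implied by Table III can be computed directly; the snippet below only reproduces that arithmetic.

```c
/* Real-time check from Table III: (encode + decode time) / speech time. */
#include <stdio.h>

int main(void)
{
    const double speech[] = {3.0, 10.0, 25.0};   /* speech duration, s */
    const double enc[]    = {2.10, 8.13, 18.64}; /* encode time, s     */
    const double dec[]    = {0.44, 1.72, 3.88};  /* decode time, s     */
    for (int i = 0; i < 3; i++)
        printf("%5.1f s speech: codec %.2f s, real-time factor %.2f\n",
               speech[i], enc[i] + dec[i], (enc[i] + dec[i]) / speech[i]);
    return 0;   /* all factors are below 1.0, consistent with Section 7.2 */
}
```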

TABLE IV. MOS AND PESQ VALUE

Speech Time (s)   MOS   PESQ
3                 4.5   3.46
10                3.5   2.68
25                3.0   2.34

Fig.5 shows the time-domain waveforms of the original speech and of the decoded speech after optimization. There is a slight difference between the two waveforms, but the distortion is small and does not affect speech intelligibility [13]. Fig.6 and Fig.7 show the spectrograms of the original and decoded speech; the small difference between the two pictures is further evidence that the codec has little effect on the quality of the speech signal.

Fig.5. Time domain waveform comparison
Fig.6. Original speech spectrogram
Fig.7. Decoded speech spectrogram

VIII. CONCLUSIONS

This paper introduces the basic principles and structure of the MELPe algorithm and analyzes its advantages by comparison with MELP. Because the algorithm complexity is high, the program must be optimized for performance in order to meet the requirements of real-time communication. Test results show that the optimized algorithm can meet the needs of real-time speech communication. Finally, to realize the engineering value of the algorithm, it was ported to an ARM board based on the ARM1176JZF-S kernel and the S3C6410 processor, with voice quality preserved.
REFERENCES
[1] T. E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10," Speech Technology, pp. 40-49, April 1982.
[2] B. Piovesan, "ARM11 architecture: high performance microprocessor core," Dec. 2002.
[3] Jie Meng, "System implementation of MELP speech codec based on 1.2k," May 2012.
[4] Xiaoqun Zhao, Digital Speech Coding, China Machine Press, pp. 171-189, May 2007.
[5] Jun Tang and Jiangnan Yuan, "Optimization of MELPe speech coding algorithm based on ARM9," Jan. 2012.
[6] Samsung Electronics, S3C6410X RISC Microprocessor User's Manual, Feb. 2009.
[7] Chunyue Bi and Yunpeng Liu, "Research of key technologies for embedded Linux based on ARM," 2010 International Conference on Computer Application and System Modeling, pp. 373-378, 2010.
[8] Hongwei Xi and Robert Harper, "A dependently typed assembly language," Proceedings of the Sixth ACM SIGPLAN International Conference on Functional Programming, pp. 169-180, 2001.
[9] Branko Somek and Herceg, "Speech quality assessment," 2004.
[10] Jiqing Han, Lei Zhang, and Teiran Zheng, Speech Signal Processing, 2nd ed., April 2012.
[11] STANAG-NATO 4591, Federal Defence Standard for 2.4 kbps and 1.2 kbps.
[12] A. V. McCree, L. M. Supplee, R. P. Cohn, and J. S. Collura, "MELP: The New Federal Standard at 2400 bps," Proc. IEEE ICASSP, pp. 1591-1594, 1997.
[13] A. Antoniou, Digital Filters: Analysis, Design, and Applications, McGraw-Hill, New York, 1993.

