VOCAL’s GSM speech coder optimized source code provides performance and
portability
With VOCAL’s proprietary techniques, GSM speech coders are optimized to support modern processors,
including DSPs and conventional processors from AMD, Intel, ADI, TI and other vendors. Benchmarks have
shown that our highly optimized C with limited assembly code compares well against other vendors'
software.
Audio / voice codecs and vocoders convert the voice signals to be transmitted over a
GSM link into a compact digital format. Voice codecs used with GSM include Full Rate
(RPE-LPC), Half Rate, EFR, AMR and AMR-WB; the underlying speech coding techniques
include CELP, ACELP and VSELP.
A variety of different forms of audio codec or vocoder are available for general use, and the GSM system
supports a number of specific audio codecs. These include the RPE-LPC, half rate, and AMR codecs. The
performance of each voice codec is different and they may be used under different conditions, although the
AMR codec is now the most widely used. The newer AMR wideband (AMR-WB) codec is also being
introduced into many areas, including GSM.
Voice codec technology has advanced considerably in recent years as a result of the increasing
processing power available. As a result, the voice codecs used in the GSM system have improved
greatly since the first GSM phones were introduced.
If speech were digitised in a linear fashion it would require a high data rate that would occupy a very wide
bandwidth. As bandwidth is normally limited in any communications system, it is necessary to compress
the data to send it through the available channel. Once through the channel it can then be expanded to
regenerate the audio in a fashion that is as close to the original as possible.
To meet the requirements of the codec system, the speech must be captured at a high enough sample rate
and resolution to allow clear reproduction of the original sound. It must then be compressed in such a way
as to maintain the fidelity of the audio over a limited bit rate, error-prone wireless transmission channel.
VOCODER
Audio codecs or vocoders can use a variety of techniques, but many modern audio codecs use a technique
known as linear prediction. In many ways this can be likened to a mathematical modelling of the human
vocal tract. To achieve this the spectral envelope of the signal is estimated using a filter technique. Even
where signals with many non-harmonically related signals are used it is possible for voice codecs to give
very large levels of compression.
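To make the idea of estimating a spectral envelope with a filter more concrete, the following is a minimal sketch of linear prediction using the autocorrelation method and the Levinson-Durbin recursion. It is an illustration only, not the procedure of any particular GSM codec; the function names and the two-pole test signal are assumptions introduced here.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of x[n] * x[n - k]."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] from autocorrelation r.

    The predictor is x_hat[n] = sum_k a[k] * x[n - k].
    Returns (coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def two_pole_impulse_response(a1, a2, length):
    """Hypothetical test signal: impulse response of y[n] = a1*y[n-1] + a2*y[n-2] + e[n]."""
    y = []
    for n in range(length):
        e = 1.0 if n == 0 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(a1 * y1 + a2 * y2 + e)
    return y


signal = two_pole_impulse_response(1.3, -0.4, 200)
coeffs, residual = levinson_durbin(autocorrelation(signal, 2), 2)
print(coeffs, residual)
```

The recovered coefficients describe an all-pole filter whose frequency response approximates the spectral envelope of the analysed signal; a codec then transmits a quantised form of these coefficients rather than the waveform itself.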
CELP: The CELP or Code Excited Linear Prediction codec is a vocoder algorithm that was originally
proposed in 1985 and gave a significant improvement over other voice codecs of the day. The basic
principle of the CELP codec has been developed and used as the basis of other voice codecs including
ACELP, RCELP, VSELP, etc. As such the CELP codec methodology is now the most widely used
speech coding algorithm. Accordingly CELP is now used as a generic term for a particular class of
vocoders or speech codecs and not a particular codec.
The main principle behind the CELP codec is that it uses a technique known as "Analysis by Synthesis".
In this process, the encoding is performed by perceptually optimising the decoded signal in a closed
loop system. One way in which this could be achieved is to compare a variety of generated bit streams
and choose the one that produces the best sounding signal.
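The closed-loop search described above can be sketched as a toy example. Each candidate excitation is passed through a synthesis filter, and the candidate whose output is closest to the target is selected. This sketch uses a plain squared error instead of a perceptual weighting, and the codebook, filter coefficient and function names are all hypothetical; real CELP codecs are considerably more elaborate.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter: y[n] = e[n] + sum_k lpc[k-1] * y[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        y = e + sum(a * out[n - k]
                    for k, a in enumerate(lpc, start=1) if n - k >= 0)
        out.append(y)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain, error) for the entry whose synthesized
    output best matches the target in the squared-error sense."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(code, lpc)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        # Optimal gain for this candidate, then the remaining error.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Hypothetical three-entry codebook and a one-tap synthesis filter.
codebook = [[1, 0, 0, 0], [0, 1, 0, -1], [1, 1, 1, 1]]
lpc = [0.9]
target = [0.5 * s for s in synthesize(codebook[1], lpc)]
print(search_codebook(target, codebook, lpc))
```

Only the index and gain need to be transmitted; the decoder holds the same codebook and repeats the synthesis step.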
ACELP codec: The ACELP or Algebraic Code Excited Linear Prediction codec or vocoder
algorithm is a development of the CELP model. However, the ACELP codec codebooks have
a specific algebraic structure, as indicated by the name.
VSELP codec: VSELP stands for Vector Sum Excited Linear Prediction. One of the major
drawbacks of the VSELP codec is its limited ability to code non-speech sounds. This means that it
performs poorly in the presence of noise. As a result this voice codec is no longer as widely used;
newer speech codecs are preferred and offer far superior performance.
When interoperating between networks, most mobile carriers simply convert their encoded voice to the
traditional G.711 μ-law and A-law representations even when the non-mobile subscriber is serviced by
another carrier or by a Voice over IP (VoIP) service. This results in unnecessary quality degradation which
may be considered part of the walled-garden plan. New developments in advanced wide-band speech
coders have begun to improve this situation especially when a carrier crosses the boundary between
wireless and some sort of wireline replacement service. VoIP services, such as Skype, already carry speech
signals between mobile and all other end-points without the need for additional decoding and re-
encoding.
GSM 06.01 FR Vocoder defines a reference configuration for the speech transmission chain of the digital
cellular telecommunications system. The speech encoder takes its input as a 13 bit uniform PCM signal
either from the audio part of the mobile station or on the network side, from the PSTN via an 8 bit/A-law
to 13 bit uniform PCM conversion. The encoded speech at the output of the speech encoder is delivered
to a channel encoder unit which is specified in GSM 05.03. In the receive direction, the inverse operations
take place. GSM 06.10 describes the detailed mapping between input blocks of 160 speech samples in 13
bit uniform PCM format to encoded blocks of 260 bits and from encoded blocks of 260 bits to output
blocks of 160 reconstructed speech samples. The sampling rate is 8000 sample/s leading to an average bit
rate for the encoded bit stream of 13 kbit/s. The coding scheme is the so-called Regular Pulse Excitation -
Long Term prediction - Linear Predictive Coder, here-after referred to as RPE-LTP.
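The figures quoted above fit together as simple arithmetic, which the short sketch below works through. The constants are taken from the text; the function names are illustrative only.

```python
# Figures quoted in the text for the GSM full rate codec.
SAMPLE_RATE_HZ = 8000
SAMPLES_PER_FRAME = 160
PCM_BITS_PER_SAMPLE = 13
ENCODED_BITS_PER_FRAME = 260


def frame_duration_ms():
    """Duration of one block of input samples, in milliseconds."""
    return 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ


def encoded_bit_rate():
    """Bit rate of the encoded stream, in bit/s."""
    return ENCODED_BITS_PER_FRAME * SAMPLE_RATE_HZ // SAMPLES_PER_FRAME


def pcm_bit_rate():
    """Bit rate of the 13 bit uniform PCM input, in bit/s."""
    return PCM_BITS_PER_SAMPLE * SAMPLE_RATE_HZ


print(frame_duration_ms())                 # 20.0
print(encoded_bit_rate())                  # 13000
print(pcm_bit_rate())                      # 104000
print(pcm_bit_rate() / encoded_bit_rate()) # 8.0
```

Each 160-sample block therefore covers 20 ms of speech, and the encoder reduces the 104 kbit/s uniform PCM input to 13 kbit/s, a factor of eight.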
GSM 06.10 also specifies the conversion between A-law PCM and 13 bit uniform PCM. Performance
requirements for the audio input and output parts are included only to the extent that they affect the
transcoder performance. GSM 06.10 also describes the codec down to the bit level, thus enabling the
verification of compliance to the recommendation to a high degree of confidence by use of a set of
digital test sequences.
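As an illustration of the A-law to 13 bit uniform PCM conversion mentioned above, the sketch below follows the widely used G.711 A-law expansion and then scales the result to a 13 bit range. It is a sketch under assumptions, not the normative procedure: the function name is hypothetical, and the exact sample alignment required by GSM 06.10 should be taken from the recommendation itself.

```python
def alaw_to_pcm13(code):
    """Expand one 8-bit A-law code word to a signed 13-bit linear sample.

    Assumption: standard G.711 A-law expansion on a 16-bit scale,
    followed by a division by 8 to reach a 13-bit scale.
    """
    code ^= 0x55                       # undo the alternate-bit inversion
    mantissa = code & 0x0F
    segment = (code & 0x70) >> 4
    magnitude = (mantissa << 4) + 8    # mid-point of the quantisation step
    if segment >= 1:
        magnitude += 0x100             # implicit leading bit of the segment
    if segment >= 2:
        magnitude <<= segment - 1
    magnitude >>= 3                    # 16-bit scale to 13-bit scale
    return magnitude if code & 0x80 else -magnitude


# Smallest and largest magnitudes of either sign.
for word in (0xD5, 0x55, 0xAA, 0x2A):
    print(hex(word), alaw_to_pcm13(word))
```

The largest magnitude produced is 4032, which fits inside the signed 13 bit range of -4096 to 4095.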
Applications:
Wi-Fi phones (VoWLAN).
Wireless GSM, GPRS, EGPRS, EDGE systems.
Personal Communications.
Wideband IP telephony.
Audio and Video Conferencing.
Features:
Full and half duplex modes of operation.
Passes ETSI test vectors.
Common compressed speech frame stream interface to support systems with multiple speech
coders (GSM EFR, GSM HR, GSM AMR, G.723, G.728, G.729 et al).
Optimized for high performance on leading edge DSP architectures.
Multi-tasking environment compatible.
Can be integrated with G.168 and G.165 echo cancellers, and tone detection/regeneration.
Multi channel implementation.
Compliant with the GSM 06.10 Recommendation.
Optimized implementation.
GSM 06.20 GSM half rate codec uses the VSELP (Vector-Sum Excited Linear Prediction) algorithm. The
VSELP algorithm is an analysis-by-synthesis coding technique and belongs to the class of speech coding
algorithms known as CELP (Code Excited Linear Prediction).
The GSM 06.20 half rate encoder processes one 20 ms speech frame at a time. A
frame of the sampled speech waveform is read and, based on the current waveform and its past
history, the encoder derives 18 parameters that describe it. The extracted parameters
fall into three general classes: energy parameters (R0 and GSP0);
spectral parameters (LPC and INT_LPC); and excitation parameters (LAG and CODE).
Because the GSM 06.20 half rate codec is an analysis-by-synthesis codec, the speech decoder is primarily a
subset of the speech encoder. The quantised parameters are decoded and a synthetic excitation is
generated using the energy and excitation parameters. The synthetic excitation is then filtered to
provide the spectral information resulting in the generation of the synthesized speech.
The GSM 06.20 speech encoder takes its input as a 13 bit uniform Pulse Code Modulated (PCM) signal either
from the audio part of the MS or, on the network side, from the PSTN via an 8 bit A-law or µ-law (PCS
1900) to 13 bit uniform PCM conversion. The encoded speech at the output of the speech encoder is
delivered to the channel coding function as defined in GSM 05.03 [3] to produce an encoded block
consisting of 228 bits, leading to a gross bit rate of 11.4 kbit/s. In the RX direction, the inverse operations
take place.
GSM 06.20 describes the detailed mapping between input blocks of 160 speech samples in 13 bit
uniform PCM format into encoded blocks of 112 bits and from encoded blocks of 112 bits to output
blocks of 160 reconstructed speech samples. The sampling rate is 8000 sample/s, leading to an average
bit rate for the encoded bit stream of 5.6 kbit/s. The coding scheme is called Vector Sum Excited Linear
Prediction (VSELP) coding.
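The half rate figures can be checked in the same way. The sketch below derives the net and gross bit rates from the block sizes quoted in the text, along with the number of bits per frame added by channel coding; the function name is illustrative only.

```python
def bit_rate_bps(bits_per_frame, samples_per_frame=160, sample_rate_hz=8000):
    """Bit rate in bit/s for a block of bits produced once per speech frame."""
    return bits_per_frame * sample_rate_hz // samples_per_frame


SPEECH_BITS_PER_FRAME = 112    # output of the speech encoder
CHANNEL_BITS_PER_FRAME = 228   # output of the channel coding function

net_rate = bit_rate_bps(SPEECH_BITS_PER_FRAME)
gross_rate = bit_rate_bps(CHANNEL_BITS_PER_FRAME)
added_bits = CHANNEL_BITS_PER_FRAME - SPEECH_BITS_PER_FRAME

print(net_rate)     # 5600
print(gross_rate)   # 11400
print(added_bits)   # 116
```

Roughly half of the gross rate is therefore spent on channel coding rather than on the speech parameters themselves.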
Applications:
Wi-Fi phones (VoWLAN).
Wireless GSM, GPRS, EGPRS, EDGE systems.
Personal Communications.
Wideband IP telephony.
Audio and Video Conferencing.
Features:
Full and half duplex modes of operation.
Passes ETSI test vectors.
Common compressed speech frame stream interface to support systems with multiple speech
coders (GSM FR, GSM EFR, GSM HR, G.723, G.728, G.729 et al).
Optimized for high performance on leading edge DSP architectures.
Multi-tasking environment compatible.
Can be integrated with G.168 and G.165 echo cancellers, and tone detection/regeneration.
Multi channel implementation.
Compliant with the GSM 06.20 Recommendation.
Optimized implementation.