VOCODER

GSM VOICE CODEC / VOCODER

VOCAL’s GSM speech coder optimized source code provides performance and
portability

With VOCAL’s proprietary techniques, GSM speech coders are optimized to support modern processors
including DSPs and conventional processors from AMD, Intel, ADI, TI and other vendors. Benchmarks have
shown that our highly optimized C with limited assembly code compares well against other vendors'
software.

Audio / voice codecs and vocoders convert the voice signals to be transmitted over a
GSM link into a compact digital format. Voice codec technologies used with GSM include the
RPE-LPC, EFR, Full Rate, Half Rate, AMR and AMR-WB codecs, and the CELP, ACELP and VSELP
speech coding technologies.

A variety of different forms of audio codec or vocoder are available for general use, and the GSM system
supports a number of specific audio codecs. These include the RPE-LPC, half rate, and AMR codecs. The
performance of each voice codec is different and they may be used under different conditions, although the
AMR codec is now the most widely used. The newer AMR wideband (AMR-WB) codec is also being
introduced into many areas, including GSM.

Voice codec technology has advanced by considerable degrees in recent years as a result of the increasing
processing power available. As a result, the voice codecs used in the GSM system have improved
greatly since the first GSM phones were introduced.

Vocoder / codec basics


Vocoders or speech codecs are used within many areas of voice communications. Obviously the focus
here is on GSM audio codecs or vocoders, but the same principles apply to any form of codec.

If speech were digitised in a linear fashion it would require a high data rate that would occupy a very wide
bandwidth. As bandwidth is normally limited in any communications system, it is necessary to compress
the data to send it through the available channel. Once through the channel it can then be expanded to
regenerate the audio in a fashion that is as close to the original as possible.

To meet the requirements of the codec system, the speech must be captured at a high enough sample rate
and resolution to allow clear reproduction of the original sound. It must then be compressed in such a way
as to maintain the fidelity of the audio over a limited bit rate, error-prone wireless transmission channel.
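As a rough illustration of why compression is needed, the sketch below compares the bit rate of uncompressed linear PCM with the coded rates. It assumes the 8000 sample/s, 13 bit uniform PCM input and the 13 kbit/s and 5.6 kbit/s coded rates quoted elsewhere in this document; the function and variable names are illustrative only.

```python
# Raw linear PCM bit rate versus coded bit rate for narrowband speech.
SAMPLE_RATE_HZ = 8000    # samples per second (figure quoted in this document)
BITS_PER_SAMPLE = 13     # uniform PCM resolution at the encoder input


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate, in bit/s, of uncompressed linear PCM."""
    return sample_rate_hz * bits_per_sample


raw_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)  # 104000 bit/s
full_rate = 13000   # coded rate of the full rate codec, bit/s
half_rate = 5600    # coded rate of the half rate codec, bit/s

full_rate_ratio = raw_rate / full_rate   # 8.0
half_rate_ratio = raw_rate / half_rate   # about 18.6

print(raw_rate, full_rate_ratio, round(half_rate_ratio, 1))
```

Under these assumptions the full rate codec carries the speech in one eighth of the bits that linear PCM would need.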

Audio codecs or vocoders can use a variety of techniques, but many modern audio codecs use a technique
known as linear prediction. In many ways this can be likened to a mathematical modelling of the human
vocal tract. To achieve this the spectral envelope of the signal is estimated using a filter technique. Even
where signals with many non-harmonically related signals are used it is possible for voice codecs to give
very large levels of compression.
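A minimal sketch of linear prediction is given below. It uses the textbook autocorrelation method with the Levinson-Durbin recursion, which is one common way to estimate the predictor; it is not taken from any particular GSM reference implementation, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sample list x."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for the predictor A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    Returns (a, reflection, err): the polynomial coefficients, the
    reflection coefficients and the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        reflection.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, reflection, err


# A decaying exponential is predicted almost perfectly from one past sample.
signal = [0.9 ** n for n in range(200)]
r = autocorrelation(signal, 2)
a, reflection, err = levinson_durbin(r, 2)
print([round(c, 6) for c in a])
```

For this test signal the first coefficient comes out close to -0.9 and the second close to zero, so the prediction residual carries far less energy than the signal itself. That reduction is what allows the residual to be coded with few bits.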

A variety of different codec methodologies are used for GSM codecs:

• CELP: The CELP or Code Excited Linear Prediction codec is a vocoder algorithm that was originally
proposed in 1985 and gave a significant improvement over other voice codecs of the day. The basic
principle of the CELP codec has been developed and used as the basis of other voice codecs including
ACELP, RCELP, VSELP, etc. As such the CELP codec methodology is now the most widely used
speech coding algorithm. Accordingly CELP is now used as a generic term for a particular class of
vocoders or speech codecs and not a particular codec.

The main principle behind the CELP codec is that it uses a technique known as "Analysis by Synthesis".
In this process, the encoding is performed by perceptually optimising the decoded signal in a closed
loop system. One way in which this could be achieved is to compare a variety of generated bit streams
and choose the one that produces the best sounding signal.

• ACELP codec: The ACELP or Algebraic Code Excited Linear Prediction codec or vocoder algorithm
is a development of the CELP model. However, the ACELP codec codebooks have
a specific algebraic structure, as indicated by the name.

• VSELP codec: One of the major drawbacks of the VSELP or Vector Sum Excited Linear Prediction
codec is its limited ability to code non-speech sounds. This means that it
performs poorly in the presence of noise. As a result this voice codec is no longer as widely used, with
newer speech codecs being preferred and offering far superior performance.

GSM audio codecs / vocoders Technologies


A variety of GSM audio codecs / vocoders are supported. These have been introduced at different times,
and have different levels of performance. Although some of the early audio codecs are not as widely used
these days, they are still described here as they form part of the GSM system.

GSM AUDIO CODECS

CODEC NAME    BIT RATE (KBPS)    COMPRESSION TECHNOLOGY

Full rate     13                 RPE-LPC
EFR           12.2               ACELP
Half rate     5.6                VSELP
AMR           12.2 - 4.75        ACELP
AMR-WB        23.85 - 6.60       ACELP

• GSM-FR – GSM 06.10 Full Rate Vocoder
• GSM-HR – GSM 06.20 Half Rate Vocoder
• GSM-EFR – GSM 06.60 Enhanced Full Rate Vocoder
• GSM-AMR – GSM 06.90 Adaptive Multi-Rate Vocoder
• GSM-AMR-WB – 3GPP TS 26.171 Adaptive Multi-Rate Wideband (ITU G.722.2)

GSM Voice Codecs


The mobile telephone industry has standardized a number of Global System for Mobile Communications (GSM)
vocoders over the years for use by handheld portable devices. Most of the common ones were developed by the
European Telecommunications Standards Institute (ETSI) under the GSM set of standards and then
deployed world-wide (with notable exceptions in the United States). These speech compression
algorithms were greatly limited by the processing power/battery life in early handheld devices and very
limited digital channel capacity over the air. Later speech quality/intelligibility improvements resulted from
greater signal processing capability in the handheld devices.

When interoperating between networks, most mobile carriers simply convert their encoded voice to the
traditional G.711 μ-law and A-law representations even when the non-mobile subscriber is serviced by
another carrier or by a Voice over IP (VoIP) service. This results in unnecessary quality degradation which
may be considered part of the walled-garden plan. New developments in advanced wide-band speech
coders have begun to improve this situation especially when a carrier crosses the boundary between
wireless and some sort of wireline replacement service. VoIP services, such as Skype, already carry speech
signals between mobile and all other end-points without the need for additional decoding and
re-encoding.

GSM 06.10 Full Rate (FR) Vocoder


Regular Pulse Excitation - Long Term prediction - Linear Predictive Coder (RPE-LTP)
VOCAL Technologies, Ltd. ETSI GSM 06.10 software libraries include ANSI-C and optimized assembly
code for leading silicon suppliers (ADI, ARM, DSP Group, LSI Logic ZSP, MIPS and TI). This software is
modular and can be executed as a single task under a variety of operating systems or it can execute
standalone with its own kernel.

GSM 06.01 FR Vocoder defines a reference configuration for the speech transmission chain of the digital
cellular telecommunications system. The speech encoder takes its input as a 13 bit uniform PCM signal
either from the audio part of the mobile station or on the network side, from the PSTN via an 8 bit/A-law
to 13 bit uniform PCM conversion. The encoded speech at the output of the speech encoder is delivered
to a channel encoder unit which is specified in GSM 05.03. In the receive direction, the inverse operations
take place. GSM 06.10 describes the detailed mapping between input blocks of 160 speech samples in 13
bit uniform PCM format to encoded blocks of 260 bits and from encoded blocks of 260 bits to output
blocks of 160 reconstructed speech samples. The sampling rate is 8000 sample/s leading to an average bit
rate for the encoded bit stream of 13 kbit/s. The coding scheme is the so-called Regular Pulse Excitation -
Long Term prediction - Linear Predictive Coder, here-after referred to as RPE-LTP.
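The frame figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text (160 samples per frame, 8000 sample/s, 260 coded bits per frame); the function names are illustrative.

```python
def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Duration of one speech frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


def coded_bit_rate(bits_per_frame, samples_per_frame, sample_rate_hz):
    """Average bit rate of the encoded stream in bit/s."""
    return bits_per_frame * sample_rate_hz / samples_per_frame


duration = frame_duration_ms(160, 8000)          # 20.0 ms per frame
frames_per_second = 8000 / 160                   # 50.0 frames per second
rate = coded_bit_rate(260, 160, 8000)            # 13000.0 bit/s

print(duration, frames_per_second, rate)
```

Each 20 ms frame therefore carries 260 bits, and 50 such frames per second give the 13 kbit/s quoted above.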

GSM 06.10 also specifies the conversion between A-law PCM and 13 bit uniform PCM. Performance
requirements for the audio input and output parts are included only to the extent that they affect the
transcoder performance. GSM 06.10 also describes the codec down to the bit level, thus enabling the
verification of compliance to the recommendation to a high degree of confidence by use of a set of
digital test sequences.
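To give a feel for the A-law to uniform PCM conversion, the sketch below expands one 8 bit A-law code word using the segment and mantissa layout commonly used for G.711 A-law. It is an illustrative floating-free expansion, not the bit-exact procedure from the GSM 06.10 text, and the function name is hypothetical.

```python
def alaw_to_linear13(code):
    """Expand one 8-bit A-law code word to a signed 13-bit uniform value.

    Assumes the usual G.711 A-law layout: even bits inverted on the line,
    1 sign bit (set for positive), 3 segment bits and 4 mantissa bits.
    """
    code ^= 0x55                         # undo the even-bit inversion
    mantissa = code & 0x0F
    segment = (code & 0x70) >> 4
    magnitude = (mantissa << 4) + 8      # interval mid-point on a 16-bit scale
    if segment >= 1:
        magnitude += 0x100               # implicit leading bit of the segment
    if segment > 1:
        magnitude <<= segment - 1
    magnitude >>= 3                      # rescale from 16-bit to 13-bit range
    return magnitude if code & 0x80 else -magnitude


print(alaw_to_linear13(0xD5), alaw_to_linear13(0x55))   # smallest +/- levels
print(alaw_to_linear13(0xAA), alaw_to_linear13(0x2A))   # largest +/- levels
```

With this layout the smallest code words expand to +1 and -1 and the largest to +4032 and -4032, which fits within the signed 13 bit range.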

GSM 06.10 Full Rate Encoder:


1. The input speech frame, consisting of 160 signal samples (uniform 13 bit PCM samples), is first
pre-processed to produce an offset-free signal, which is then subjected to a first order
preemphasis filter. The 160 samples obtained are then analyzed to determine the coefficients
for the short term analysis filter (LPC analysis). These parameters are then used for the filtering
of the same 160 samples. The result is 160 samples of the short term residual signal. The filter
parameters, termed reflection coefficients, are transformed to log area ratios (LARs) before
transmission. The speech frame is divided into 4 sub-frames with 40 samples of the short term
residual signal in each. Each sub-frame is processed blockwise by the subsequent functional
elements.
2. Before the processing of each sub-block of 40 short term residual samples, the parameters of
the long term analysis filter, the LTP lag and the LTP gain, are estimated and updated in the LTP
analysis block, on the basis of the current sub-block of the present and a stored sequence of the
120 previous reconstructed short term residual samples.
3. A block of 40 long term residual signal samples is obtained by subtracting 40 estimates of the
short term residual signal from the short term residual signal itself. The resulting block of 40
long term residual samples is fed to the Regular Pulse Excitation analysis which performs the
basic compression function of the algorithm.
4. As a result of the RPE-analysis, the block of 40 input long term residual samples is represented
by one of 4 candidate sub-sequences of 13 pulses each. The subsequence selected is identified
by the RPE grid position (M). The 13 RPE pulses are encoded using Adaptive Pulse Code
Modulation (APCM) with estimation of the sub-block amplitude which is transmitted to
the decoder as side information. The RPE parameters are also fed to a local RPE decoding and
reconstruction module which produces a block of 40 samples of the quantized version of the
long term residual signal.
5. By adding these 40 quantized samples of the long term residual to the previous block of short
term residual signal estimates, a reconstructed version of the current short term residual signal
is obtained. The block of reconstructed short term residual signal samples is then fed to the long
term analysis filter which produces the new block of 40 short term residual signal estimates to
be used for the next sub-block thereby completing the feedback loop.
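Steps 2 to 4 above can be sketched in simplified floating-point form as follows. This is an illustration under stated assumptions, not the bit-exact fixed-point procedure of GSM 06.10: the lag search simply maximises the cross-correlation over lags 40 to 120, the gain is left unquantised, the grid is chosen by pulse energy without any prior weighting filter, and all function names are hypothetical.

```python
def ltp_parameters(sub_block, history):
    """Estimate LTP lag and gain for one 40-sample sub-block.

    history holds the 120 previous reconstructed short term residual
    samples, oldest first.
    """
    n = len(sub_block)
    best_lag, best_corr = n, float("-inf")
    for lag in range(n, len(history) + 1):           # lags 40..120
        start = len(history) - lag
        segment = history[start:start + n]
        corr = sum(d * p for d, p in zip(sub_block, segment))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    start = len(history) - best_lag
    energy = sum(p * p for p in history[start:start + n])
    gain = best_corr / energy if energy > 0 else 0.0
    return best_lag, gain


def long_term_residual(sub_block, history, lag, gain):
    """Subtract the long term prediction from the short term residual."""
    start = len(history) - lag
    return [d - gain * history[start + k] for k, d in enumerate(sub_block)]


def rpe_grid_select(residual):
    """Pick the grid position M (0..3) whose 13 pulses carry most energy."""
    best_m, best_energy, best_pulses = 0, -1.0, []
    for m in range(4):
        pulses = [residual[m + 3 * i] for i in range(13)]
        energy = sum(p * p for p in pulses)
        if energy > best_energy:
            best_m, best_energy, best_pulses = m, energy, pulses
    return best_m, best_pulses


# Toy input: one impulse every 50 samples, so the expected lag is 50.
period = 50
pattern = [100.0 if i == 0 else 0.0 for i in range(period)]
history = [pattern[i % period] for i in range(120)]
sub_block = [pattern[(120 + k) % period] for k in range(40)]

lag, gain = ltp_parameters(sub_block, history)
residual = long_term_residual(sub_block, history, lag, gain)
grid, pulses = rpe_grid_select([0.0, 5.0] + [0.0] * 38)
print(lag, gain, grid)
```

On the periodic toy input the long term predictor removes the signal completely, leaving an all-zero long term residual, and the grid selection picks the sub-sequence that contains the only non-zero sample.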

GSM 06.10 Full Rate Decoder:


1. The decoder includes the same structure as the feed-back loop of the encoder. In error-free
transmission, the output of this stage will be the reconstructed short term residual samples.
These samples are then applied to the short term synthesis filter followed by the de-emphasis
filter resulting in the reconstructed speech signal samples.
2. GSM 06.10 describes the detailed mapping between input blocks of 160 speech samples in 13
bit uniform PCM format to encoded blocks of 260 bits and from encoded blocks of 260 bits to
output blocks of 160 reconstructed speech samples. The sampling rate is 8000 sample/s leading
to an average bit rate for the encoded bit stream of 13 kbit/s.
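The de-emphasis filter at the end of the decoder undoes the first order pre-emphasis applied in the encoder. The sketch below shows the pair in floating point and checks that they are inverses. The coefficient value 28180/32768 (about 0.86) is an assumption recalled from common GSM 06.10 implementations and should be checked against the recommendation; the function names are illustrative.

```python
BETA = 28180 / 32768   # assumed pre-emphasis coefficient, about 0.86


def pre_emphasis(samples, beta=BETA):
    """Encoder side: y[k] = x[k] - beta * x[k-1]."""
    out, previous_input = [], 0.0
    for s in samples:
        out.append(s - beta * previous_input)
        previous_input = s
    return out


def de_emphasis(samples, beta=BETA):
    """Decoder side: y[k] = x[k] + beta * y[k-1]."""
    out, previous_output = [], 0.0
    for s in samples:
        previous_output = s + beta * previous_output
        out.append(previous_output)
    return out


speech = [0.0, 100.0, 250.0, -120.0, 40.0, 0.0, -300.0, 75.0]
restored = de_emphasis(pre_emphasis(speech))
max_error = max(abs(a - b) for a, b in zip(speech, restored))
print(max_error < 1e-9)
```

In the real codec the signal between the two filters is quantised, so the decoder output only approximates the input; the round trip here shows just the filter pair itself.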

Applications:
• Wi-Fi phones (VoWLAN).
• Wireless GSM, GPRS, EGPRS, EDGE systems.
• Personal Communications.
• Wideband IP telephony.
• Audio and Video Conferencing.

Features:
• Full and half duplex modes of operation.
• Passes ETSI test vectors.
• Common compressed speech frame stream interface to support systems with multiple speech
coders (GSM EFR, GSM HR, GSM AMR, G.723, G.728, G.729 et al).
• Optimized for high performance on leading edge DSP architectures.
• Multi-tasking environment compatible.
• Can be integrated with G.168 and G.165 echo cancellers, and tone detection/regeneration.
• Multi channel implementation.
• Compliant with the GSM 06.10 Recommendation.
• Optimized implementation.

GSM 06.20 Half Rate (HR) Vocoder


Vector-Sum Excited Linear Prediction (VSELP)
VOCAL Technologies, Ltd. ETSI GSM 06.20 software libraries include ANSI-C and optimized assembly
code for leading silicon suppliers (ADI, ARM, DSP Group, LSI Logic ZSP, MIPS and TI). This software is
modular and can be executed as a single task under a variety of operating systems or it can execute
standalone with its own kernel.

The GSM 06.20 half rate codec uses the VSELP (Vector-Sum Excited Linear Prediction) algorithm. The
VSELP algorithm is an analysis-by-synthesis coding technique and belongs to the class of speech coding
algorithms known as CELP (Code Excited Linear Prediction).

The GSM 06.20 half rate codec encodes one 20 ms speech frame at a time. A
speech frame of the sampled speech waveform is read and, based on the current waveform and the past
history of the waveform, the encoder derives 18 parameters that describe it. The parameters
extracted are grouped into the following three general classes: Energy parameters (R0 and GSP0);
Spectral parameters (LPC and INT_LPC); Excitation parameters (LAG and CODE).

GSM 06.20 half rate codec is an analysis-by-synthesis codec, therefore the speech decoder is primarily a
subset of the speech encoder. The quantised parameters are decoded and a synthetic excitation is
generated using the energy and excitation parameters. The synthetic excitation is then filtered to
provide the spectral information resulting in the generation of the synthesized speech.

GSM 06.20 speech encoder takes its input as a 13 bit uniform Pulse Code Modulated (PCM) signal either
from the audio part of the MS or on the network side, from the PSTN via an 8 bit/A-law or µ-law (PCS
1900) to 13 bit uniform PCM conversion. The encoded speech at the output of the speech encoder is
delivered to the channel coding function as defined in GSM 05.03 [3] to produce an encoded block
consisting of 228 bits leading to a gross bit rate of 11.4 kbit/s. In the RX direction, the inverse operations
take place.

GSM 06.20 describes the detailed mapping between input blocks of 160 speech samples in 13 bit
uniform PCM format into encoded blocks of 112 bits and from encoded blocks of 112 bits to output
blocks of 160 reconstructed speech samples. The sampling rate is 8000 sample/s leading to an average
bit rate for the encoded bit stream of 5.6 kbit/s. The coding scheme is called Vector Sum Excited Linear
Prediction (VSELP) coding.
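The "vector sum" in the name refers to how each excitation codevector is formed: as a signed sum of a small set of basis vectors, with one bit of the codeword choosing the sign of each. A minimal sketch follows. The sign convention (bit set means +1), the toy basis vectors and the function name are assumptions for illustration, not values from GSM 06.20.

```python
def vselp_codevector(codeword, basis_vectors):
    """Build a codevector as a +/- sum of basis vectors.

    Bit i of codeword selects the sign of basis vector i
    (assumed convention: 1 -> +1, 0 -> -1).
    """
    length = len(basis_vectors[0])
    out = [0.0] * length
    for i, basis in enumerate(basis_vectors):
        sign = 1.0 if (codeword >> i) & 1 else -1.0
        for n in range(length):
            out[n] += sign * basis[n]
    return out


# Two toy basis vectors give 2**2 = 4 possible codevectors.
basis = [[1.0, 0.0], [0.0, 2.0]]
print(vselp_codevector(0b11, basis))
print(vselp_codevector(0b01, basis))
print(vselp_codevector(0b00, basis))
```

With M basis vectors the codebook holds 2^M codevectors but only the M basis vectors need to be stored, and complementing every bit of the codeword simply negates the codevector.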

GSM 06.20 Half-rate Encoder:


1. The GSM half rate speech encoder uses an analysis by synthesis approach to determine the code
to use to represent the excitation for each subframe.
2. The codebook search procedure consists of trying each codevector as a possible excitation for
the Code Excited Linear Predictive (CELP) synthesizer.
3. The synthesized speech is compared against the input speech and a difference signal is
generated.
4. This difference signal is then filtered by a spectral weighting filter, to generate a weighted error
signal.
5. The codevector which generates the minimum weighted error power is chosen as the
codevector for that subframe.
6. The spectral weighting filter serves to weight the error spectrum based on perceptual
considerations.
7. This weighting filter is a function of the speech spectrum and can be expressed in terms of the
parameters of the short term (spectral) filter.
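The search loop described in steps 1 to 5 can be sketched as below. This is a simplified illustration: the weighting filter is an identity placeholder rather than a real perceptual filter, gain optimisation and filter memory from earlier subframes are omitted, and the codebook, filter coefficient and function names are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z), A(z) = 1 + sum(lpc[j] * z^-(j+1))."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for j, a in enumerate(lpc):
            if n - 1 - j >= 0:
                acc -= a * out[n - 1 - j]
        out.append(acc)
    return out


def weight(error):
    """Placeholder for the spectral weighting filter (identity here)."""
    return list(error)


def search_codebook(target, codebook, lpc):
    """Return (index, power) of the codevector with least weighted error."""
    best_index, best_power = -1, float("inf")
    for index, codevector in enumerate(codebook):
        synthetic = synthesize(codevector, lpc)
        error = [t - s for t, s in zip(target, synthetic)]
        power = sum(w * w for w in weight(error))
        if power < best_power:
            best_index, best_power = index, power
    return best_index, best_power


lpc = [-0.5]                                     # toy first order filter
codebook = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-1.0, 0.0, 0.0, 0.0]]
target = synthesize(codebook[1], lpc)            # speech made from entry 1
index, power = search_codebook(target, codebook, lpc)
print(index, power)
```

Because the toy target was generated from the second codevector, the search recovers index 1 with zero error power. In a real encoder the target is the input speech, so the chosen codevector is only the closest match in the weighted sense.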

GSM 06.20 Half-rate Decoder:


1. The speech decoder creates the combined excitation signal from the long term filter state and
the VSELP code vector.
2. In the unvoiced mode, the long term filter state is replaced by a second VSELP codebook and the
pitch prefilter is not used.
3. The combined excitation is then processed by an adaptive pitch prefilter and gain.
4. The prefiltered excitation is applied to the LPC synthesis filter.
5. After reconstructing the speech signal with the synthesis filter, an adaptive spectral postfilter is
applied followed by an automatic gain control which is the final processing step in the speech
decoder.

Applications:
• Wi-Fi phones (VoWLAN).
• Wireless GSM, GPRS, EGPRS, EDGE systems.
• Personal Communications.
• Wideband IP telephony.
• Audio and Video Conferencing.

Features:
• Full and half duplex modes of operation.
• Passes ETSI test vectors.
• Common compressed speech frame stream interface to support systems with multiple speech
coders (GSM FR, GSM EFR, GSM HR, G.723, G.728, G.729 et al).
• Optimized for high performance on leading edge DSP architectures.
• Multi-tasking environment compatible.
• Can be integrated with G.168 and G.165 echo cancellers, and tone detection/regeneration.
• Multi channel implementation.
• Compliant with the GSM 06.20 Recommendation.
• Optimized implementation.
