
THE ADAPTIVE MULTI-RATE SPEECH CODER

E. Ekudden†, R. Hagen†, I. Johansson§, and J. Svedberg§

§Ericsson Research, Ericsson Erisoft AB, SE-971 28 Luleå, Sweden
†Ericsson Research, Ericsson Radio Systems AB, SE-164 80 Stockholm, Sweden

ABSTRACT
In this paper, we describe the Adaptive Multi-Rate (AMR) speech coder currently under standardization for GSM systems as part of the AMR speech service. The coder is a multi-rate ACELP coder with 8 modes operating at bit-rates from 12.2 kbit/s down to 4.75 kbit/s. The coder modes are integrated in a common structure where the bit-rate scalability is realized mainly by altering the quantization schemes for the different parameters. The coder provides seamless switching on 20 ms frame boundaries. The quality when used on GSM channels is significantly higher than for existing services.

Table I. Bit allocation. The 12.2 kbit/s mode is equivalent to the GSM EFR coder [2], while the 7.40 kbit/s mode is equivalent to the EFR coder for the IS-136 system [3].

1. INTRODUCTION

Following the successful standardisation of the GSM enhanced fullrate (GSM EFR) speech coder in 1995 (US) and 1996 (Europe) [2], ETSI conducted a feasibility study for next generation speech services. The goal was to provide high quality using the halfrate traffic channel and highly error robust operation in the fullrate traffic channel. The study concluded that the only feasible way to meet the targets was to use an Adaptive Multi-Rate concept, where the speech coder bit-rate is continuously adapted to radio channel conditions; no fixed-rate solution would meet all the requirements. In 1997, a competitive selection process with a strict time-plan was started. The process included subjective qualification and selection tests during 1998. In October 1998, the AMR speech service developed in collaboration between Ericsson, Nokia and Siemens was selected.

2. REQUIREMENTS

Requirements were set in terms of subjective quality, delay, and complexity. The main requirements for subjective quality at various carrier-to-interference ratios (C/I) can be summarized in the following way.

For fullrate traffic channels: the quality of GSM EFR under error-free conditions should be obtained down to 13 dB C/I for clean speech and background noise, and the quality of GSM EFR at 10 dB C/I should be obtained at 4 dB C/I for clean speech and background noise.

For halfrate traffic channels: G.728 quality should be obtained down to 16 dB C/I for clean speech, and GSM FR and G.729 quality for background noise. For C/I of 10 dB and below, the quality of GSM FR at the same C/I should be obtained.

Table I entries recoverable from this copy (bits per 20 ms frame):

Mode            Parameter   Bits per subframe    Total
(not legible)   AlgCB       31, 31, 31, 31       124
(not legible)   Gains       7, 7, 7, 7           28
(not legible)   Gains       (not legible)        24
4.75 kbit/s     LSF         (once per frame)     23
4.75 kbit/s     AdaptCB     8, 4, 4, 4           20
4.75 kbit/s     AlgCB       9, 9, 9, 9           36
4.75 kbit/s     Gains       8, 8 (per subframe pair)   16
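The frame totals in Table I follow from the mode bit-rate and the 20 ms frame length. As a minimal sketch (the names `FRAME_MS`, `MODE_RATES_KBPS`, `ROW_4_75` and `bits_per_frame` are illustrative and do not come from the paper), the per-frame bit budget of the 8 modes named in the text can be computed and checked against the 4.75 kbit/s entries of Table I (23 + 20 + 36 + 16 bits):

```python
FRAME_MS = 20  # frame length stated in the paper

# The 8 coder mode bit-rates named in the text, in kbit/s.
MODE_RATES_KBPS = (12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75)

# 4.75 kbit/s entries of Table I (bits per frame).
ROW_4_75 = {"LSF": 23, "AdaptCB": 20, "AlgCB": 36, "Gains": 16}


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """Bits in one frame: kbit/s multiplied by ms gives bits."""
    # round() absorbs binary floating-point error such as 7.95 * 20.
    return round(rate_kbps * frame_ms)


if __name__ == "__main__":
    for rate in MODE_RATES_KBPS:
        print(f"{rate:5.2f} kbit/s -> {bits_per_frame(rate)} bits per frame")
    print("4.75 kbit/s row total:", sum(ROW_4_75.values()))
```

The four 4.75 kbit/s entries sum to 95 bits, which is exactly 4.75 kbit/s over 20 ms, so that row of the table accounts for every bit of the frame.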

0-7803-5651-9/99/$10.00 © 1999 IEEE

3. AMR SPEECH CODING

The Adaptive Multi-Rate speech coder is based on the Algebraic CELP (ACELP) technology [1] and is referred to as a Multi-Rate ACELP (MR-ACELP) coder. The coder is capable of operating at 8 different bit-rates, denoted coder modes. The frame size is 20 ms with 4 subframes of 5 ms. A lookahead of 5 ms is used. The bit allocation for the coder is shown for each mode in Table I. The 12.2 kbit/s mode is equivalent to the GSM EFR coder [2], while the 7.40 kbit/s mode is equivalent to the EFR coder for the IS-136 system [3].

3.1 Linear Prediction

The LP analysis and quantization for the 12.2 kbit/s mode follow those of the GSM EFR coder, i.e. two LP filters are computed for each frame. These filters are jointly quantized with split matrix quantization of 1st order MA-prediction LSF residuals. For all other modes, one LP filter is estimated per frame as in [3]. Split VQ of the 1st order MA-prediction LSF residuals is performed with 3 subvectors of dimension 3, 3, and 4. The bit allocations for the subvectors of the different VQs are 27=9+9+9, 26=8+9+9, and 23=8+8+7, starting from the lowest subvector. The perceptual weighting filter is computed from the unquantized LP coefficients. Two sets of weighting factors are used, one for the 12.2 kbit/s mode and another for all other modes.
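The split VQ of MA-predicted LSF residuals described above can be sketched as follows. This is an illustration under stated assumptions, not the coder's actual quantizer: the prediction coefficient `alpha`, the mean vector and the codebook contents are not given in the paper and must be supplied by the caller, and all function names are hypothetical. Only the subvector dimensions (3, 3, 4) and the bit allocations are taken from the text; a real codebook for a b-bit subvector would hold 2^b entries.

```python
SPLIT_DIMS = (3, 3, 4)  # subvector dimensions from the text
# Bit allocations from the text, lowest subvector first.
BIT_ALLOCATIONS = ((9, 9, 9), (8, 9, 9), (8, 8, 7))


def split(vec, dims=SPLIT_DIMS):
    """Cut a 10-dimensional vector into the 3 subvectors."""
    parts, start = [], 0
    for d in dims:
        parts.append(list(vec[start:start + d]))
        start += d
    return parts


def nearest(sub, codebook):
    """Index of the codebook entry with the smallest squared error."""
    def sq_err(entry):
        return sum((a - b) ** 2 for a, b in zip(sub, entry))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))


def lookup(indices, codebooks):
    """Concatenate the selected entries into one quantized residual."""
    return [v for i, cb in zip(indices, codebooks) for v in cb[i]]


def encode(lsf, mean, prev_q, codebooks, alpha):
    """Quantize what is left after mean removal and 1st order MA prediction.

    prev_q is the quantized residual of the previous frame; alpha is a
    hypothetical prediction coefficient.
    """
    residual = [x - m - alpha * p for x, m, p in zip(lsf, mean, prev_q)]
    indices = [nearest(s, cb) for s, cb in zip(split(residual), codebooks)]
    return indices, lookup(indices, codebooks)


def decode(indices, mean, prev_q, codebooks, alpha):
    """Rebuild the LSF vector from the indices and the previous residual."""
    q = lookup(indices, codebooks)
    return [m + alpha * p + r for m, p, r in zip(mean, prev_q, q)], q
```

Because the prediction uses the previous quantized residual rather than the previous reconstructed LSF vector, a channel error affects only a bounded number of following frames, which is the usual motivation for MA rather than AR prediction in this setting.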

3.2 Adaptive Codebook

The adaptive codebook (CB) search employs an open-loop pitch estimate to restrict the search range of the absolute coded lags. The open-loop pitch estimation is performed in the weighted speech domain. For the 4.75 and 5.15 kbit/s modes, it is performed once per frame, while for all other modes it is performed twice per frame. The adaptive CB lag is absolute coded for subframes 1 and 3, except for the 5.15 and 4.75 kbit/s modes, where this is only done in subframe 1. The lag of the other subframes is delta coded relative to the lag of the previous subframe (which also restricts the search range accordingly). The adaptive CB operates with a fractional resolution of 1/3, except for the 12.2 kbit/s mode, which has 1/6 resolution. For the absolute coded lags, fractional resolution is used in the lower part of the allowed lag range. For the delta coded lags, even spacing of fractional lags over the entire search range is used in the 5-bit and 6-bit cases, while the 4-bit lags have uneven spacing with fractional resolution only in the middle of the allowed range. The adaptive CB is searched closed-loop around the open-loop estimate for the absolute coded lag, while the entire delta range is searched for the delta coded lag. The integer lag values are searched first to find the best integer lag value, followed by a search of adjacent fractional lags if available.

3.3 Algebraic Codebook

The algebraic codebook is the part of the coder where the main bit-rate variation between modes occurs. The bit-rate ranges from 7.0 to 1.8 kbit/s. This is obtained by varying the number of pulses in each subframe from 10 to 2. The use of sparse algebraic codebooks is made feasible by the anti-sparseness processing described in section 3.5. Each pulse is located in a pre-defined track and each pulse has an amplitude of ±1. For each pulse, a sign bit and bits describing the position within the track are transmitted. To illustrate the track tables used, 5 non-overlapping subtracks are defined according to

[Equation (1) is not recoverable from this copy.]

where the sets contain the pulse positions for each subtrack.

12.2 kbit/s mode: Two pulses are located in each of the 5 subtracks given by (1). The sign of the second pulse is not explicitly transmitted.

10.2 kbit/s mode: 4 tracks of length 10 are used, which are defined by

[Track definition not recoverable from this copy.]

Two pulses are used in each track, where the second pulse sign is not transmitted, as for the 12.2 kbit/s mode.

7.95 and 7.40 kbit/s modes: One pulse is located in each of subtracks 0, 1 and 2 from (1). One pulse is located in either subtrack 3 or 4 from (1).

6.70 kbit/s mode: One pulse is located in subtrack 0 from (1). One pulse is located in either subtrack 1 or 3, and one pulse is located in either subtrack 2 or 4 from (1).

5.90 kbit/s mode: The first pulse is located in either subtrack 1 or 3 from (1). The second pulse is located in subtracks 0, 1, 2 or 4 from (1).

5.15 and 4.75 kbit/s modes: The first pulse is located in one of the subtracks from (1). The second pulse is located in a different subtrack from the first pulse. One of two subsets of subtracks is selected every subframe by a full search. For each subframe, a different configuration of subtracks is used to avoid creating audible artefacts.

When searching the algebraic codebook, the signs are pre-set according to the sign of a target signal. For the modes at 6.7 kbit/s and higher, an iterative, non-exhaustive search of pulse positions is performed. For the lower bit-rate modes, the pulse positions are found by exhaustive search.

3.4 Gain Quantization

All modes employ direct quantization of the adaptive codebook gain and MA-predictive quantization of the algebraic codebook gain as in [2]. The 12.2 and 7.95 kbit/s modes use scalar quantization of the adaptive and fixed codebook gains, while the other modes quantize the gains jointly by vector quantization. For the 12.2 kbit/s mode, open-loop quantization of the gains is performed. The closed-loop criterion involving the target signal in the weighted speech domain is used for all other modes except 7.95 kbit/s. For the 7.95 kbit/s mode, an adaptive modified closed-loop criterion described in [4] is used. A novel structure where the gains from 2 subframes are jointly vector quantized is employed in the 4.75 kbit/s mode.

3.5 Post-processing

Fixed codebook gain smoothing - For the modes 10.2, 6.70, 5.90, 5.15, and 4.75 kbit/s, the fixed codebook gain is smoothed by an adaptive procedure. The smoothed gain is computed as a weighted sum of the actual decoded gain and an average value of the last 5 gains. The relative weighting is determined adaptively based on a measure of the stationarity derived from the variability in the decoded LSFs.

Anti-sparseness processing - For the 7.95, 6.70, 5.90, 5.15, and 4.75 kbit/s modes, an adaptive anti-sparseness post-processing procedure is applied to the algebraic codebook vector in order to reduce the artifacts arising from the sparse algebraic codebook. The procedure consists of circular convolution with one of three pre-defined all-pass impulse responses, which scrambles the high-frequency phase spectrum, as described in [5].

Adaptive post-filter - An adaptive post-filter is used for all modes as in [2]. It consists of a pitch post-filter through modification of the adaptive codebook contribution, a pole-zero formant post-filter, tilt compensation and automatic gain scaling. The parameter settings are the same for all modes except the highest rate mode.

3.6 Error Concealment

The error concealment algorithm is integrated in the decoder and uses a conventional state-machine structure with gradually stronger synthesis filter smoothing and decreased codebook gains when consecutive frames are erased. A novel feature of the algorithm is the source signal dependent actions, which provide stronger fixed codebook gain smoothing in stationary background sounds, and excitation energy control that bridges short mutes in the background sound created by erased frames.
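The conventional part of the concealment scheme, a state machine that attenuates the codebook gain and smooths the synthesis filter more strongly with each consecutive erased frame, might look like the sketch below. Every constant here (`MAX_STATE`, `GAIN_FACTOR`, `LSF_SMOOTHING`) is a hypothetical placeholder, the class and method names are invented for illustration, and modelling the filter smoothing as a pull of the LSFs towards a mean vector is an assumption. The source signal dependent actions described above are not modelled.

```python
# All constants are hypothetical placeholders, not values from the paper.
MAX_STATE = 3
GAIN_FACTOR = (1.0, 0.75, 0.5, 0.25)    # gain attenuation per state
LSF_SMOOTHING = (0.0, 0.25, 0.5, 0.75)  # pull of the LSFs towards the mean


class Concealer:
    """Tracks consecutive erased frames and degrades the output gradually."""

    def __init__(self, lsf_mean):
        self.state = 0  # number of consecutive erased frames, capped
        self.gain = 0.0
        self.lsf = list(lsf_mean)
        self.lsf_mean = list(lsf_mean)

    def good_frame(self, gain, lsf):
        """A correctly received frame resets the state machine."""
        self.state = 0
        self.gain, self.lsf = gain, list(lsf)
        return self.gain, self.lsf

    def erased_frame(self):
        """Reuse the last parameters with stronger attenuation each time."""
        self.state = min(self.state + 1, MAX_STATE)
        self.gain *= GAIN_FACTOR[self.state]
        w = LSF_SMOOTHING[self.state]
        self.lsf = [(1 - w) * x + w * m
                    for x, m in zip(self.lsf, self.lsf_mean)]
        return self.gain, self.lsf
```

Capping the state keeps the attenuation factor bounded during long erasure bursts, so the output decays smoothly instead of being muted abruptly.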

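Two of the post-processing steps of Section 3.5 reduce to short formulas, sketched here under stated assumptions. `smooth_gain` forms the weighted sum of the decoded gain and the average of the last 5 gains; the paper derives the weight from the variability of the decoded LSFs but does not give the mapping, so the caller supplies `weight`. `circular_convolve` applies an impulse response to the sparse code vector with wrap-around at the subframe boundary; the three all-pass responses of [5] are not reproduced in the paper, so any response passed in is a placeholder. Both function names are illustrative.

```python
HISTORY = 5  # number of past gains averaged, from the text


def smooth_gain(decoded_gain, past_gains, weight):
    """Weighted sum of the decoded gain and the mean of the last 5 gains.

    weight = 1.0 means no smoothing. past_gains must not be empty.
    """
    recent = past_gains[-HISTORY:]
    average = sum(recent) / len(recent)
    return weight * decoded_gain + (1.0 - weight) * average


def circular_convolve(code, impulse):
    """Circular convolution of a code vector with an impulse response."""
    n = len(code)
    out = [0.0] * n
    for i, pulse in enumerate(code):
        if pulse == 0:
            continue  # sparse vector: only a few non-zero pulses
        for j, h in enumerate(impulse):
            # Indices wrap around, so energy near the end of the
            # subframe reappears at its start instead of being lost.
            out[(i + j) % n] += pulse * h
    return out
```

The circular form keeps the processed vector the same length as the subframe, so the spread-out pulses never leak into the next subframe.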

4. PERFORMANCE

In the qualification phase of testing, candidate proposals were assessed individually using a common test plan. Figures 1 and 2 summarize the results with clean speech using channel conditions with constant C/I in the fullrate (FR) and halfrate (HR) traffic channels. Figures 3 and 4 summarize the results using dynamically varying channel conditions. The five different channel profiles (DEC1-DEC5) were derived from measurements representing typical real world scenarios. In FR, a subset of coding modes was used which included the 12.2, 7.95 and 5.90 kbit/s modes, while the mode set in HR included the 7.95, 6.70, 5.90, and 5.15 kbit/s modes.
Figure 4. AMR halfrate performance under dynamic C/I conditions. [Plot not reproduced: MOS for channel profiles DEC1 to DEC5.]

Under dynamically varying channel conditions, the performance in the FR channel is significantly increased compared to the GSM EFR coder under the same conditions. In the HR channel, the performance is comparable to, or exceeds, the performance of the GSM FR coder under the same conditions. Furthermore, the coder showed good performance for a number of other conditions such as level dependency and tandeming. In the selection tests, the performance of the AMR coder candidates was assessed in an internationally coordinated set of subjective experiments, which confirmed the qualification results.

Figure 1. Constant C/I conditions in fullrate channels. [Plot not reproduced: MOS versus C/I.]


Figure 2. Constant C/I conditions in halfrate channels. [Plot not reproduced: MOS versus C/I.]

The results can be summarized in the following way. For clean speech, the performance in the FR channel is equivalent to GSM EFR down to 10 dB C/I, while a significant quality increase, with 4-6 dB increased C/I tolerance, is achieved for poor channel conditions. In the HR channel, the performance exceeds the quality of the GSM FR coder, and provides G.728 LD-CELP quality under good radio conditions. In background noise conditions, similar trends are seen, except for the combined conditions of background noise and poor radio conditions, where the performance is relatively worse.

5. SUMMARY

The AMR speech coder was developed to fulfill a challenging set of performance requirements for clean speech, speech in background noise, tandeming and degraded channel conditions. The coder has 8 fully integrated switchable ACELP coder modes with bit-rates from 12.2 kbit/s down to 4.75 kbit/s. The highest mode is the GSM EFR coder, which provides speech quality comparable to fixed-line quality. The lowest mode provides communication quality. The range of bit-rates and the high quality provide flexibility to trade quality and capacity as well as to optimize quality under changing channel conditions. The quality was shown to be significantly higher than for existing speech services in GSM.

Figure 3. AMR fullrate performance under dynamic C/I conditions. [Plot not reproduced: MOS for channel profiles DEC1 to DEC5.]

6. REFERENCES

[1] C. Laflamme, J-P. Adoul, R. Salami, S. Morisette, and P. Mabilleau, "16 kbps wideband speech coding technique based on algebraic CELP," in Proc. Int. Conf. Acoust., Speech, and Signal Processing, Toronto, Canada, pp. 13-16, 1991.
[2] K. Järvinen et al., "GSM enhanced full rate speech codec," in Proc. Int. Conf. Acoust., Speech, and Signal Processing, Munich, Germany, pp. 771-774, 1997.
[3] T. Honkanen et al., "Enhanced full rate speech codec for IS-136 digital cellular system," in Proc. Int. Conf. Acoust., Speech, and Signal Processing, Munich, Germany, pp. 731-734, 1997.
[4] R. Hagen and E. Ekudden, "An 8 kbit/s ACELP coder with improved background noise performance," in Proc. Int. Conf. Acoust., Speech, and Signal Processing, Phoenix, AZ, 1999.
[5] R. Hagen, E. Ekudden, B. Johansson, and W.B. Kleijn, "Removal of sparse-excitation artifacts in CELP," in Proc. Int. Conf. Acoust., Speech, and Signal Processing, Seattle, WA, pp. I-145-148, 1998.
