MATLAB® Software for the Code Excited Linear Prediction Algorithm: The Federal Standard-1016
Synthesis Lectures on Algorithms and Software in Engineering
Editor
Andreas S. Spanias, Arizona State University
MATLAB® Software for the Code Excited Linear Prediction Algorithm: The Federal Standard-1016
Karthikeyan N. Ramamurthy and Andreas S. Spanias
2009
Advances in Modern Blind Signal Separation Algorithms: Theory and Applications
Kostas Kokkinakis and Philipos C. Loizou
2010
OFDM Systems for Wireless Communications
Adarsh B. Narasimhamurthy, Mahesh K. Banavar, and Cihan Tepedelenlioğlu
2010
Algorithms and Software for Predictive Coding of Speech
Atti Venkatraman
2010
Advances in Waveform-Agile Sensing for Tracking
Sandeep Prasad Sira, Antonia Papandreou-Suppappola, and Darryl Morrell
2008
Despeckle Filtering Algorithms and Software for Ultrasound Imaging
Christos P. Loizou and Constantinos S. Pattichis
2008
Copyright © 2009 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any other, except for brief quotations in printed reviews, without the prior permission of the publisher.
MATLAB® Software for the Code Excited Linear Prediction Algorithm: The Federal Standard-1016
Karthikeyan N. Ramamurthy and Andreas S. Spanias
www.morganclaypool.com
ISBN: 978-1-60845-384-9 (paperback)
ISBN: 978-1-60845-385-6 (ebook)
DOI 10.2200/S00252ED1V01Y201001ASE003
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON ALGORITHMS AND SOFTWARE IN ENGINEERING
Lecture #3
Series Editor: Andreas S. Spanias, Arizona State University
Series ISSN: Synthesis Lectures on Algorithms and Software in Engineering
Print 1938-1727 Electronic 1938-1735
MATLAB® Software for the Code Excited Linear Prediction Algorithm: The Federal Standard-1016
Karthikeyan N. Ramamurthy and Andreas S. Spanias
Arizona State University
SYNTHESIS LECTURES ON ALGORITHMS AND SOFTWARE IN
ENGINEERING #3
Morgan & Claypool Publishers
ABSTRACT
This book describes several modules of the Code Excited Linear Prediction (CELP) algorithm. The authors use the Federal Standard-1016 CELP MATLAB® software to describe in detail several functions and parameter computations associated with analysis-by-synthesis linear prediction. The book begins with a description of the basics of linear prediction, followed by an overview of the FS-1016 CELP algorithm. Subsequent chapters describe the various modules of the CELP algorithm in detail. In each chapter, an overall functional description of the CELP modules is provided along with detailed illustrations of their MATLAB® implementation. Several code examples and plots are provided to highlight some of the key CELP concepts.
The MATLAB® code within this book can be found at http://www.morganclaypool.com/page/fs1016
KEYWORDS
speech coding, linear prediction, CELP, vocoder, ITU, G.7XX standards, secure communications
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
1  Introduction to Linear Predictive Coding . . . . . 1
1.1 Linear Predictive Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Vocal Tract Parameter Estimation 3
1.1.2 Excitation, Gain and Pitch Period 5
1.1.3 Linear Prediction Parameter Transformations 6
1.1.4 Long-Term Prediction 6
1.2 Analysis-by-Synthesis Linear Prediction . . . . . 7
1.2.1 Code Excited Linear Prediction Algorithms 9
1.2.2 The Federal Standard-1016 CELP 12
1.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2  Autocorrelation Analysis and Linear Prediction . . . . . 15
2.1 Framing and Windowing the Input Speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Computation of Autocorrelation Lags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 The Levinson-Durbin Recursion . . . . . 20
2.4 Bandwidth Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.5 Inverse Levinson-Durbin Recursion . . . . . 26
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3  Line Spectral Frequency Computation . . . . . 29
3.1 Construction of LSP Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Computing the Zeros of the Symmetric Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Computing the Zeros of the Anti-Symmetric Polynomial . . . . . 35
3.4 Testing Ill-Conditioned Cases . . . . . 40
3.5 Quantizing the Line Spectral Frequencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.6 Adjusting Quantization to Preserve Monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4  Spectral Distortion . . . . . 51
4.1 Conversion of LSP to Direct-Form Coefficients . . . . . 52
4.2 Computation of Autocorrelation Lags from Reflection Coefficients . . . . . 53
4.3 Calculation of Distance Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5  The Codebook Search . . . . . 63
5.1 Overview of Codebook Search Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Adaptive Codebook Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2.1 Target Signal for Adaptive Codebook Search 66
5.2.2 Integer Delay Search 68
5.2.3 Sub-Multiple/Fractional Delay Search 70
5.3 Stochastic Codebook Search. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.3.1 Target Signal for Stochastic Codebook Search 72
5.3.2 Search for Best Codevector 75
5.4 Bitstream Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6  The FS-1016 Decoder . . . . . 79
6.1 Extracting Parameters for Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2 LP Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.3 Postfiltering Output Speech . . . . . 84
6.4 Computing Distance Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Bibliography . . . . . 93
Authors’ Biographies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Preface
The CELP algorithm, proposed in the mid-1980s for analysis-by-synthesis linear predictive coding, has had a significant impact on speech coding applications. The CELP algorithm still forms the core of many current speech coding standards. Federal Standard-1016 is an early standardized version of the CELP algorithm and is based on an enhanced version of the originally proposed AT&T Bell Labs CELP coder [1].
This book presents MATLAB® software that simulates the FS-1016 CELP algorithm. We describe the theory and implementation of the algorithm along with several detailed illustrations using MATLAB® functions. In the first chapter, we introduce the theory behind speech analysis and linear prediction and also provide an overview of speech coding standards. From Chapter 2 onward, we describe the various modules of the FS-1016 CELP algorithm. The theory sections in most chapters are self-sufficient in that they explain the concepts required to understand the algorithm. The implementation details and illustrations with MATLAB® functions complement the theoretical underpinnings of CELP. Furthermore, the MATLAB® programs and the supporting files are available for download on the companion website of the book. The extensive list of references allows the reader to carry out a more detailed study of specific aspects of the algorithm, and the references also cover some of the recent advancements in speech coding.
We note that although this is an existing algorithm, and perhaps the very first that was standardized, it is a great starting point for learning the core concepts associated with CELP. Students will find it easy to understand the CELP functions by using the software and the associated simulations. Practitioners and algorithm developers will be able to understand the basic concepts and develop several new functions to improve the algorithm or to adapt it for new applications. We hope this book will be useful in understanding the CELP algorithm as well as the more recent speech coding standards that are based on CELP principles.
The authors acknowledge Ted Painter, who wrote the MATLAB® code for the FS-1016 CELP algorithm, and the DoD for developing the C code for the algorithm.
The MATLAB® code within this book can be found at http://www.morganclaypool.com/page/fs1016
Karthikeyan N. Ramamurthy and Andreas S. Spanias
February 2010
CHAPTER 1
Introduction to Linear Predictive Coding
This book introduces linear predictive coding and describes several modules of the Code Excited Linear Prediction (CELP) algorithm in detail. The MATLAB program for the Federal Standard-1016 (FS-1016) CELP algorithm is used to illustrate the components of the algorithm. Theoretical explanations and mathematical details, along with relevant examples that reinforce the concepts, are also provided in the chapters. In this chapter, the basics of linear prediction are introduced, starting from the principles of speech analysis.
Linear prediction analysis of speech signals is at the core of most narrowband speech coding standards. A simple engineering synthesis model that has been used in several speech processing and coding applications is shown in Figure 1.1. This source-system model is inspired by the human speech production mechanism. Voiced speech is produced by exciting the vocal tract filter with quasi-periodic glottal pulses. The periodicity of voiced speech is due to the vibrating vocal cords. Unvoiced speech is produced by forcing air through a constriction in the vocal tract or by using other mechanisms that do not engage the vocal cords.
Figure 1.1: Engineering model for speech synthesis (excitation, gain, and vocal tract filter producing synthetic speech).
The vocal tract is usually represented by a tenth-order digital all-pole filter. As shown in Figure 1.1, voiced speech is produced by exciting the vocal tract filter with periodic impulses, and unvoiced speech is generated using random pseudo-white noise excitation. Filter coefficients and excitation parameters are typically determined every 20 ms or less for speech sampled at 8 kHz. The filter frequency response models the formant structure of speech and captures the resonant modes of the vocal tract.
1.1 LINEAR PREDICTIVE CODING
Digital filters for linear predictive coding applications are characterized by the difference equation,

    y(n) = b(0) x(n) - \sum_{i=1}^{M} a(i) y(n-i) .    (1.1)
In this input-output difference equation, the output y(n) is given as the input minus a linear combination of past outputs (the feedback term). The parameters a(i) and b(i) are the filter coefficients, or filter taps, and they control the frequency response characteristics of the filter. Filter coefficients are programmable and can be made adaptive (time-varying). A direct-form realization of the digital filter is shown in Figure 1.2.
Figure 1.2: Linear prediction synthesis filter.
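The difference equation (1.1) can be exercised directly. The book's software is in MATLAB; the short Python sketch below is illustrative only (not part of the FS-1016 package) and implements the recursion sample by sample, checking it against the known impulse response of a one-pole filter.

```python
import numpy as np

def synthesis_filter(x, a, b0=1.0):
    """Eq. (1.1): y(n) = b0*x(n) - sum_{i=1}^{M} a(i)*y(n-i), zero initial state."""
    M = len(a)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = b0 * x[n]
        for i in range(1, M + 1):          # feedback term over past outputs
            if n - i >= 0:
                acc -= a[i - 1] * y[n - i]
        y[n] = acc
    return y

# With a(1) = -0.5, Eq. (1.1) gives y(n) = x(n) + 0.5*y(n-1),
# whose impulse response is 1, 0.5, 0.25, 0.125, ...
h = synthesis_filter(np.r_[1.0, np.zeros(4)], a=[-0.5])
```

Note the sign convention: with the minus sign of (1.1), a negative a(1) yields a stable decaying response.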
When the tenth-order all-pole filter representing the vocal tract is excited by white noise, the signal model corresponds to an autoregressive (AR) time-series representation. The coefficients of the AR model can be determined using linear prediction techniques. The application of linear prediction in speech processing, and specifically in speech coding, is often referred to as Linear Predictive Coding (LPC). The LPC parameterization is a central component of many compression algorithms used in cellular telephony for bandwidth compression and enhanced privacy. Bandwidth is conserved by reducing the data rate required to represent the speech signal. This data rate reduction is achieved by parameterizing speech in terms of the AR or all-pole filter coefficients and a small set of excitation parameters. The two excitation parameters for the synthesis configuration shown in Figure 1.1 are the following: (a) the voicing decision (voiced/unvoiced) and (b) the pitch period.
In the simplest case, 160 samples (20 ms at 8 kHz sampling) of speech can be represented with ten all-pole vocal tract parameters and two parameters that specify the excitation signal. Therefore, in this case, 160 speech samples can be represented by only twelve parameters, which results in a data compression ratio of more than 13 to 1 in terms of the number of parameters to be encoded and transmitted. In newer standardized algorithms, more elaborate forms of parameterization further exploit the redundancy in the signal and yield better compression and much improved speech quality.
1.1.1 VOCAL TRACT PARAMETER ESTIMATION
Linear prediction is used to estimate the vocal tract parameters. As the name implies, the linear predictor estimates the current sample of the speech signal using a linear combination of the past samples. For example, in a tenth-order linear predictor, an estimate of the current speech sample s(n) is produced using a linear combination of the ten previous samples, i.e., s(n - 1), s(n - 2), . . . , s(n - 10). This is done by forming a prediction error,
    e(n) = s(n) - \sum_{i=1}^{10} a(i) s(n-i) .    (1.2)
The prediction parameters a(i) are unknown and are determined by minimizing the Mean-Square Error (MSE) E[e^2(n)]. The prediction parameters a(i) are also used to form the all-pole digital filter for speech synthesis. The minimization of the MSE yields a set of autocorrelation equations that can be represented in terms of the matrix equation,
    \begin{bmatrix} r(1) \\ r(2) \\ r(3) \\ \vdots \\ r(10) \end{bmatrix}
    =
    \begin{bmatrix}
    r(0) & r(1) & r(2) & \cdots & r(9) \\
    r(1) & r(0) & r(1) & \cdots & r(8) \\
    r(2) & r(1) & r(0) & \cdots & r(7) \\
    \vdots & \vdots & \vdots & & \vdots \\
    r(9) & r(8) & r(7) & \cdots & r(0)
    \end{bmatrix}
    \begin{bmatrix} a(1) \\ a(2) \\ a(3) \\ \vdots \\ a(10) \end{bmatrix} .    (1.3)
The autocorrelation sequence can be estimated using the equation,

    r(m) = \frac{1}{N} \sum_{n=0}^{N-m-1} s(n+m) s(n) .    (1.4)
The integer N is the number of speech samples in the frame (typically N = 160). The autocorrelations are computed once per speech frame, and the linear prediction coefficients are computed by inverting the autocorrelation matrix, i.e.,
    \begin{bmatrix} a(1) \\ a(2) \\ a(3) \\ \vdots \\ a(10) \end{bmatrix}
    =
    \begin{bmatrix}
    r(0) & r(1) & r(2) & \cdots & r(9) \\
    r(1) & r(0) & r(1) & \cdots & r(8) \\
    r(2) & r(1) & r(0) & \cdots & r(7) \\
    \vdots & \vdots & \vdots & & \vdots \\
    r(9) & r(8) & r(7) & \cdots & r(0)
    \end{bmatrix}^{-1}
    \begin{bmatrix} r(1) \\ r(2) \\ r(3) \\ \vdots \\ r(10) \end{bmatrix} .    (1.5)
The coefficients a(1), a(2), . . . , a(10) form the transfer function (1.6) of the filter used in the speech synthesis system shown in Figure 1.1. This all-pole filter reproduces speech segments using either random (unvoiced) or periodic (voiced) excitation.
    H(z) = \frac{1}{1 - \sum_{i=1}^{10} a(i) z^{-i}} .    (1.6)
The matrix in (1.3) can be inverted efficiently using the Levinson-Durbin order-recursive algorithm given in (1.7 a-d).
    for m = 0, initialize: \varepsilon_0^f = r(0)
    for m = 1 to 10:
        a_m(m) = \frac{r(m) - \sum_{i=1}^{m-1} a_{m-1}(i) r(m-i)}{\varepsilon_{m-1}^f}
        a_m(i) = a_{m-1}(i) - a_m(m) a_{m-1}(m-i) ,  1 \le i \le m-1
        \varepsilon_m^f = \left(1 - (a_m(m))^2\right) \varepsilon_{m-1}^f
    end    (1.7 a-d)
The subscript m of the prediction parameter a_m(i) in the recursive expression (1.7 a-d) is the order of prediction during the recursion, while the integer in parentheses is the coefficient index. The symbol \varepsilon_m^f represents the mean-square estimate of the prediction residual during the recursion. The coefficients a_m(m) are called reflection coefficients and are represented by the symbol k_m = a_m(m). In the speech processing literature, the negated reflection coefficients are also known as Partial Correlation (PARCOR) coefficients. Reflection coefficients correspond to lattice filter structures that have been shown to be electrical equivalents of acoustical models of the vocal tract. Reflection coefficients have good quantization properties and have been used in several speech compression systems for encoding vocal tract parameters. A lattice structure is shown in Figure 1.3.
Figure 1.3: Lattice structure for linear prediction and reflection coefficients.
1.1.2 EXCITATION, GAIN AND PITCH PERIOD
The residual signal e(n) given in (1.2) forms the optimal excitation for the linear predictor. For low-rate representations of speech, the residual signal is typically replaced by a parametric excitation signal model. Many of the early algorithms for linear prediction used the two-state excitation model (impulses/noise) shown in Figure 1.1. This model is parameterized in terms of the gain, the binary voicing parameter and the pitch period. The gain of voiced and unvoiced segments is generally determined such that the short-term energy of the synthetic speech segment matches that of the analysis segment. For unvoiced speech, the excitation is produced by a random number generator. Since unvoiced segments are associated with small energy and a large number of zero crossings, voicing can be determined by energy and zero-crossing measurements. Pitch estimation and tracking is a difficult problem. Some of the well-known algorithms for pitch detection were developed in the late sixties and seventies. A straightforward approach to pitch detection is based on selecting the peak of the autocorrelation sequence (excluding r(0)). A more expensive but also more robust pitch detector relies on peak-picking the cepstrum. The Simplified Inverse Filter Tracking (SIFT) algorithm is based on peak-picking the autocorrelation sequence of the prediction residual associated with downsampled speech. Post-processing algorithms for pitch smoothing are also used to provide frame-to-frame pitch continuity. Current pitch detection algorithms yield high-resolution (sub-sample) estimates of the pitch period and are often specific to the analysis-synthesis system. In many Analysis-by-Synthesis (A-by-S) linear predictive coders, the pitch is measured by a closed-loop process which accounts for the impact of the pitch on the overall quality of the reconstructed speech.
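As a rough illustration of the energy, zero-crossing, and autocorrelation-peak measurements just described, the following Python sketch (a simplification, not the FS-1016 pitch logic) computes the three quantities for an 8 kHz frame; the lag search is restricted to a plausible pitch range so that r(0) and very short lags are excluded.

```python
import numpy as np

def frame_features(frame):
    # Short-term energy and zero-crossing count used for a simple voicing decision
    energy = np.mean(frame ** 2)
    zero_crossings = np.sum(np.abs(np.diff(np.sign(frame))) > 0)
    return energy, zero_crossings

def pitch_autocorr(frame, fs=8000, fmin=60, fmax=400):
    """Pick the autocorrelation peak, excluding r(0), inside a plausible lag range."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)      # candidate pitch lags
    return lo + int(np.argmax(r[lo:hi + 1]))

fs = 8000
n = np.arange(160)                               # one 20 ms frame
voiced = np.sin(2 * np.pi * 100 * n / fs)        # 100 Hz tone: period of 80 samples
print(pitch_autocorr(voiced))                    # 80
```

A real detector would smooth these estimates across frames, as noted above; cepstral and SIFT detectors follow the same "pick the dominant peak" pattern on different sequences.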
1.1.3 LINEAR PREDICTION PARAMETER TRANSFORMATIONS
One of the major issues in LPC is the quantization of the linear prediction parameters. Quantization of the direct-form coefficients a(i) is generally avoided. The reflection coefficients k_m are byproducts of the Levinson algorithm and are more robust for quantization. The reflection coefficients can also be quantized in an ordered manner, i.e., the first few reflection coefficients can be encoded with higher precision. Transformations of the reflection coefficients, such as the Log Area Ratio (LAR), i.e.,
    LAR(m) = \log\left(\frac{1 + k_m}{1 - k_m}\right) ,    (1.8)
have also been used in several LPC algorithms. Another representation of Linear Prediction (LP) parameters used in several standardized algorithms consists of Line Spectrum Pairs (LSPs). In LSP representations, a typical tenth-order polynomial,
    A(z) = 1 + a_1 z^{-1} + \cdots + a_{10} z^{-10} ,    (1.9)
is represented by the two auxiliary polynomials P(z) and Q(z), where,
    P(z) = A(z) + z^{-11} A(z^{-1}) ,    (1.10)
    Q(z) = A(z) - z^{-11} A(z^{-1}) .    (1.11)
P(z) and Q(z) each have five complex-conjugate pairs of zeros that typically lie on the unit circle. Hence, each polynomial can be represented by the five frequencies of its zeros (the other five frequencies are their negatives). These frequencies are called Line Spectral Frequencies (LSFs). If the polynomial A(z) is minimum phase, the roots of P(z) and Q(z) alternate on the unit circle.
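The construction in (1.10)-(1.11) is easy to verify numerically. The sketch below is a hypothetical Python illustration (Chapter 3 describes how the FS-1016 code actually locates these zeros via the symmetric and anti-symmetric polynomials); it builds P(z) and Q(z) from the coefficients of A(z) and reads the LSFs off as the angles of the upper-half-plane roots.

```python
import numpy as np

def lsp_polynomials(a):
    # a = [1, a1, ..., aM]: coefficients of A(z) as in Eq. (1.9)
    az = np.asarray(a, dtype=float)
    # z^-(M+1) A(z^-1) has the reversed coefficients, shifted by one delay
    pad_a = np.concatenate([az, [0.0]])
    pad_rev = np.concatenate([[0.0], az[::-1]])
    return pad_a + pad_rev, pad_a - pad_rev      # Eq. (1.10), Eq. (1.11)

def lsfs(a):
    # LSFs = angles 0 < w < pi of the unit-circle zeros of P and Q
    P, Q = lsp_polynomials(a)
    freqs = []
    for poly in (P, Q):
        for root in np.roots(poly):
            w = np.angle(root)
            if 1e-9 < w < np.pi - 1e-9:          # keep one of each conjugate pair
                freqs.append(w)
    return np.sort(freqs)

# First-order check, A(z) = 1 - 0.9 z^-1: P(z) contributes a conjugate zero
# pair at angle arccos(0.9); Q(z)'s zeros sit at +/-1 and are excluded.
f = lsfs([1.0, -0.9])
```

For a minimum-phase tenth-order A(z), the ten values returned would interlace, which is the alternation property stated above.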
1.1.4 LONG-TERM PREDICTION
Almost all modern LP algorithms include long-term prediction in addition to the tenth-order short-term linear predictor. Long-Term Prediction (LTP) captures the long-term correlation in the speech signal and provides a mechanism for representing the periodicity of speech. As such, it represents the fine harmonic structure in the short-term speech spectrum. The LTP requires estimation of two parameters, i.e., a delay τ and a gain parameter a(τ). For strongly voiced segments, the delay is usually an integer that approximates the pitch period. The transfer function of a simple LTP synthesis filter is given by,
    A_\tau(z) = \frac{1}{1 - a(\tau) z^{-\tau}} .    (1.12)
The gain is obtained by the equation a(τ) = r(τ)/r(0). Estimates of the LTP parameters can be obtained by searching the autocorrelation sequence or by using closed-loop searches, where the LTP lag that produces the best speech waveform matching is chosen.
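The open-loop estimate just described (pick the lag maximizing the autocorrelation, then set a(τ) = r(τ)/r(0)) can be sketched as follows. This is an illustrative Python fragment, not the closed-loop adaptive codebook search that FS-1016 actually uses (Chapter 5); the 20-147 sample lag range is an assumption modeled on typical 8 kHz coders.

```python
import numpy as np

def ltp_parameters(s, tau_min=20, tau_max=147):
    """Open-loop LTP estimate: lag tau maximizing r(tau), gain a(tau) = r(tau)/r(0)."""
    N = len(s)
    r = np.array([np.dot(s[tau:], s[:N - tau]) for tau in range(tau_max + 1)])
    tau = tau_min + int(np.argmax(r[tau_min:tau_max + 1]))
    return tau, r[tau] / r[0]

fs = 8000
n = np.arange(320)                           # 40 ms of a 100 Hz tone
s = np.sin(2 * np.pi * 100 * n / fs)
tau, gain = ltp_parameters(s)
print(tau)                                   # 80
```

Here the estimated delay equals the 80-sample pitch period; the gain is below one because the autocorrelation at the lag is computed over a shorter overlap than r(0).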
1.2 ANALYSIS-BY-SYNTHESIS LINEAR PREDICTION
In closed-loop source-system coders, shown in Figure 1.4, the excitation source is determined by closed-loop, or A-by-S, optimization. The optimization process determines an excitation sequence that minimizes the perceptually-weighted MSE between the input speech and the reconstructed speech [1, 2, 3]. Closed-loop LP combines the spectral modeling properties of vocoders with the waveform matching attributes of waveform coders; hence, A-by-S LP coders are also called hybrid LP coders. The term hybrid is used because A-by-S integrates vocoder and waveform coder principles. The system consists of a short-term LP synthesis filter 1/A(z) and an LTP synthesis filter 1/A_L(z), as shown in Figure 1.4. The Perceptual Weighting Filter (PWF) W(z) shapes the error such that quantization noise is masked by the high-energy formants. The PWF is given by,
    W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}
         = \frac{1 - \sum_{i=1}^{m} \gamma_1^i a(i) z^{-i}}{1 - \sum_{i=1}^{m} \gamma_2^i a(i) z^{-i}} ;
    \quad 0 < \gamma_2 < \gamma_1 < 1 ,    (1.13)
where γ_1 and γ_2 are the adaptive weights, and m is the order of the linear predictor. Typically, γ_1 ranges from 0.94 to 0.98, and γ_2 varies between 0.4 and 0.7, depending upon the tilt or flatness characteristics associated with the LPC spectral envelope [4, 5]. The role of W(z) is to deemphasize the error energy in the formant regions [6]. This deemphasis strategy is based on the fact that, in the formant regions, quantization noise is partially masked by speech. From Figure 1.4, note that a gain factor g scales the excitation vector x, and the excitation samples are filtered by the long-term and short-term synthesis filters.
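The weighting in (1.13) amounts to two bandwidth-expanded copies of A(z): the numerator coefficients are γ_1^i a(i) and the denominator coefficients γ_2^i a(i). The Python sketch below is illustrative (function and parameter names are invented, not from the book's code); it forms the expanded coefficients and applies the pole-zero filter directly. When γ_1 = γ_2 the filter collapses to unity, which makes a convenient sanity check.

```python
import numpy as np

def bandwidth_expand(a, gamma):
    # Coefficients of A(z/gamma): a(i) -> gamma^i * a(i), per Eq. (1.13)
    a = np.asarray(a, dtype=float)
    return a * gamma ** np.arange(1, len(a) + 1)

def perceptual_weight(x, a, gamma1=0.9, gamma2=0.5):
    # W(z) = A(z/gamma1) / A(z/gamma2): FIR numerator, all-pole denominator
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for i, c in enumerate(num):          # 1 - sum gamma1^i a(i) z^-i
            if n - 1 - i >= 0:
                acc -= c * x[n - 1 - i]
        for i, c in enumerate(den):          # feedback of 1/(1 - sum gamma2^i a(i) z^-i)
            if n - 1 - i >= 0:
                acc += c * y[n - 1 - i]
        y[n] = acc
    return y
```

With γ_1 closer to one than γ_2, the zeros track the formant peaks more tightly than the poles, which is what de-emphasizes the error in the formant regions.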
The three most common excitation models typically embedded in the excitation generator module (Figure 1.4) of A-by-S LP schemes include: Multi-Pulse Excitation (MPE) [2, 3], Regular Pulse Excitation (RPE) [7], and vector or Code Excited Linear Prediction (CELP) [1]. A 9.6 kb/s Multi-Pulse Excited Linear Prediction (MPE-LP) algorithm is used in Skyphone airline applications [8]. A 13 kb/s coding scheme that uses RPE [7] was adopted for the full-rate ETSI GSM Pan-European digital cellular standard [9]. The standard was eventually replaced by the GSM Enhanced Full-Rate (EFR) codec described briefly later.
The aforementioned MPE-LP and RPE schemes achieve high-quality speech at medium rates. For low-rate, high-quality speech coding, a more efficient representation of the excitation sequence is required. Atal [10] suggested that high-quality speech at low rates may be produced by using non-instantaneous (delayed decision) coding of Gaussian excitation sequences in conjunction with A-by-S linear prediction and perceptual weighting. In the mid-eighties, Atal and Schroeder [1, 11] proposed a CELP algorithm for A-by-S linear predictive coding.
Figure 1.4: A typical source-system model employed in analysis-by-synthesis LP.
The excitation codebook search process in CELP can be explained by considering the A-by-S scheme shown in Figure 1.5. The N × 1 error vector e associated with the i-th excitation vector can be written as,

    e^{(i)} = s_w - \hat{s}_w^0 - g^{(i)} \hat{s}_w^{(i)} ,    (1.14)
where s_w is the N × 1 vector that contains the perceptually-weighted speech samples, \hat{s}_w^0 is the vector that contains the output due to the initial filter state, \hat{s}_w^{(i)} is the filtered synthetic speech vector associated with the i-th excitation vector, and g^{(i)} is the gain factor. Minimizing \zeta^{(i)} = e^{(i)T} e^{(i)} with respect to g^{(i)}, we obtain,

    g^{(i)} = \frac{\bar{s}_w^T \hat{s}_w^{(i)}}{\hat{s}_w^{(i)T} \hat{s}_w^{(i)}} ,    (1.15)
where \bar{s}_w = s_w - \hat{s}_w^0, and T represents the transpose of a vector. From (1.15), \zeta^{(i)} can be written as,

    \zeta^{(i)} = \bar{s}_w^T \bar{s}_w - \frac{\left(\bar{s}_w^T \hat{s}_w^{(i)}\right)^2}{\hat{s}_w^{(i)T} \hat{s}_w^{(i)}} .    (1.16)
The i-th excitation vector x^{(i)} that minimizes (1.16) is selected, and the corresponding gain factor g^{(i)} is obtained from (1.15). Note that the perceptual weighting, W(z), is applied directly to the input speech, s, and the synthetic speech, \hat{s}, in order to facilitate the CELP analysis that follows. The codebook index, i, and the gain, g^{(i)}, associated with the chosen excitation vector, x^{(i)}, are encoded and transmitted along with the short-term and long-term prediction filter parameters.
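Equations (1.15)-(1.16) reduce the per-codevector work to two inner products: because the first term of (1.16) is fixed, the candidate maximizing the ratio (correlation squared over energy) minimizes \zeta^{(i)}. A minimal Python sketch of this selection follows (illustrative only; it assumes the codevectors have already been passed through the weighted synthesis filters of Figure 1.5).

```python
import numpy as np

def search_codebook(s_bar_w, filtered_cb):
    """Return (index, gain) of the codevector minimizing zeta in Eq. (1.16).
    s_bar_w     : target vector (weighted speech minus zero-input response)
    filtered_cb : list of filtered synthetic vectors s_hat_w^(i)"""
    best_i, best_metric = 0, -np.inf
    for i, s_hat in enumerate(filtered_cb):
        c = np.dot(s_bar_w, s_hat)           # correlation term of Eq. (1.16)
        e = np.dot(s_hat, s_hat)             # energy term
        metric = c * c / e                   # maximizing this minimizes zeta
        if metric > best_metric:
            best_i, best_metric = i, metric
    s_best = filtered_cb[best_i]
    gain = np.dot(s_bar_w, s_best) / np.dot(s_best, s_best)   # Eq. (1.15)
    return best_i, gain

target = np.array([2.0, 2.0, 0.0])
cb = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]
i, g = search_codebook(target, cb)
print(i, g)                                  # 0 2.0
```

The large cost the next paragraph discusses comes from filtering every codevector to obtain \hat{s}_w^{(i)} before these inner products can even be formed.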
Figure 1.5: A generic block diagram for A-by-S Code Excited Linear Predictive (CELP) coding.
One of the disadvantages of the original CELP algorithm is the large computational complexity required for the codebook search [1]. This problem motivated a great deal of work focused upon developing structured codebooks [12, 13] and fast search procedures [14]. In particular, Davidson and Gersho [12] proposed sparse codebooks, and Kleijn et al. [13] proposed a fast algorithm for searching stochastic codebooks with overlapping vectors. In addition, Gerson and Jasiuk [15, 16] proposed a Vector Sum Excited Linear Predictive (VSELP) coder, which is associated with fast codebook search and robustness to channel errors. Other implementation issues associated with CELP include the quantization of the CELP parameters, the effects of channel errors on CELP coders, and the operation of the algorithm on finite-precision and fixed-point machines. A study of the effects of parameter quantization on the performance of CELP was presented in [17], and the issues associated with the channel coding of the CELP parameters were discussed by Kleijn in [18]. Some of the problems associated with the fixed-point implementation of CELP algorithms were presented in [19].
1.2.1 CODE EXCITED LINEAR PREDICTION ALGORITHMS
In this section, we taxonomize CELP algorithms into three categories that are consistent with the chronology of their development, i.e., first-generation CELP (1986-1992), second-generation CELP (1993-1998), and third-generation CELP (1999-present).
1.2.1.1 First-Generation CELP Coders
The first-generation CELP algorithms operate at bit rates between 4.8 kb/s and 16 kb/s. These are generally high-complexity, non-toll-quality algorithms. Some of the first-generation CELP algorithms include the following: the FS-1016 CELP, the IS-54 VSELP, the ITU-T G.728 Low-Delay (LD) CELP, and the IS-96 Qualcomm CELP. The FS-1016 4.8 kb/s CELP standard [20, 21] was jointly developed by the Department of Defense (DoD) and Bell Labs for possible use in the third-generation Secure Telephone Unit (STU-III). The IS-54 VSELP algorithm [15, 22] and its variants are embedded in three digital cellular standards, i.e., the 8 kb/s TIA IS-54 [22], the 6.3 kb/s Japanese standard [23], and the 5.6 kb/s half-rate GSM [24]. The VSELP algorithm uses highly structured codebooks that are tailored for reduced computational complexity and increased robustness to channel errors. The ITU-T G.728 Low-Delay (LD) CELP coder [25, 26] achieves low one-way delay by using very short frames, a backward-adaptive predictor, and short excitation vectors (5 samples). The IS-96 Qualcomm CELP (QCELP) [27] is a variable bit rate algorithm and is part of the Code Division Multiple Access (CDMA) standard for cellular communications. Most of these standardized CELP algorithms were eventually replaced by newer second- and third-generation A-by-S coders.
1.2.1.2 Second-Generation Near-Toll-Quality CELP Coders
The second-generation CELP algorithms are targeted at Internet audio streaming, Voice-over-Internet-Protocol (VoIP), teleconferencing applications, and secure communications. Some of the second-generation CELP algorithms include the following: the ITU-T G.723.1 dual-rate speech codec [28], the GSM EFR [23, 29], the IS-127 Relaxed CELP (RCELP) [30, 31], and the ITU-T G.729 Conjugate Structured Algebraic CELP (CS-ACELP) [4, 32].

The coding gain improvements in second-generation CELP coders can be attributed partly to the use of algebraic codebooks in excitation coding [4, 32, 33, 34]. The term Algebraic CELP (ACELP) refers to the structure of the codebooks used to select the excitation codevector. Various algebraic codebook structures have been proposed [33, 35], but the most popular is the interleaved pulse permutation code. In this codebook, the codevector consists of a set of interleaved permutation codes containing only a few non-zero elements. This is given by,
p
i
= i + jd, j = 0, 1, . . . , 2
M
− 1 , (1.17)
where p
i
is the pulse position, i is the pulse number, and d is the interleaving depth. The integer M
is the number of bits describing the pulse positions. Table 1.1 shows an example ACELP codebook
structure, where, the interleaving depth, d = 5, the number of pulses or tracks are equal to 5, and
the number of bits to represent the pulse positions, M = 3. From (1.17), p
i
= i + j5, where i =
0, 1, 2, 3, 4 and j = 0, 1, 2, . . . , 7.
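As a quick illustration (a sketch in Python rather than the book's MATLAB), the track structure of (1.17) for the example parameters (d = 5, five tracks, M = 3) can be enumerated directly:

```python
# Enumerate the ACELP pulse positions p_i = i + j*d of (1.17) for the
# example parameters: interleaving depth d = 5, five tracks, M = 3 bits.
d, num_tracks, M = 5, 5, 3
tracks = {i: [i + j * d for j in range(2 ** M)] for i in range(num_tracks)}
```

Each track reproduces one row of Table 1.1; for instance, track 0 contains the positions 0, 5, 10, ..., 35.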
Table 1.1: An example algebraic codebook structure; tracks and pulse positions.

Track (i)   Pulse positions (p_i)
0           p_0: 0, 5, 10, 15, 20, 25, 30, 35
1           p_1: 1, 6, 11, 16, 21, 26, 31, 36
2           p_2: 2, 7, 12, 17, 22, 27, 32, 37
3           p_3: 3, 8, 13, 18, 23, 28, 33, 38
4           p_4: 4, 9, 14, 19, 24, 29, 34, 39
For a given value of i, the set defined by (1.17) is known as a 'track,' and the value of j defines the pulse position. From the codebook structure shown in Table 1.1, the codevector, x(n), is given by,

x(n) = Σ_{i=0}^{4} α_i δ(n − p_i) ,  n = 0, 1, . . . , 39 ,    (1.18)

where δ(n) is the unit impulse, α_i are the pulse amplitudes (±1), and p_i are the pulse positions. In particular, the codebook vector, x(n), is computed by placing the 5 unit pulses at the determined locations, p_i, multiplied with their signs (±1). The pulse position indices and the signs are encoded and transmitted. Note that the algebraic codebooks do not require any storage.
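A small Python sketch (illustrative; the pulse positions and signs below are hypothetical choices, one per track of Table 1.1) shows how a codevector of (1.18) is formed without any codebook storage:

```python
import numpy as np

# Hypothetical pulse positions (one per track of Table 1.1) and their signs.
positions = [5, 11, 22, 38, 9]
signs = [1, -1, 1, -1, 1]

# Place five signed unit pulses in a 40-sample codevector, as in (1.18).
x = np.zeros(40)
for p, s in zip(positions, signs):
    x[p] = s
```

Only the position indices and signs need to be encoded and transmitted; the codevector itself is reconstructed on the fly.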
1.2.1.3 Third-Generation (3G) CELP for 3G Cellular Standards
The 3G CELP algorithms are multimodal and accommodate several different bit rates. This is consistent with the vision of wideband wireless standards [36] that will operate in different modes: low-mobility, high-mobility, indoor, etc. At least two algorithms have been developed and standardized for these applications. In Europe, GSM standardized the Adaptive Multi-Rate (AMR) coder [37, 38], and in the U.S., the Telecommunications Industry Association (TIA) has tested the Selectable Mode Vocoder (SMV) [39, 40, 41]. In particular, the adaptive GSM multi-rate coder [37, 38] has been adopted by the European Telecommunications Standards Institute (ETSI) for GSM telephony. This is an ACELP algorithm that operates at multiple rates: 12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15, and 4.75 kb/s. The bit rate is adjusted according to the traffic conditions.
The Adaptive Multi-Rate Wideband (AMR-WB) coder [5] is an ITU-T wideband standard that has been jointly standardized with 3GPP, and it operates at rates of 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85, and 6.6 kb/s. The higher rates provide better quality where background noise is stronger. The other two important speech coding standards are the Variable-Rate Multimode Wideband (VMR-WB) and the extended AMR-WB (AMR-WB+). The VMR-WB has been adopted as the new 3GPP2 standard for wideband speech telephony, streaming, and multimedia messaging systems [42]. VMR-WB is a Variable Bit Rate (VBR) coder with the source controlling the bit rate of operation: Full-Rate (FR), Half-Rate (HR), Quarter-Rate (QR), or Eighth-Rate (ER) encoding to provide the best subjective quality at a particular Average Bit Rate (ABR). The source-coding bit rates are 13.3, 6.2, 2.7, and 1 kb/s for the FR, HR, QR, and ER encoding schemes. The AMR-WB+ [43] can address mixed speech and audio content and can consistently deliver high-quality audio even at low bit rates. It was selected as an audio coding standard in 2004 by ETSI and the 3rd Generation Partnership Project (3GPP). The SMV algorithm (IS-893) was developed to provide higher quality, flexibility, and capacity over the existing IS-96 QCELP and IS-127 Enhanced Variable Rate Coding (EVRC) CDMA algorithms. The SMV is based on four codecs: full-rate at 8.5 kb/s, half-rate at 4 kb/s, quarter-rate at 2 kb/s, and eighth-rate at 0.8 kb/s. The rate and mode selections in SMV are based on the frame voicing characteristics and the network conditions. G.729.1 is a recent speech codec adopted by ITU-T that can handle both narrowband and wideband speech and is compatible with the widely used G.729 codec [44]. The codec operates at 8 kb/s and 12 kb/s in the narrowband mode, and from 14 kb/s to 32 kb/s at 2 kb/s intervals in the wideband mode. The codec incorporates bandwidth extension and provides bandwidth and bit rate scalability at the same time.
1.2.2 THE FEDERAL STANDARD-1016 CELP
A 4.8 kb/s CELP algorithm was adopted in the late 1980s by the DoD for use in the STU-III. This algorithm is described in the Federal Standard-1016 (FS-1016) [21] and was jointly developed by the DoD and AT&T Bell Labs. Although new algorithms for use with the STU emerged, such as the Mixed Excitation Linear Predictor (MELP), the FS-1016 CELP remains interesting for our study as it contains core elements of A-by-S algorithms that are still very useful. The candidate algorithms and the selection process for the standard are described in [45]. The synthesis configuration for the FS-1016 CELP is shown in Figure 1.6. Speech in the FS-1016 CELP is sampled at 8 kHz and segmented into frames of 30 ms duration. Each frame is segmented into subframes of 7.5 ms duration. The excitation in this CELP is formed by combining vectors from an adaptive and a stochastic codebook with gains g_a and g_s, respectively (gain-shape vector quantization). The excitation vectors are selected in every subframe by minimizing the perceptually-weighted error measure. The codebooks are searched sequentially, starting with the adaptive codebook. The term "adaptive codebook" is used because the LTP lag search can be viewed as an adaptive codebook search where the codebook is defined by previous excitation sequences (LTP state) and the lag τ determines the specific vector. The adaptive codebook contains the history of past excitation signals, and the LTP lag search is carried out over 128 integer (20 to 147) and 128 non-integer delays. A subset of lags is searched in even subframes to reduce the computational complexity. The stochastic codebook contains 512 sparse and overlapping codevectors [18]. Each codevector consists of sixty samples, and each sample is ternary-valued (1, 0, −1) [46] to allow for fast convolution.
Ten short-term prediction parameters are encoded as LSPs on a frame-by-frame basis. Subframe LSPs are obtained by linear interpolation of the frame LSPs. A short-term pole-zero postfilter (similar to that proposed in [47]) is also part of the standard. The details of the bit allocations are given in the standard. The computational complexity of the FS-1016 CELP was
Figure 1.6: FS1016 CELP synthesis.
estimated at 16 Million Instructions Per Second (MIPS) for partially searched codebooks, and the Diagnostic Rhyme Test (DRT) score and Mean Opinion Score (MOS) were reported to be 91.5 and 3.2, respectively.
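The frame arithmetic in the description above can be checked in a few lines (an illustrative Python sketch, not part of the standard's code):

```python
# FS-1016 frame/subframe sizes and the integer LTP lag range.
fs = 8000                      # sampling rate in Hz
frame = int(fs * 0.030)        # 30 ms frame -> 240 samples
subframe = int(fs * 0.0075)    # 7.5 ms subframe -> 60 samples
lags = range(20, 148)          # 128 integer adaptive-codebook lags
```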
1.3 SUMMARY
This chapter introduced the basics of linear predictive coding and A-by-S linear prediction, and provided a review of the speech coding algorithms based on CELP. A detailed review of the basics of speech coding, algorithms, and standards until the early 1990s can be found in [48]. A more recent and short review of modern speech coding algorithms is available in the book by Spanias et al. [49]. From Chapter 2 onward, we will describe the details of the FS-1016 standard and provide MATLAB programs for all the pertinent functions. Chapter 2 will describe the autocorrelation analysis and linear prediction module, starting from the framing of the speech signal up to the computation of reflection coefficients from the bandwidth-expanded LP coefficients. The construction of LSP polynomials, the computation of LSFs, and their quantization are described in Chapter 3. Various distance measures that are used to compute spectral distortion are detailed in Chapter 4. Chapter 5 illustrates the adaptive and stochastic codebook search procedures, and the final chapter describes the components in the FS-1016 decoder module.
C H A P T E R 2
Autocorrelation Analysis and
Linear Prediction
Linear Prediction (LP) analysis is performed on the framed and windowed input speech to compute the direct-form linear prediction coefficients. The autocorrelation coefficients of the speech are calculated, and high-frequency correction is applied to prevent ill-conditioning of the autocorrelation matrix. The autocorrelation lags are then converted to LP coefficients using the Levinson-Durbin recursion. Next, the direct-form LP coefficients are subjected to bandwidth expansion. Finally, the bandwidth-expanded LP coefficients are converted back to Reflection Coefficients (RCs) using the inverse Levinson-Durbin recursion. The block diagram in Figure 2.1 illustrates the autocorrelation analysis and linear prediction module of the FS-1016 CELP transmitter. Every chapter, starting from this one, will have a similar schematic block diagram that illustrates the overall function of the module described in that chapter. Each block in the diagram corresponds to a particular script in the FS-1016 MATLAB code, whose name is provided in the block itself.
Figure 2.1: Autocorrelation analysis and linear prediction.
The following sections of this chapter demonstrate the computation of the autocorrelation lags, the conversion of the autocorrelation lags to LP coefficients, and the recursive conversion of the LP coefficients to RCs using MATLAB programs.
2.1 FRAMING AND WINDOWING THE INPUT SPEECH
The speech frame used in the FS-1016 transmitter is of size 240 samples and corresponds to 30 ms of speech at the standardized sampling rate of 8000 Hz. There are three speech buffers used in the encoder. Two speech buffers, s_old and s_new, are one frame behind and one frame ahead, respectively, while the subframe analysis buffer, s_sub, is half a frame behind s_new and half a frame ahead of s_old. In the later part of the FS-1016 analysis stage, s_sub will be divided into four subframes, each of size 60 samples, corresponding to 7.5 ms of speech. The input speech buffer, s_new, is scaled and clamped to the 16-bit integer range before processing. Clamping to the 16-bit integer range is performed by thresholding the entries of the speech buffer. It is then filtered using a highpass filter to remove low-frequency noise and windowed with a Hamming window for high-frequency correction. Program P2.1 loads an input speech frame and creates the three speech frame buffers in the variables snew, sold and ssub.
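The buffer bookkeeping can be sketched as follows (illustrative Python; the variable names mirror the text, not the MATLAB code):

```python
LL = 240                                  # samples per frame
s_old = [0.0] * LL                        # previous frame (zeros at start-up)
s_new = [float(n) for n in range(LL)]     # current frame (placeholder data)

# s_sub straddles the frame boundary: its first half is the second half of
# s_old, and its second half is the first half of s_new.
s_sub = s_old[LL // 2:] + s_new[:LL // 2]
```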
Example 2.1 The Hamming window used to window the speech signal is shown in Figure 2.2. The input speech (s_new) and the windowed speech (s) frames are shown in Figures 2.3 and 2.4, respectively, and they are generated using Program P2.1. Plots of the Fourier transforms of the input and the windowed speech frames are provided in Figures 2.5 and 2.6. The Fourier transform of the windowed speech frame is smoother because the speech spectrum is convolved with the frequency response of the Hamming window in the frequency domain.
Figure 2.2: The 240-point Hamming window.
% P2.1 - frame_window.m
function [s,snew,sold,ssub] = frame_window(iarf,w,sold)
% sold   - Initialized to zeros for the first frame
% iarf   - Input speech frame, 240 samples
% scale  - Scaling parameter, initialized to 1
% maxval - Maximum possible value, 32767
% minval - Minimum possible value, -32768
% ll     - Length of buffer, 240 samples
% w      - Hamming window sequence
load framew_par.mat;
load hpf_param.mat;
ssub=zeros(240,1);
% SCALE AND CLAMP INPUT DATA TO 16-BIT INTEGER RANGE
snew = min( iarf .* scale, maxval );
snew = max( snew, minval );
% RUN HIGHPASS FILTER TO ELIMINATE HUM AND LF NOISE
[ snew, dhpf1 ] = filter( bhpf, ahpf, snew, dhpf1 );
% MAINTAIN SSUB SUBFRAME ANALYSIS BUFFER. IT IS
% 1/2 FRAME BEHIND SNEW AND 1/2 FRAME AHEAD OF SOLD.
ssub( 1:ll/2 ) = sold( (ll/2)+1:ll );
ssub( (ll/2)+1:ll ) = snew( 1:ll/2 );
% UPDATE SOLD WITH CONTENTS OF SNEW
sold = snew;
% APPLY WINDOW TO INPUT SEQUENCE
s = snew .* w;
Program P2.1: Framing and windowing.
Figure 2.3: Input speech frame.
Figure 2.4: Windowed speech frame.
2.2 COMPUTATIONOF AUTOCORRELATIONLAGS
The biased autocorrelation estimates are calculated using the modified time-average formula [50],

r(i) = Σ_{n=1+i}^{N} u(n) u(n − i) ,  i = 0, 1, . . . , M ,    (2.1)
Figure 2.5: Fourier transform of input speech frame.
Figure 2.6: Fourier transform of windowed speech frame.
where r(i) is the autocorrelation at lag i, u is the input signal, and N is the length of the input signal. Program P2.2 computes the autocorrelation lags of a test signal u up to the maximum lag specified by M.
Example 2.2 A sinusoidal test signal (u) shown in Figure 2.7 is used to estimate autocorrelation
using Program P2.2, and the autocorrelation estimates are computed for lags 0 to 20 as shown in
% P2.2 - cor.m
function [r0 r] = cor(u,M)
% u - Input signal.
% N - Length of input signal.
% M - Maximum lag value.
N=length(u);
% ALLOCATE RETURN VECTOR
r = zeros( M, 1 );
% COMPUTE r(0)
r0 = sum( u(1:N) .* u(1:N) );
% COMPUTE r(i), i NONZERO
for i = 1:M
r(i) = sum( u(i+1:N) .* u(1:N-i) );
end
Program P2.2: Computation of autocorrelation lags.
Program P2.2: Computation of autocorrelation lags.
Figure 2.7: Sinusoidal test signal.
Figure 2.8. The sinusoidal test signal has a period of 16 samples, and this can be readily observed from the autocorrelation function, which has a peak at a lag of 16.
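This behavior is easy to confirm numerically; the sketch below (in Python, mirroring the computation of Program P2.2) builds a period-16 sinusoid and verifies that the biased autocorrelation has a local peak at lag 16:

```python
import math

N, M = 240, 20
u = [math.sin(2 * math.pi * n / 16) for n in range(N)]   # period-16 sinusoid

# Biased autocorrelation estimate of (2.1), written with 0-based indexing.
r = [sum(u[n] * u[n - i] for n in range(i, N)) for i in range(M + 1)]
```

The estimate r[16] exceeds its neighbors r[15] and r[17], reflecting the 16-sample period of the test signal.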
2.3 THE LEVINSON-DURBIN RECURSION
Linear predictor coefficients are the coefficients of the LP polynomial A(z). Recall that LP analysis is carried out using the all-zero filter A(z), whose input is the preprocessed speech frame and whose output is the LP residual. During synthesis, however, the filter used has the transfer function 1/A(z), which models the vocal tract and reproduces the speech signal when excited by the LP residual.
Figure 2.8: The autocorrelation estimate for the sinusoidal test signal.
The Levinson-Durbin recursion is used to compute the coefficients of the LP analysis filter from the autocorrelation estimates [51]. It exploits the Toeplitz structure of the autocorrelation matrix to recursively compute the coefficients of the tenth-order filter.
The algorithm [52] initializes the MSE as follows,

ε_0 = r(0) ,    (2.2)

where ε_0 is the initial MSE and r(0) is the zero-lag autocorrelation. Then, for a first-order predictor,

a_1(1) = −r(1) / r(0) ,    (2.3)
k_1 = a_1(1) ,    (2.4)
q = r(1) ,    (2.5)

where a_m(i) is the i-th predictor coefficient of order m and k_m is the reflection coefficient.
We find the linear predictor coefficients recursively using (2.6) through (2.10). Note that, in the MATLAB code given in Program P2.3, the index of the linear predictor coefficients starts from 1, whereas in the equations the index starts from 0. For i = 1 to m − 1,
% P2.3 - durbin.m
function a = durbin( r0, r, m )
% r0 - Zero-lag autocorrelation (value)
% r  - Autocorrelation for lags 1 to m
% m  - Predictor order
% a  - LP coefficients (column vector, size m+1, a(1)=1)
% INITIALIZATION
a=zeros(m+1,1);
a(1)=1;
E = r0;
a(2) = -r(1) / r0;
k(1) = a(2);
q = r(1);
% RECURSION
for i = 1:m-1
E = E + ( q * k(i) );
q = r( i+1 );
q = q + sum( r( 1:i ) .* a( i+1:-1:2 ) );
k( i+1 ) = -q / E;
a( 2:i+1 ) = a( 2:i+1 ) + k( i+1 ) .* a( i+1:-1:2 );
a( i+2 ) = k( i+1 );
end
Program P2.3: Levinson-Durbin recursion.
ε_i = ε_{i−1} + k_i q ,    (2.6)

q = r(i + 1) + Σ_{j=1}^{i} a_i(i + 1 − j) r(j) ,    (2.7)

k_{i+1} = −q / ε_i ,    (2.8)

ε_i = ε_{i−1} (1 − k_i²) ,    (2.9)

a_{i+1}(j) = a_i(j) + k_{i+1} a_i(i + 1 − j) .    (2.10)
Example 2.3 The frequency response magnitude of the LP synthesis filter 1/A(z), where the coefficients of A(z) are calculated using Program P2.3, is shown in Figure 2.9. The autocorrelation estimates (r) of the speech signal (s) shown in Figure 2.4 are used to compute the coefficients of A(z).
Figure 2.9: Frequency response of the linear prediction synthesis ﬁlter.
2.4 BANDWIDTH EXPANSION
Bandwidth expansion by a factor of γ changes the linear predictor coefficients from a(i) to γ^i a(i) and moves the poles of the LP synthesis filter inwards. The bandwidth expansion can be computed by [53],

BW = 2 cos^{−1} [ 1 − (1 − γ)² / (2γ) ] ,    (2.11)

where 3 − 2√2 ≤ γ ≤ 1, and BW is the bandwidth expansion in radians. At a sampling frequency of 8000 Hz, a bandwidth expansion factor of 0.994127 produces a bandwidth expansion of 15 Hz. This bandwidth expansion factor is used in the implementation of the FS-1016 algorithm. Bandwidth expansion widens the bandwidth of the formants and makes the transfer function somewhat less sensitive to round-off errors.
% P2.4 - bwexp.m
function aexp = bwexp( gamma, a, m )
% gamma - Bandwidth expansion factor, 0.994127
% a     - Linear predictor coefficients, column vector
% aexp  - Bandwidth-expanded linear predictor coefficients
% m     - Order of the linear predictor
% SCALE PREDICTOR COEFFICIENTS TO SHIFT POLES RADIALLY INWARD
aexp( 1:m+1, 1 ) = ( a( 1:m+1 ) .* ( gamma .^ (0:m)' ) );
Program P2.4: Bandwidth expansion.
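The claim that γ = 0.994127 corresponds to roughly 15 Hz at an 8 kHz sampling rate follows directly from (2.11); an illustrative Python check:

```python
import math

gamma, fs = 0.994127, 8000.0
bw_rad = 2.0 * math.acos(1.0 - (1.0 - gamma) ** 2 / (2.0 * gamma))  # (2.11)
bw_hz = bw_rad * fs / (2.0 * math.pi)   # convert radians to Hz
```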
Example 2.4 The pole-zero plots of the original and bandwidth-expanded filter coefficients for a tenth-order linear predictor are shown in Figure 2.10. The corresponding frequency response is shown in Figure 2.11. The linear prediction filter coefficients are calculated for the speech signal given in Figure 2.4. The filter coefficients before and after bandwidth expansion by a factor of 0.994127 are given in Table 2.1. Though not quite evident in this case, after bandwidth expansion the poles have moved inwards in the pole-zero plot.
Figure 2.10: Pole-zero plots of the linear prediction filter.
Figure 2.11: Frequency response before and after bandwidth expansion.
Table 2.1: Linear prediction filter coefficients.

Coefficient   Before Bandwidth Expansion   After Bandwidth Expansion
a_0           1.0000                       1.0000
a_1           1.6015                       1.5921
a_2           1.5888                       1.5702
a_3           1.3100                       1.2870
a_4           1.4591                       1.4251
a_5           1.3485                       1.3094
a_6           1.0937                       1.0558
a_7           0.7761                       0.7447
a_8           0.9838                       0.9385
a_9           0.7789                       0.7386
a_10          0.2588                       0.2440
In order to demonstrate bandwidth expansion more clearly, let us take the example of the Long-Term Predictor (LTP), where the filter expression is given by 1/(1 − 0.9 z^{−10}). The bandwidth expansion factor γ is taken to be 0.9. Please note that this is only an instructive example, and bandwidth expansion is generally not applied to the LTP. The pole-zero plot is given in Figure 2.12 and the frequency response is given in Figure 2.14. In the pole-zero plot, it is quite evident that the poles have moved towards the origin, and the frequency response also clearly shows the expanded bandwidth. The plots given in Figure 2.13 show that the impulse response after bandwidth expansion decays faster than the one before bandwidth expansion.
Figure 2.12: Pole-zero plots of the linear prediction filter.
Figure 2.13: Impulse response before and after bandwidth expansion.
Figure 2.14: Frequency response before and after bandwidth expansion.
2.5 INVERSE LEVINSON-DURBIN RECURSION
The inverse Levinson-Durbin recursion is used to compute the RCs from the bandwidth-expanded linear predictor coefficients [50]. Here, the coefficients are real-valued; therefore, the order-update equations for the forward and backward prediction filters are given in matrix form by,

[ a_i(j) ; a_i(i − j) ] = [ 1 , k_i ; k_i , 1 ] [ a_{i−1}(j) ; a_{i−1}(i − j) ] ,  j = 0, 1, . . . , i ,    (2.12)
where the order i varies from 1 to m. Solving for a_{i−1}(j) from the above equation, we get,

a_{i−1}(j) = [ a_i(j) − k_i a_i(i − j) ] / ( 1 − |k_i|² ) ,  j = 0, 1, . . . , i ,    (2.13)

where k_i is the reflection coefficient of order i. Starting from the highest order m, we compute the linear predictor coefficients for orders decreasing from m − 1 through 1 using (2.13). Then, using the fact that k_i = a_i(i), we determine the RCs of orders 1 through m − 1. If the absolute values of all the RCs are less than 1, then the LP polynomial is minimum phase, resulting in a stable LP synthesis filter. This can be intuitively understood from (2.9): if |k_i| < 1, then ε_i < ε_{i−1}, which means that the MSE decreases with the order of recursion, as we expect for a stable filter. If |k_i| = 1 or |k_i| > 1, the MSE is either zero or it increases with the order of recursion, which is not a characteristic of a stable filter. A detailed explanation of the relationship between the stability of the LP synthesis filter and the magnitude of the reflection coefficients is given in [54].
In the MATLAB code given in Program P2.5, there are certain variable and indexing modifications that are specific to our implementation. First, we take k_i = −a_i(i) because of the sign convention that forces k_1 to be positive for voiced speech. Because of this, when (2.13) is implemented in the program, the numerator is a summation instead of a difference. The indices of the arrays start from 1, and vectorized operations are used to compute the coefficient array in a single statement.
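The step-down recursion with this sign convention can be sketched compactly (an illustrative Python port of the procedure, with k_i = −a_i(i)); it reproduces the k_4 values of Example 2.5 from a_4exp:

```python
def pctorc(lpc, m):
    """Inverse Levinson-Durbin with the FS-1016 sign convention
    k_i = -a_i(i); lpc[j] is the coefficient of z^-j, lpc[0] = 1."""
    a = list(lpc)
    k = [0.0] * m
    for i in range(m, 1, -1):
        k[i - 1] = -a[i]
        denom = 1.0 - k[i - 1] ** 2
        # Numerator is a summation because of the negated sign convention.
        a[1:i] = [(a[j] + k[i - 1] * a[i - j]) / denom for j in range(1, i)]
    k[0] = -a[1]
    return k
```

Applied to a_4exp = [1, −1.5419, 1.3523, −0.7197, 0.3206], this returns values matching k_4 = [0.7555, −0.7012, 0.2511, −0.3206] to the printed precision.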
% P2.5 - pctorc.m
function k = pctorc( lpc, m )
% MAXNO - Predictor order (10 here)
% lpc   - LP coefficients
% m     - Order of LP polynomial
MAXNO=m;
% ALLOCATE RETURN VECTOR AND INIT LOCAL COPY OF PREDICTOR POLYNOMIAL
k = zeros( MAXNO, 1 );
a = lpc;
% DO INVERSE LEVINSON-DURBIN RECURSION
for i = m:-1:2
k(i) = -a( i+1 );
t( i:-1:2 ) = ( a( i:-1:2 ) + ( k(i) .* a(2:i) ) ) / ...
( 1.0 - ( k(i) * k(i) ) );
a( 2:i ) = t( 2:i );
end
k(1) = -a(2);
Program P2.5: Inverse Levinson-Durbin recursion.
Example 2.5 In order to keep the examples simple, we work with a reduced-order LP polynomial. This is achieved by taking only the first four autocorrelation lags (r0_4, r_4) of the speech frame (s) and forming a fourth-order LP polynomial using the Levinson-Durbin recursion. The coefficients of the fourth-order LP polynomial are bandwidth-expanded (a_4exp) and provided as input to Program P2.5 to yield the RCs (k_4). The coefficients are given as vectors below. Note that the vectors are given in MATLAB notation, where separation using semicolons indicates a column vector. In this case, the LP synthesis filter is stable because the absolute values of all the reflection coefficients are less than 1.

a_4exp = [1.0000; −1.5419; 1.3523; −0.7197; 0.3206] ,
k_4 = [0.7555; −0.7012; 0.2511; −0.3206] .
2.6 SUMMARY
LP analysis is performed on the speech frames, and the direct-form LP coefficients are obtained using the autocorrelation analysis and linear prediction module. Further information on linear prediction is available in the tutorial on linear prediction by Makhoul [55] and the book by Markel and Gray [56]. The government standard FS-1015 algorithm is primarily based on open-loop linear prediction and is a precursor of the A-by-S FS-1016 [57]. The LP coefficients are bandwidth-expanded and then converted to RCs. The negatives of the RCs (−k_i) are also known as PARCOR coefficients [58], and they can be determined using the harmonic analysis algorithm proposed by Burg [59]. Note that the stability of the LP synthesis filter can be verified by checking the magnitudes of the RCs. RCs are less sensitive to round-off noise and quantization errors than the direct-form LP coefficients [60]. Because of these good quantization properties, RCs have been used in several of the first-generation speech coding algorithms. Second- and third-generation coders, however, use Line Spectrum Pairs (LSPs), which are described in the next chapter.
C H A P T E R 3
Line Spectral Frequency
Computation
Line Spectrum Pairs (LSPs) are an alternative LP spectral representation of speech frames that has been found to be perceptually meaningful in coding systems. LSPs can be quantized using perceptual criteria and have good interpolation properties. Two LSP polynomials can be formed from the LP polynomial A(z). When A(z) is minimum phase, the zeros of the LSP polynomials have two interesting properties: (1) they lie on the unit circle, and (2) the zeros of the two polynomials are interlaced [61]. Each zero corresponds to an LSP frequency, and instead of quantizing the LP coefficients, the corresponding LSP frequencies are quantized. If the properties of the LSP polynomials are preserved after quantizing the LSP frequencies, the reconstructed LPC filter retains the minimum-phase property [62]. The LSP frequencies are also called Line Spectral Frequencies (LSFs). The LSFs are quantized using an independent, non-uniform scalar quantization procedure. Scalar quantization may result in non-monotonicity of the LSFs, and in that case, adjustments are performed in order to restore their monotonicity. The block diagram in Figure 3.1 shows the steps involved in generating the LSP polynomials, computing the LSFs, and quantizing them.
The construction of LSP polynomials and the computation of their zeros [62] by applying Descartes' rule will be illustrated using MATLAB programs in the following sections. The correction of ill-conditioned cases that occur due to non-minimum-phase LP polynomials and the quantization of LSFs are also illustrated.
3.1 CONSTRUCTION OF LSP POLYNOMIALS
The LSP polynomials are computed from the linear predictor polynomial A(z) as follows,

P(z) = A(z) + z^{−(m+1)} A(z^{−1}) ,    (3.1)
Q(z) = A(z) − z^{−(m+1)} A(z^{−1}) ,    (3.2)

where P(z) is called the symmetric polynomial (palindromic) and Q(z) is called the antisymmetric polynomial (antipalindromic). This is because P(z) has symmetric coefficients and Q(z) has antisymmetric coefficients. Evidently, P(z) has a root at (−1, 0) in the z-plane because it has symmetric coefficients, and Q(z) has a root at (1, 0) in the z-plane because it has antisymmetric coefficients. To regenerate A(z) from P(z) and Q(z), we use,

A(z) = 0.5 [ P(z) + Q(z) ] .    (3.3)

Figure 3.1: Computation of line spectral frequencies.

In practice, only half of the coefficients of the LSP polynomials, excluding the first element, are stored. The reason is that the polynomials have symmetry and their first coefficient is equal to 1. In Program P3.1, in addition to storing a part of the coefficients as described above, all the coefficients of both P(z) and Q(z) are stored in the output variables pf and qf for illustration purposes.
Example 3.1 The linear predictor coefﬁcients for the speech signal given in Figure 2.4 are shown
in Figure 3.2. The resulting LSP polynomial coefﬁcients are shown in Figure 3.3. Figure 3.3 shows
that the coefﬁcients of P(z) are symmetric and those of Q(z) are antisymmetric.
As another example, a fourth-order LP polynomial is considered. Using Program P3.1, the coefficients of P(z) and Q(z) are generated. The coefficients of the LP polynomial (a_4exp), the coefficients of the symmetric LSP polynomial (pf_4), and the coefficients of the antisymmetric LSP polynomial (qf_4) are listed below.
a_4exp = [1.0000; −1.5419; 1.3523; −0.7197; 0.3206] ,
pf_4 = [1.0000; −1.2212; 0.6326; 0.6326; −1.2212; 1.0000] ,
qf_4 = [1.0000; −1.8625; 2.0719; −2.0719; 1.8625; −1.0000] .
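The construction of (3.1)-(3.2) can be verified for this example with a few lines of Python (illustrative; polynomial coefficients are stored lowest power of z^{-1} first):

```python
# A(z) coefficients of Example 3.1 (a_4exp), m = 4.
a = [1.0, -1.5419, 1.3523, -0.7197, 0.3206]

# z^-(m+1) A(z^-1) has the reversed coefficients shifted by one position.
ar = [0.0] + a[::-1]
p = [x + y for x, y in zip(a + [0.0], ar)]   # P(z), symmetric
q = [x - y for x, y in zip(a + [0.0], ar)]   # Q(z), antisymmetric
```

The resulting vectors match pf_4 and qf_4 above, and the symmetry of p and the antisymmetry of q are exact by construction.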
% P3.1 - generate_lsp.m
function [p,q,pf,qf]=generate_lsp(a,m)
% a      - LPC coefficients
% m      - LPC predictor order (here 10)
% p,q    - First half of coefficients of P(z) & Q(z), excluding 1
% pf,qf  - All the coefficients of P(z) & Q(z)
% MAXORD - Maximum order of P or Q (here 24)
MAXORD = 24;
MAXNO = m;
% INIT LOCAL VARIABLES
p = zeros( MAXORD, 1 );
q = zeros( MAXORD, 1 );
mp = m + 1;
mh = fix ( m/2 );
% GENERATE P AND Q POLYNOMIALS
p( 1:mh ) = a( 2:mh+1 ) + a( m+1:-1:m-mh+2 );
q( 1:mh ) = a( 2:mh+1 ) - a( m+1:-1:m-mh+2 );
% ALL THE COEFFICIENTS (P SYMMETRIC, Q ANTISYMMETRIC)
pf=[1;p(1:mh);flipud(p(1:mh));1];
qf=[1;q(1:mh);-flipud(q(1:mh));-1];
Program P3.1: LSP polynomial construction.
3.2 COMPUTING THE ZEROS OF THE SYMMETRIC POLYNOMIAL
It is not possible to find the roots of the symmetric polynomial in closed form, but there are some properties of the polynomial that can be exploited to find its roots numerically in a tractable manner. The symmetric polynomial is given by,

P(z) = 1 + p_1 z^{−1} + p_2 z^{−2} + · · · + p_2 z^{−(m−1)} + p_1 z^{−m} + z^{−(m+1)} ,    (3.4)

where p_i is the i-th coefficient of P(z). Assuming that the symmetric polynomial has roots lying on the unit circle, we evaluate the zeros of P(z) by solving for ω in,

P(z)|_{z=e^{jω}} = 2 e^{−j(m+1)ω/2} [ cos((m + 1)ω/2) + p_1 cos((m − 1)ω/2) + · · · + p_{m/2} cos(ω/2) ] .    (3.5)
Figure 3.2: Linear predictor coefﬁcients.
Figure 3.3: Symmetric and antisymmetric polynomial coefﬁcients.
Let us call the bracketed term of (3.5) P′(ω). Since the zeros occur in complex conjugate pairs, we compute only the zeros on the upper half of the unit circle. Also, the zero at the point (−1, 0) on the unit circle is not computed. The MATLAB code to compute the zeros of the symmetric polynomial is given in Program P3.2.
In order to compute the roots of the symmetric polynomial numerically using Descartes' rule, the following procedure is adopted. The upper half of the unit circle is divided into 128 intervals and each interval is examined for zeros of the polynomial. The fundamental assumption is that
% P3.2 - sym_zero.m
function [freq,sym_freq]=sym_zero(p,m)
% m        - Order of LPC polynomial
% freq     - All LSP frequencies (to be computed)
% sym_freq - LSP frequencies of P(z)
% N        - Number of divisions of the upper half unit circle
% DEFINE CONSTANTS
EPS = 1.00e-6; % Tolerance of polynomial values for finding zero
N = 128;
NB = 15;
% INITIALIZE LOCAL VARIABLES
freq = zeros( m+1, 1 );
mp = m + 1;
mh = fix ( m/2 );
% COMPUTE P AT F=0
fl = 0.0;
pxl = 1.0 + sum( p(1:mh) );
% SEARCH FOR ZEROS OF P
nf = 1;
i = 1;
while i <= N
mb = 0;
% HIGHER FREQUENCY WHERE P IS EVALUATED
fr = i * ( 0.5 / N );
pxr = cos( mp * pi * fr );
jc = mp - ( 2 * ( 1:mh )' );
% ARGUMENT FOR COS
ang = pi * fr * jc;
% EVALUATION OF P AT HIGHER FREQUENCY
pxr = pxr + sum( cos(ang) .* p(1:mh) );
tpxr = pxr;
tfr = fr;
% COMPARE THE SIGNS OF POLYNOMIAL EVALUATIONS
% IF THEY ARE OPPOSITE SIGNS, ZERO IN INTERVAL
if ( pxl * pxr ) <= 0.00
mb = mb + 1;
% ESTIMATE THE ZERO WITHIN THE FREQUENCY INTERVAL
fm = fl + ( fr - fl ) / ( pxl - pxr ) * pxl;
pxm = cos( mp * pi * fm );
jc = mp - ( (1:mh)' * 2 );
ang = pi * fm * jc;
pxm = pxm + sum( cos(ang) .* p(1:mh) );
Program P3.2: Computation of zeros for symmetric LSP polynomial. (Continues.)
34 3. LINESPECTRAL FREQUENCY COMPUTATION
% CHECK FOR SIGNS AGAIN AND CHANGE THE INTERVAL
if ( pxm * pxl ) > 0.00
pxl = pxm;
fl = fm;
else
pxr = pxm;
fr = fm;
end
% PROGRESSIVELY MAKE THE INTERVALS SMALLER AND REPEAT THE
% PROCESS UNTIL ZERO IS FOUND
while (abs(pxm) > EPS) & ( mb < 4 )
mb = mb + 1;
fm = fl + ( fr - fl ) / ( pxl - pxr ) * pxl;
pxm = cos( mp * pi * fm );
jc = mp - ( (1:mh)' * 2 );
ang = pi * fm * jc;
pxm = pxm + sum( cos(ang) .* p(1:mh) );
if ( pxm * pxl ) > 0.00
pxl = pxm;
fl = fm;
else
pxr = pxm;
fr = fm;
end
end % OF THE INTERVAL-SHRINKING LOOP
% IF THE PROCESS FAILS USE DEFAULT LSPS
if ( (pxl-pxr) * pxl ) == 0
freq( 1:m ) = ( 1:m ) * 0.04545;
fprintf( 'pctolsp2: default lsps used, avoiding /0\n');
return
end
% RECORD THE ZERO BY INTERPOLATING WITHIN THE FINAL INTERVAL
freq(nf) = fl + (fr-fl) / (pxl-pxr) * pxl;
nf = nf + 2;
if nf > m-1
break
end
end % OF THE ZERO-IN-INTERVAL CASE
pxl = tpxr;
fl = tfr;
i = i + 1;
end % OF THE SEARCH OVER INTERVALS
% LSP FREQUENCIES OF P(z) ALONE
sym_freq=freq(1:2:m);
Program P3.2: (Continued.) Computation of zeros for symmetric LSP polynomial.
3.3. COMPUTINGTHEZEROS OFTHEANTISYMMETRICPOLYNOMIAL 35
the interval is so small that it contains at most a single zero. The polynomial P'(ω) is evaluated at the frequencies fr and fl, which are the end frequencies of the interval considered. The values of the polynomial P'(ω) at the two frequencies are stored in the program variables pxr and pxl, respectively.
Initially, the value of P'(ω) at frequency 0 is the sum of its coefficients. Then P'(ω) is evaluated at the radian frequency ω, where ω is represented as πi/128. In general, the program variable jc contains the argument of the cosine function excluding 0.5ω, and the program variable ang is the complete argument for the cosine function.
Upon computation of pxr and pxl, their signs are compared. If there is a sign change, then by Descartes' rule there are an odd number of zeros in the interval. However, as our interval is small enough, there can be no more than a single zero. If there is no sign change, there are no zeros, and the program therefore proceeds to the next interval. If there is a zero, the next step is the computation of the mid frequency fm using the bisection method. The value of pxm is subsequently calculated using fm. Then again, the signs of pxm and pxl are compared. If there is a sign change, there is a zero between fm and fl, and the interval is changed accordingly. This is repeated until the value of pxm gets close to zero, and then the frequency interval is bisected to find the zero. After finding a zero, we move on to the next interval to find the next zero. This process is repeated until all the zeros on the upper half of the unit circle are determined. Every time a zero is calculated, the number of frequencies computed (nf) is incremented by two because each zero on the upper half of the unit circle is the complex conjugate of another in the lower half of the unit circle. Hence, finding one zero is actually equivalent to finding two LSFs.
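The scan-and-bisect procedure above can be prototyped outside MATLAB. The following Python sketch is our own illustration (function names such as p_reduced and sym_zeros are not part of the FS-1016 code): it evaluates the reduced symmetric polynomial exactly as sym_zero does and brackets its zeros on (0, 0.5).

```python
import math

def p_reduced(f, p, m):
    """Reduced symmetric polynomial at normalized frequency f:
    cos((m+1)*pi*f) + sum_k p[k-1]*cos(pi*f*((m+1) - 2k)),
    mirroring the inner expression of sym_zero."""
    mp = m + 1
    mh = m // 2
    val = math.cos(mp * math.pi * f)
    for k in range(1, mh + 1):
        val += p[k - 1] * math.cos(math.pi * f * (mp - 2 * k))
    return val

def sym_zeros(p, m, n_div=128, eps=1e-6):
    """Scan (0, 0.5] in n_div steps; bisect each sign-change interval."""
    roots = []
    fl, pl = 0.0, p_reduced(0.0, p, m)
    for i in range(1, n_div + 1):
        fr = i * (0.5 / n_div)
        pr = p_reduced(fr, p, m)
        if pl * pr <= 0.0:          # opposite signs: one zero inside
            a, b, pa = fl, fr, pl
            while b - a > eps:      # plain bisection
                fm = 0.5 * (a + b)
                pm = p_reduced(fm, p, m)
                if pm * pa > 0.0:
                    a, pa = fm, pm
                else:
                    b = fm
            roots.append(0.5 * (a + b))
        fl, pl = fr, pr
    return roots
```

For m = 2 with p = [0], the reduced polynomial is cos(3πf), whose only interior zero is at f = 1/6.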
Example 3.2 The roots of the symmetric polynomial whose coefficients are shown in Figure 3.3 are given in Figure 3.4. As can be seen, the roots lie on the unit circle and appear as complex conjugate pairs.
3.3 COMPUTING THE ZEROS OF THE ANTISYMMETRIC POLYNOMIAL
Finding the roots of the antisymmetric polynomial is similar to finding the roots of the symmetric polynomial. However, it is simpler because of the property that the roots of the symmetric and antisymmetric polynomials are interlaced. The antisymmetric polynomial is given by,

Q(z) = 1 + q_1 z^{-1} + q_2 z^{-2} + ... - q_2 z^{-(m-1)} - q_1 z^{-m} - z^{-(m+1)} ,  (3.6)
where the q_i's are the coefficients of Q(z). Since we need to evaluate the zeros of Q(z) only on the unit circle, we can write Q(z) as,

Q(z)|_{z=e^{jω}} = 2j e^{-j(m+1)ω/2} [ sin((m+1)ω/2) + q_1 sin((m-1)ω/2) + ... + q_{m/2} sin(ω/2) ] ,  (3.7)
Figure 3.4: Roots of the symmetric LSP polynomial.
where we call the bracketed term Q'(ω). The zero at the point (1, 0) on the unit circle is not computed. The procedure to compute the roots of Q'(ω), and hence those of Q(z), is given in Program P3.3.
The procedure for computing the zeros of Q(z) is very similar to the procedure for computing the zeros of P(z), except for a few differences. Firstly, since we know that the zeros of P(z) and Q(z) are interlaced, the start and end points of each interval are taken as consecutive zeros of the symmetric LSP polynomial. Secondly, instead of using the cosine function, we use the sine function because of the difference between (3.5) and (3.7).
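The interlacing property means each zero of Q'(ω) is already bracketed by two consecutive zeros of P'(ω), so a single bisection suffices per interval. A Python sketch (our own helper names, not the book's code) of the sine-form evaluation and bracketed search:

```python
import math

def q_reduced(f, q, m):
    """Reduced antisymmetric polynomial at normalized frequency f:
    sin((m+1)*pi*f) + sum_k q[k-1]*sin(pi*f*((m+1) - 2k)),
    mirroring the inner expression of antisym_zero."""
    mp = m + 1
    mh = m // 2
    val = math.sin(mp * math.pi * f)
    for k in range(1, mh + 1):
        val += q[k - 1] * math.sin(math.pi * f * (mp - 2 * k))
    return val

def antisym_zero_between(fl, fr, q, m, eps=1e-6):
    """Bisect Q' on (fl, fr); interlacing guarantees a sign change
    when fl and fr are consecutive zeros of P' (fr may be 0.5)."""
    pa = q_reduced(fl, q, m)
    while fr - fl > eps:
        fm = 0.5 * (fl + fr)
        pm = q_reduced(fm, q, m)
        if pm * pa > 0.0:
            fl, pa = fm, pm
        else:
            fr = fm
    return 0.5 * (fl + fr)
```

For m = 2 with q = [0], Q'(f) = sin(3πf) has its zero at f = 1/3, between the symmetric zero 1/6 and the endpoint 0.5.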
Example 3.3 The roots of the antisymmetric polynomial whose coefficients are shown in Figure 3.3 are given in Figure 3.5. Again, all the roots lie on the unit circle and appear as complex conjugate pairs.
Another example is to consider a fourth-order LP polynomial and compute the LSP frequencies. Considering the same LP polynomial as in Example 3.1, we get the symmetric LSFs (sym_freq_4) from Program P3.2 and the antisymmetric LSFs (asym_freq_4) from Program P3.3 as,
sym_freq_4 = [0.0842; 0.2102] ,
asym_freq_4 = [0.1082; 0.3822] ,
freq_4 = [0.0682; 0.1082; 0.2033; 0.3822] ,
% P3.3 - antisym_zero.m
function [freq,asym_freq]=antisym_zero(q,m,freq)
% q    - Coefficients of antisymmetric LSP polynomial Q(z)
% m    - Order of LPC polynomial
% freq - LSP frequencies (to be computed)
% N    - Number of divisions of the upper half unit circle
% EPS  - Tolerance of polynomial values for finding zeros
% DEFINE CONSTANTS
EPS = 1.00e-6;
N = 128;
NB = 15;
% INITIALIZE LOCAL VARIABLES
mp = m + 1;
mh = fix ( m/2 );
% THE LAST FREQUENCY IS 0.5
freq(m+1) = 0.5;
% FIRST SYMMETRIC LSP FREQUENCY IS THE STARTING POINT
fl = freq(1);
qxl = sin( pi * mp * fl );
jc = mp - ( (1:mh)' * 2 );
% ARGUMENT FOR SIN
ang = pi * fl * jc;
qxl = qxl + sum( sin(ang) .* q(1:mh) );
i = 2;
while i < mp
mb = 0;
% USE ONLY SYMMETRIC LSP FREQUENCIES AS END POINTS(ZEROS INTERLACED)
fr = freq(i+1);
qxr = sin( mp * pi * fr );
jc = mp - ( (1:mh)' * 2 );
ang = pi * fr * jc;
qxr = qxr + sum( sin(ang) .* q(1:mh) );
tqxl = qxl;
tfr = fr;
tqxr = qxr;
mb = mb + 1;
fm = ( fl + fr ) * 0.5;
qxm = sin( mp * pi * fm );
jc = mp - ( (1:mh)' * 2 );
ang = pi * fm * jc;
Program P3.3: Computation of zeros for antisymmetric LSP polynomial. (Continues.)
qxm = qxm + sum( sin(ang) .* q(1:mh) );
% CHECK FOR SIGN CHANGE
if ( qxm * qxl ) > 0.00
qxl = qxm;
fl = fm;
else
qxr = qxm;
fr = fm;
end
% Progressively make the intervals smaller and repeat the
% process until zero is found
while ( abs(qxm) > EPS*tqxl ) & ( mb < NB )
mb = mb + 1;
fm = ( fl + fr ) * 0.5;
qxm = sin( mp * pi * fm );
jc = mp - ( (1:mh)' * 2 );
ang = pi * fm * jc;
qxm = qxm + sum( sin(ang) .* q(1:mh) );
if ( qxm * qxl ) > 0.00
qxl = qxm;
fl = fm;
else
qxr = qxm;
fr = fm;
end
end
% If the process fails use previous LSPs
if ( ( qxl - qxr ) * qxl ) == 0
freq( 1:m ) = ( 1:m ) * 0.04545;
fprintf( 'pctolsp2: default lsps used, avoiding /0\n');
return
end
% Find the zero by bisecting the interval
freq(i) = fl + ( fr - fl ) / ( qxl - qxr ) * qxl;
qxl = tqxr;
fl = tfr;
i = i + 2;
end
% LSP frequencies excluding the last one(0.5)
freq=freq(1:m);
% LSP Frequencies of Q(z) alone
asym_freq=freq(2:2:m);
Program P3.3: (Continued.) Computation of zeros for antisymmetric LSP polynomial.
Figure 3.5: Roots of the antisymmetric LSP polynomial.
where the array freq_4 stores all the LSFs. Note that the LSFs are normalized with respect to the sampling frequency, and hence they range between 0 and 0.5. The relation between an LSF and the corresponding zero of the LSP polynomial is given as,

ẑ = cos(2πf) + i sin(2πf) ,  (3.8)

where f is the LSF and ẑ is the corresponding zero of the LSP polynomial. The LSFs 0 and 0.5 are not included, as the zero of Q(z) at the point (1, 0) on the unit circle corresponds to a normalized frequency of 0 and the zero of P(z) at the point (−1, 0) corresponds to a normalized frequency of 0.5. The LSFs for the LP polynomial given in Example 3.1 are given in Figure 3.6, where stars indicate the LSFs from P(z) and circles indicate the LSFs from Q(z).
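Relation (3.8) is one line of code. A small Python illustration (the helper names are ours) maps the fourth-order LSFs in freq_4 to their unit-circle zeros and back:

```python
import cmath
import math

def lsf_to_zero(f):
    """z = cos(2*pi*f) + j*sin(2*pi*f), as in (3.8)."""
    return cmath.exp(2j * math.pi * f)

def zero_to_lsf(z):
    """Recover the normalized LSF from an upper-half-plane zero."""
    return cmath.phase(z) / (2.0 * math.pi)

# Fourth-order LSFs from the example above
freq_4 = [0.0682, 0.1082, 0.2033, 0.3822]
zeros = [lsf_to_zero(f) for f in freq_4]
```

Every zero has unit magnitude, and its angle divided by 2π recovers the LSF.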
So far, we have examined LP polynomials that are minimum phase to generate LSP frequencies. We study here the effect of LP polynomials with non-minimum phase by considering the following coefficients of A(z),
a_nm = [1.0000; −1.5898; 0.4187; 0.3947; 0.4470] .
The pole-zero plot of 1/A(z) is given in Figure 3.7, where we can see poles lying outside the unit circle, making A(z) a non-minimum phase polynomial. Figure 3.8 shows the LSFs of the symmetric LSP polynomial (shown as stars) and the LSFs of the antisymmetric LSP polynomial (shown as circles). Note that these frequencies are not obtained using Programs P3.2 and P3.3; rather, they are calculated directly using MATLAB. We can see that not all the roots of Q(z) lie on the unit circle, and hence they do not satisfy the assumption that all the roots of the LSP polynomials lie on the unit circle. Therefore, the LSFs obtained from Programs P3.2 and P3.3 will not be accurate, because of the assumption of a minimum phase LP polynomial A(z). The attempt to detect and correct this ill-conditioned case is described in the next section.
Figure 3.6: Roots of the LSP polynomials.
Figure 3.7: Pole-zero plot of the non-minimum phase filter.
3.4 TESTING ILL-CONDITIONED CASES
Ill-conditioned cases in finding the roots of the LSP polynomials occur if the LP polynomial used to generate them is non-minimum phase. In a well-conditioned case, the LP polynomial is minimum phase, and our assumption that the zeros of the LSP polynomials lie on the unit circle is valid. If the zeros of the symmetric and antisymmetric LSP polynomials do not alternate on the unit circle, the LP polynomial may not be minimum phase. In that case, Programs P3.2 and P3.3 do not output monotonic LSFs.
Figure 3.8: Roots of the LSP polynomials for the non-minimum phase filter.
Program P3.4 tests ill-conditioned cases that occur with LSPs and tries to correct them. If any LSF is 0 or 0.5, the program variable lspflag is set to 1, indicating that an ill-conditioned case has been encountered. If the LSFs are nonmonotonic, an attempt to correct them is made. If that attempt fails, the LSFs of the previous frame are used.
Example 3.4 Consider the non-minimum phase LP polynomial in Example 3.3. Using Programs P3.1, P3.2, and P3.3, we obtain the LSFs as,
freq = [0.1139; 0.0064; 0.1987; 0.3619] .
We can see that the LSFs are nonmonotonic; using Program P3.4, we can detect and correct the ill-conditioned case. After using Program P3.4, the LSFs, reordered for monotonicity, are obtained as,
freq = [0.0064; 0.1139; 0.1987; 0.3619] .
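The two passes of Program P3.4 can be mirrored in a few lines of Python (a sketch with our own function name, not the book's code); feeding it the nonmonotonic LSFs of Example 3.4 reproduces the reordered vector:

```python
def lsp_condition(freq, lastfreq):
    """One swap pass over adjacent LSF pairs; if the result is still
    nonmonotonic, fall back to the previous frame's LSFs (the same
    strategy as Program P3.4)."""
    freq = list(freq)
    m = len(freq)
    for i in range(1, m):
        if freq[i] < freq[i - 1]:
            freq[i], freq[i - 1] = freq[i - 1], freq[i]
    for i in range(1, m):
        if freq[i] < freq[i - 1]:      # correction failed
            return list(lastfreq)
    return freq
```

Note that a single swap pass is not a full sort; if more than one pair is out of order, the fallback to the previous frame's LSFs is taken.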
3.5 QUANTIZING THE LINE-SPECTRAL FREQUENCIES
The LSFs will be quantized using the quantization matrix given in Table 3.1. Six frequencies (LSF 1 and LSFs 6 to 10) will be quantized using 3 bits, and the remaining four (LSFs 2 to 5) will be quantized
Table 3.1: LSF quantization matrix (quantization levels in Hz; in the program variable lspQ, the 8-level rows are zero-padded to sixteen columns).

Frequency Index | Quantization Levels
1  | 100  170  225  250  280  340  420  500
2  | 210  235  265  295  325  360  400  440  480  520  560  610  670  740  810  880
3  | 420  460  500  540  585  640  705  775  850  950  1050 1150 1250 1350 1450 1550
4  | 620  660  720  795  880  970  1080 1170 1270 1370 1470 1570 1670 1770 1870 1970
5  | 1000 1050 1130 1210 1285 1350 1430 1510 1590 1670 1750 1850 1950 2050 2150 2250
6  | 1470 1570 1690 1830 2000 2200 2400 2600
7  | 1800 1880 1960 2100 2300 2480 2700 2900
8  | 2225 2400 2525 2650 2800 2950 3150 3350
9  | 2760 2880 3000 3100 3200 3310 3430 3550
10 | 3190 3270 3350 3420 3490 3590 3710 3830
% P3.4 - lsp_condition.m
function [ freq ] = lsp_condition( freq,lastfreq,m )
% freq     - LSP frequencies of current frame
% lastfreq - LSP frequencies of last frame
% m        - Order of LP polynomial
% TEST FOR ILL-CONDITIONED CASES AND TAKE CORRECTIVE ACTION IF REQUIRED
freq = freq(1:m);
lspflag = 0;
if ( any(freq==0.00) | any(freq==0.5) )
lspflag = 1;
end
% REORDER LSPs IF NONMONOTONIC
for i = 2:m
if freq(i) < freq(i-1)
lspflag = 1;
fprintf( 'pctolsp2: nonmonotonic lsps\n' );
tempfreq = freq(i);
freq(i) = freq(i-1);
freq(i-1) = tempfreq;
end
end
% IF NONMONOTONIC AFTER 1ST PASS, RESET TO VALUES FROM PREVIOUS FRAME
for i = 2:m
if freq(i) < freq(i-1)
fprintf( 'pctolsp2: Reset to previous lsp values\n' );
freq(1:m) = lastfreq;
break;
end
end
Program P3.4: Testing cases of ill-condition.
using 4 bits, resulting in a total of 34 bits per frame. The LSFs are quantized such that the absolute distance of each unquantized frequency from the nearest quantization level is minimal.
The first LSF is quantized using the quantization levels given in the first row of the matrix in Table 3.1, the second LSF using the second row, and so on. Note that in some cases, even though the second LSF is higher than the first, it may be quantized to a lower value. This is because the quantization tables overlap, i.e., the nonzero values in each column of Table 3.1 are not always in increasing order. Nevertheless, quantization is performed using the quantization table, and the issue of quantized LSFs possibly becoming nonmonotonic is handled later. Also, it can be seen from Table 3.1 that the second, third, fourth, and fifth LSFs are quantized with 16 levels each, thereby needing 4 bits, whereas each of the other LSFs has 8 quantization levels, hence needing 3 bits.
In Program P3.5, the program variable bits is initialized as,
bits = [3; 4; 4; 4; 4; 3; 3; 3; 3; 3] .
The quantization matrix given in Table 3.1 is stored in the program variable lspQ. FSCALE is the
sampling frequency, which is used to scale the LSFs to the appropriate range. For each LSF, the
number of bits used for quantization determines the number of levels. Quantization of each LSF is
performed such that the distance between the unquantized LSF and the corresponding quantization
level in the table is minimized. The index of the quantized LSF is stored in the program variable
findex.
% P3.5 - lsp_quant.m
function [ findex ] = lsp_quant( freq, no, bits, lspQ )
% freq   - Unquantized LSFs
% no     - Number of LSFs (10 here)
% bits   - Array of bits to be allocated for each LSF
% findex - Vector of indices to quantized LSFs, references lspQ
% lspQ   - Quantization table matrix
% DEFINE CONSTANTS
FSCALE = 8000.00;
MAXNO = 10;
% INIT RETURN VECTOR
findex = zeros( MAXNO, 1 );
% INIT LOCAL VARIABLES
freq = FSCALE * freq;
levels = ( 2 .^ bits ) - 1;
% QUANTIZE ALL LSP FREQUENCIES
for i = 1:no
% QUANTIZE TO NEAREST OUTPUT LEVEL
dist = abs( freq(i) - lspQ(i,1:levels(i)+1) );
[ low, findex(i) ] = min( dist );
end
Program P3.5: Quantizing the LSFs.
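As a quick check of the nearest-level rule, the following Python sketch (the helper name is ours) quantizes the first LSF of Table 3.2 against row 1 of Table 3.1 and recovers the index 8 and the 500 Hz level listed there:

```python
def quantize_lsf(freq_hz, levels_hz):
    """Pick the table level at minimum absolute distance; return a
    1-based index into the row, as MATLAB's min would."""
    dists = [abs(freq_hz - q) for q in levels_hz]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best + 1, levels_hz[best]

# Row 1 of Table 3.1 (3 bits -> 8 levels)
row1 = [100, 170, 225, 250, 280, 340, 420, 500]
idx, qval = quantize_lsf(0.0617 * 8000.0, row1)
```

(0.0617 × 8000 ≈ 493.6 Hz; Table 3.2 lists 493.3 Hz because the underlying LSF carries more digits than shown, but the nearest level is 500 Hz either way.)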
Example 3.5 The LSFs are quantized using Program P3.5 to obtain the indices of the quantized LSFs (findex) in the quantization table. Table 3.2 gives the input unquantized LSFs, the unquantized frequencies, the output indices (findex), and the quantized frequencies corresponding to those indices. The spectral plots of the LP synthesis filters corresponding to the unquantized and quantized LSFs are given in Figure 3.9. The corresponding pole-zero plots are given in Figure 3.10. Note that the LSFs must be converted to LP coefficients using Program P4.1 in order to visualize the spectra and pole-zero plots.
Table 3.2: Example results for Program P3.5.

Unquantized LSF | Unquantized Frequency (Hz) | Index to Quantization Table | Quantized Frequency (Hz)
0.0617 | 493.3  | 8  | 500
0.0802 | 641.3  | 13 | 670
0.0999 | 799.1  | 8  | 775
0.1612 | 1289.7 | 9  | 1270
0.1785 | 1428.0 | 7  | 1430
0.2128 | 1702.2 | 3  | 1690
0.2957 | 2365.9 | 5  | 2300
0.3141 | 2513.0 | 3  | 2525
0.3761 | 3009.1 | 3  | 3000
0.4049 | 3239.4 | 2  | 3270
Figure 3.9: Spectra from unquantized and quantized LSFs.
3.6 ADJUSTING QUANTIZATION TO PRESERVE MONOTONICITY
Quantization of the LSFs may lead to nonmonotonicity because of the overlapped quantization tables. It is important to preserve the monotonicity of the LSFs because, if they become nonmonotonic, the zeros of the two LSP polynomials will no longer be interlaced. This leads to the loss of the minimum phase property of the LP polynomial. Therefore, the quantized LSFs are checked for monotonicity, and if the previous quantized LSF has a higher value than the current LSF, the problem is corrected by either quantizing the current LSF to the next higher level or quantizing the previous LSF to the next lower level.
Figure 3.10: Pole-zero plots from unquantized and quantized LSFs.
Table 3.3: Example results for Program P3.6.

Unquantized LSF | Index to Quantization Table | Nonmonotonic Quantized LSF | Corrected Index | Monotonic Quantized LSF
0.0501 | 7 | 0.0525 | 7 | 0.0525
0.0512 | 7 | 0.0500 | 8 | 0.0550
0.0942 | 8 | 0.0969 | 8 | 0.0969
0.1209 | 6 | 0.1212 | 6 | 0.1212
0.1387 | 3 | 0.1413 | 3 | 0.1413
0.2335 | 4 | 0.2288 | 4 | 0.2288
0.3103 | 6 | 0.3100 | 6 | 0.3100
0.3214 | 3 | 0.3156 | 3 | 0.3156
0.4235 | 7 | 0.4288 | 7 | 0.4288
0.4330 | 5 | 0.4363 | 5 | 0.4363
Program P3.6 takes the unquantized LSFs (freq) and the quantized LSF indices (findex) as its first two inputs. For each LSF, the program checks whether the current quantized LSF is less than or equal to the previous quantized LSF. If so, the corresponding upward and downward errors are calculated. The upward error is the error resulting if the current LSF were quantized to the next higher level, with the previous LSF remaining at the same quantized level. The downward error is the error resulting if the previous LSF were quantized to the next lower level, with the current LSF remaining at the same quantized level. If the upward error is lower than the downward error, then the current LSF is quantized to the next higher level. Otherwise, the previous LSF is quantized to the next
% P3.6 - lsp_adjust_quant.m
function [ findex ] = lsp_adjust_quant( freq,findex,no,lspQ,bits )
% freq   - Unquantized LSFs
% findex - Vector of indices to quantized LSFs, references lspQ
%          (May index nonmonotonic LSFs on input, but will index
%          monotonic LSFs on output)
% lspQ   - Quantization table matrix
% no     - Number of LSFs (10 here)
% bits   - Array of bits to be allocated for each LSF
% DEFINE CONSTANTS
FSCALE = 8000.00;
MAXNO = 10;
% INIT LOCAL VARIABLES
freq = FSCALE * freq;
levels = ( 2 .^ bits ) - 1;
% QUANTIZE ALL LSP FREQUENCIES AND FORCE MONOTONICITY
for i = 1:no
% ADJUST QUANTIZATION IF NONMONOTONICALLY QUANTIZED AND
% FIND ADJUSTMENT FOR MINIMUM QUANTIZATION ERROR
if i > 1
if lspQ( i,findex(i) ) <= lspQ( i-1,findex(i-1) )
errorup = abs( freq(i) - ...
lspQ(i,min(findex(i)+1,levels(i)+1)) ) + ...
abs( freq(i-1) - lspQ(i-1,findex(i-1)) );
errordn = abs( freq(i) - lspQ(i,findex(i)) ) + ...
abs( freq(i-1) - lspQ(i-1,max(findex(i-1)-1,1)) );
% ADJUST INDEX FOR MINIMUM ERROR (AND PRESERVE MONOTONICITY)
if errorup < errordn
findex(i) = min( findex(i)+1, levels(i)+1 );
while ( lspQ(i,findex(i)) < lspQ(i-1,findex(i-1)) )
findex(i) = min( findex(i)+1, levels(i)+1 );
end
elseif i == 2
findex(i-1) = max( findex(i-1)-1, 1 );
elseif lspQ(i-1,max(findex(i-1)-1,1)) > lspQ(i-2,findex(i-2))
findex(i-1) = max( findex(i-1)-1, 1 );
else
findex(i) = min( findex(i)+1, levels(i)+1 );
while lspQ( i, findex(i) ) < lspQ( i-1, findex(i-1) )
findex(i) = min( findex(i)+1, levels(i)+1 );
end
end
end
end
end
Program P3.6: Adjusting quantization to preserve monotonicity.
lower level. If this operation introduces nonmonotonicity, then the current LSF is quantized to the next higher level, regardless of the higher upward error.
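The upward/downward decision can be sketched for a single LSF pair as follows (a simplified Python illustration of the idea, not a transcription of Program P3.6; the function name and the toy table rows are ours):

```python
def adjust_pair(f_prev, f_cur, row_prev, row_cur, i_prev, i_cur):
    """Resolve one nonmonotonically quantized LSF pair (0-based
    indices into the table rows): either bump the current index up
    or the previous index down, whichever costs less total error."""
    if row_cur[i_cur] > row_prev[i_prev]:
        return i_prev, i_cur                  # already monotonic
    up = min(i_cur + 1, len(row_cur) - 1)
    dn = max(i_prev - 1, 0)
    errorup = abs(f_cur - row_cur[up]) + abs(f_prev - row_prev[i_prev])
    errordn = abs(f_cur - row_cur[i_cur]) + abs(f_prev - row_prev[dn])
    if errorup < errordn:
        i_cur = up
        # keep climbing until monotonicity is restored (or row ends)
        while row_cur[i_cur] <= row_prev[i_prev] and i_cur < len(row_cur) - 1:
            i_cur += 1
    else:
        i_prev = dn
    return i_prev, i_cur
```

With overlapping toy rows [100, 170, 225, 250] and [150, 160, 260, 300] and targets 168 Hz and 155 Hz, the naive indices (1, 0) give 170 then 150; bumping the current LSF up (cheaper here) yields the monotonic pair 170, 260.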
Example 3.6 In this example, we consider a case where the quantized LSFs are nonmonotonic, as observed in Table 3.3. If we provide the indices of the nonmonotonically quantized LSFs (findex), shown in the second column of Table 3.3, as input to Program P3.6, we get the indices shown in the fourth column as the output. The indices output by the program correspond to the monotonic quantized LSFs given in the last column of Table 3.3. The pole-zero plots of the LP synthesis filters corresponding to the unquantized LSFs, the quantized LSFs, and the quantized LSFs corrected to be monotonic are given in Figure 3.11. The pole that crosses the unit circle is shown zoomed in; it is clear that the nonmonotonicity of the LSFs caused by quantization results in the pole crossing to the outside of the unit circle. When the LSFs are corrected to be monotonic, the pole comes back inside the unit circle, making the system minimum phase again.
[Figure 3.11 consists of three pole-zero plots (real part vs. imaginary part), each with a zoomed inset near the unit circle: a PZ plot from the unquantized LSFs, a PZ plot from the quantized LSFs, and a PZ plot from the quantized monotonic LSFs.]
Figure 3.11: Pole-zero plots from unquantized, quantized, and monotonically quantized LSFs.
3.7 SUMMARY
The LSP polynomials P(z) and Q(z) are generated from the LP coefficients. P(z) is a polynomial with even-symmetric coefficients, and Q(z) is a polynomial with odd-symmetric coefficients. The zeros of these polynomials, the LSFs, are computed using Descartes' method. LSFs can be efficiently computed using Chebyshev polynomials [63], and details on the spectral error characteristics and perceptual sensitivity of LSFs are given in a paper by Kang and Fransen [64]. Developments in enhancing the coding of the LSFs include using vector quantization [65] and the discrete cosine transform [66] to encode the LSFs. In third-generation standards, vector quantization of the LSFs [67, 68] is common practice. Note that ill-conditioned cases are indicated by nonmonotonicity of the LSFs. They can arise either from a non-minimum phase LP polynomial or from quantization of the LSFs. Immittance Spectral Pairs (ISPs) are another form of spectral representation for the LP filter [69], similar to LSPs, and they are considered to represent the immittance of the vocal tract. Some of the most modern speech coding standards, such as the AMR-WB codec and the source-controlled VMR-WB codec, use ISPs as parameters for quantization and transmission [5, 42].
CHAPTER 4
Spectral Distortion
Spectral distortion in the FS-1016 encoder occurs due to the quantization of the LSFs. The quantized and unquantized LSFs are converted to their respective LP coefficients. The LP coefficients are then converted to RCs using the inverse Levinson-Durbin recursion. The RCs are finally converted back to autocorrelation lags, and the distances between the autocorrelation lags are computed. There are many distance measures that could be used to characterize the distortion due to the quantization of the LSFs. The following distance measures are considered in this chapter: (1) the cepstral distance measure, (2) the likelihood ratio, and (3) the cosh measure. The block diagram in Figure 4.1 shows the steps involved in the computation of the spectral distortion measures.
Computation of the RCs from LP coefficients using the inverse Levinson-Durbin recursion has already been explained in Section 2.5 (Program P2.5). Conversion of LSPs to LP coefficients, conversion of RCs to autocorrelation lags, and calculation of distances between autocorrelation lags are illustrated in the following sections using MATLAB programs.
Figure 4.1: Spectral distortion computation.
4.1 CONVERSION OF LSP TO DIRECT-FORM COEFFICIENTS
Conversion of the LSP frequencies to LP coefficients is much less complex than finding the LSP frequencies from the LP polynomial. Each LSP frequency ω_i represents a polynomial factor of the form 1 − 2 cos(ω_i) z^{-1} + z^{-2} [63]. Upon multiplying the second-order factors, we get the symmetric and antisymmetric polynomials P(z) and Q(z). The LP polynomial can then be reconstructed using (3.3). Here, however, we adopt a different approach by realizing that each second-order factor of the polynomials P(z) and Q(z) can be considered a second-order filter of the type shown in Figure 4.2. The total filter polynomial is a cascade of many such second-order stages. Because the filter is Finite Impulse Response (FIR), its impulse response samples are the coefficients of the filter. Therefore, the coefficients of the LP polynomials are found by computing the impulse response of the cascade of the second-order stages.
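An equivalent way to reconstruct A(z), in exact arithmetic, is to multiply the second-order factors into P(z) and Q(z) directly and average the two, as (3.3) suggests. The Python sketch below is our own construction (for even order), not a transcription of Program P4.1; the test checks its first predictor coefficient against Example 4.1's pc:

```python
import math

def conv(a, b):
    """Polynomial multiplication (coefficient convolution)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def lsf_to_lpc(f):
    """Rebuild A(z) = (P(z) + Q(z)) / 2 from normalized LSFs
    (even order m): odd-indexed LSFs (1st, 3rd, ...) belong to
    P(z), which carries the fixed (1 + z^-1) factor; even-indexed
    LSFs belong to Q(z) with (1 - z^-1)."""
    P, Q = [1.0, 1.0], [1.0, -1.0]
    for i, fi in enumerate(f):
        w = 2.0 * math.pi * fi
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:          # 1st, 3rd, ... LSF -> P(z)
            P = conv(P, factor)
        else:                   # 2nd, 4th, ... LSF -> Q(z)
            Q = conv(Q, factor)
    A = [0.5 * (pp + qq) for pp, qq in zip(P, Q)]
    return A[:len(f) + 1]       # the degree-(m+1) terms cancel
```

The tolerance in the check below is loose because the LSFs of Example 4.1 are printed to only four decimal places.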
[Figure 4.2 depicts the second-order FIR section used for P(z) and Q(z): the input x feeds two cascaded unit delays z^{-1}, and the taps 1, −2cos(ω), and 1 are summed to form the output y.]
Figure 4.2: Second-order section of P(z) and Q(z).
In Program P4.1, p and q are the arrays that contain the cosines of the LSP frequencies of the polynomials P(z) and Q(z). The variables a and b are the arrays that contain the outputs of the current stage and, consequently, the inputs for the next stage. The variables a1 and b1 are the first-level memory elements that correspond to first-order delays, and a2 and b2 are the second-level memory elements that correspond to second-order delays. The outer for loop cycles through the samples, and the inner for loop cycles through the stages of the cascaded filter for each input sample. The latter calculates the output samples for each stage and populates the memory elements. Each LP polynomial coefficient, except the first coefficient, which is always one, is computed as the average of the coefficients of the two filters as per (3.3). Finally, the LP coefficients are stored in the array pc.
Example 4.1 Program P4.1 takes the LSP frequencies as the input and generates the LP coefficients as the output. For the case of a tenth-order LP polynomial, the quantized and unquantized LSP frequencies (f and fq) supplied as input to the program, and the corresponding LP coefficients (pc and pcq) output from the program, are given below. The spectra and pole-zero plots of the LP synthesis filters corresponding to the unquantized and quantized LSFs are given in Figures 3.9 and 3.10, respectively.
f = [0.0617; 0.0802; 0.0999; 0.1612; 0.1785; 0.2128; 0.2957; 0.3141; 0.3761; 0.4049] ,
pc = [1.0000; −1.5921; 1.5702; −1.2870; 1.4251; −1.3094; 1.0557; −0.7447; 0.9385; −0.7386; 0.2440] ,
fq = [0.0625; 0.0838; 0.0969; 0.1588; 0.1787; 0.2112; 0.2875; 0.3156; 0.3750; 0.4088] ,
pcq = [1.0000; −1.6440; 1.6310; −1.2326; 1.2500; −1.1169; 0.9365; −0.6354; 0.7458; −0.5828; 0.1708] .
The LP coefficients (pc and pcq) are also converted to RCs using Program P2.5, and the corresponding RCs (rc and rcq) are given below. Note that the absolute values of the elements of the arrays rc and rcq are less than one, which implies that the LP synthesis filter is stable.
rc = [0.7562; −0.6992; 0.2369; −0.3205; 0.0473; −0.5316; −0.1396; −0.0367; 0.3723; −0.2440] ,
rcq = [0.7817; −0.7778; 0.2507; −0.3701; −0.0816; −0.4893; −0.0247; 0.0151; 0.3111; −0.1708] .
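The LP-to-RC conversion of Program P2.5 and its forward counterpart can be sketched as a step-up/step-down pair in Python (our own functions, written under the sign convention suggested by the arrays above, where the last reflection coefficient equals the negated last predictor coefficient: rc(10) = −0.2440 for pc(11) = 0.2440). The two recursions are exact inverses:

```python
def step_up(rcs):
    """Reflection coefficients -> LP polynomial [1, a1, ..., am]
    (sign convention: the order-i polynomial ends in -k_i)."""
    a = [1.0]
    for k in rcs:
        i = len(a) - 1
        a = [1.0] + [a[j] - k * a[i + 1 - j] for j in range(1, i + 1)] + [-k]
    return a

def step_down(a):
    """LP polynomial -> reflection coefficients (the inverse
    Levinson-Durbin recursion, in the spirit of Program P2.5)."""
    a = list(a)
    rcs = []
    while len(a) > 1:
        i = len(a) - 1
        k = -a[i]                      # last coefficient gives -k
        rcs.append(k)
        denom = 1.0 - k * k
        a = [1.0] + [(a[j] + k * a[i - j]) / denom for j in range(1, i)]
    return list(reversed(rcs))
```

Applying step_down to a minimum-phase A(z), such as pc above, yields reflection coefficients with magnitude below one, which is the stability check mentioned in the text.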
4.2 COMPUTATION OF AUTOCORRELATION LAGS FROM REFLECTION COEFFICIENTS
To calculate the autocorrelation (AC) lags from the RCs [50], we first obtain the LP polynomials of increasing orders from the reflection coefficients, using the forward and backward predictor recurrence relations given in (4.1) and (4.2), respectively,

a_{i+1}(j) = a_i(j) + k_{i+1} a_i(i + 1 − j) ,  (4.1)
a_{i+1}(i + 1 − j) = a_i(i + 1 − j) + k_{i+1} a_i(j) ,  (4.2)

where a_i(j) is the j-th predictor coefficient of order i and k_i is the reflection coefficient. In Program P4.2, initially the autocorrelation array r is stored with the reflection coefficients rc. Then
% P4.1 - lsptopc.m
function pc = lsptopc( f, no )
% f  - LSP frequencies
% no - LPC filter order
% pc - LPC predictor coefficients
% INITIALIZE LOCAL VARIABLES
noh = no / 2;
freq = f;
a = zeros( noh+1, 1 );
a1 = a;
a2 = a;
b = a;
b1 = a;
b2 = a;
pc = zeros( no, 1 );
% INITIALIZE LSP FILTER PARAMETERS
p = -2 * cos( 2*pi*freq((1:2:no-1)') );
q = -2 * cos( 2*pi*freq((2:2:no)') );
% COMPUTE IMPULSE RESPONSE OF ANALYSIS FILTER
xf = 0.00;
for k = 1:no+1
xx = 0.00;
if k == 1
xx = 1.00;
end
a(1) = xx + xf;
b(1) = xx - xf;
xf = xx;
Program P4.1: Conversion of LSP frequencies to LP coefﬁcients. (Continues.)
for i = 1:noh
a(i+1) = a(i) + ( p(i) * a1(i) ) + a2(i);
b(i+1) = b(i) + ( q(i) * b1(i) ) + b2(i);
a2(i) = a1(i);
a1(i) = a(i);
b2(i) = b1(i);
b1(i) = b(i);
end
if k ~= 1
pc(k-1) = 0.5 * ( a(noh+1) + b(noh+1) );
end
end
% CONVERT TO PREDICTOR COEFFICIENT ARRAY CONFIGURATION
pc(2:no+1) = pc(1:no);
pc(1) = 1.0;
Program P4.1: (Continued.) Conversion of LSP frequencies to LP coefﬁcients.
a temporary array t is created to store the intermediate LP polynomial coefficients of increasing orders. If we have a predictor polynomial of order m, then we know that,

a_m(m) = k_m .  (4.3)

In Program P4.2, we negate the reflection coefficient and assign it to the intermediate LP polynomial coefficient because we want to follow the sign convention where the first reflection coefficient equals the normalized first-order autocorrelation lag. The LP polynomial coefficients are computed for increasing orders of the intermediate predictor polynomials from 2 to m, where m is the order of the actual predictor polynomial. The order of the intermediate LP polynomial is indicated by the variable i in the program. For each intermediate predictor polynomial, the first half of the polynomial coefficients is computed using the forward predictor recursion given in (4.1), and the second half is computed using the backward predictor recursion given in (4.2). The last coefficient of any intermediate polynomial is equal to the negative of the reflection coefficient. This follows from (4.3), and the negative sign is because of our sign convention.
Then, the autocorrelation lag of order i + 1 is computed by,

r(i + 1) = Σ_{k=1}^{i+1} a_{i+1}(k) r(i + 1 − k) ,  (4.4)

where r(i) is the autocorrelation at lag i. In Program P4.2, all the coefficients of the intermediate LP polynomial except the first coefficient are negatives of their actual values because of our sign
% P4.2 - rctoac.m
function r = rctoac( rc, m )
% rc - Reflection coefficients
% m  - Predictor order
% r  - Normalized autocorrelation lags
% INITIALIZE LOCAL VARIABLES
r = zeros( m+1, 1 );
t = r;
r(1) = 1.0;
r(2:m+1) = rc;
% COMPUTE PREDICTOR POLYNOMIAL OF DIFFERENT DEGREE AND STORE IN T
% COMPUTE AUTOCORRELATION AND STORE IN R
t(1) = 1.0;
t(2) = -r(2);
if m > 1
for i = 2:m
j = fix(i/2);
tj = t( 2:j+1 ) - ( r( i+1 ) * t( i:-1:i-j+1 ) );
tkj = t( i:-1:i-j+1 ) - ( r( i+1 ) * t( 2:j+1 ) );
t( 2:j+1 ) = tj;
t( i:-1:i-j+1 ) = tkj;
t( i+1 ) = -r( i+1 );
r( i+1 ) = r( i+1 ) - sum( t( 2:i ) .* r( i:-1:2 ) );
end
end
Program P4.2: Computation of autocorrelation lags from reﬂection coefﬁcients.
convention. Therefore, when (4.4) is implemented in the program, the ﬁnal sum is negated to get
the autocorrelation lag with proper sign.
Example 4.2 The autocorrelation lags that are output when the RCs in Example 4.1 are input to Program P4.2 are given below. The array ac contains the autocorrelation lags corresponding to rc, and acq contains the autocorrelation lags corresponding to rcq. We will use these autocorrelation lags to calculate various distance measures that quantify the amount of distortion that has occurred due to quantization.
ac = [1; 0.7562; 0.2725; −0.1267; −0.3443; −0.4001; −0.4154; −0.4701; −0.4573; −0.2241; 0.1523] ,
acq = [1; 0.7817; 0.3086; −0.1406; −0.4277; −0.5416; −0.5576; −0.5160; −0.3639; −0.0506; 0.3324] .
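A direct Python transcription of the recursion in Program P4.2 (the function name is ours) reproduces the first lags of Example 4.2, which is a convenient cross-check of the sign convention:

```python
def rc_to_ac(rc):
    """Reflection coefficients -> normalized autocorrelation lags,
    following Program P4.2's recursion (t holds negated predictor
    coefficients; k1 equals the normalized first lag)."""
    m = len(rc)
    r = [1.0] + list(rc)          # r[i] will become lag i
    t = [0.0] * (m + 1)
    t[0] = 1.0
    t[1] = -r[1]
    for i in range(2, m + 1):
        k = r[i]                  # reflection coefficient k_i
        j = i // 2
        idx_top = list(range(1, j + 1))
        idx_bot = list(range(i - 1, i - j - 1, -1))
        top = [t[a] - k * t[b] for a, b in zip(idx_top, idx_bot)]
        bot = [t[b] - k * t[a] for a, b in zip(idx_top, idx_bot)]
        for a, v in zip(idx_top, top):
            t[a] = v
        for b, v in zip(idx_bot, bot):
            t[b] = v
        t[i] = -k                 # negated per the sign convention
        r[i] = k - sum(t[a] * r[i - a] for a in range(1, i))
    return r
```

Feeding it the rc array of Example 4.1 gives lags matching ac above to within the four printed decimals.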
4.3 CALCULATION OF DISTANCE MEASURES
The distance measures used to measure the distortion due to quantization of the LSP frequencies are the
likelihood ratio, the cosh measure and the cepstral distance [70]. Nine different distances are stored
in an array for each frame. In Program P4.3 (dist.m), the array of distance measures is indicated
by dm.
Considering the log likelihood distance measure, let us define A(z) as the LP polynomial that
corresponds to the unquantized LSFs, and A′(z) as the LP polynomial that corresponds to the
quantized LSFs. The minimum residual energy α is obtained when the speech samples of the
current frame, say {x(n)}, that were used to generate the LP polynomial A(z) are passed through that same filter.
The residual energy δ is obtained when {x(n)} is passed through the filter A′(z). Similarly, if
we assume that some sequence of samples {x′(n)} was used to generate A′(z), then α′ is the
minimum residual energy found when {x′(n)} is passed through A′(z), and the residual energy δ′ is
obtained when {x′(n)} is passed through A(z). The ratios δ′/α′ and δ/α are defined as the likelihood
ratios and indicate the differences between the LPC spectra before and after quantization.
The program cfind.m is used by dist.m to compute the cepstral coefficients, the filter
autocorrelation lags, the residual energy, the LP filter parameters and the reflection coefficients
from the autocorrelation sequence. The method of finding the likelihood ratios is described in [70]. In
Program P4.3, δ, δ′, α and α′ are denoted by del, delp, alp, and alpp. The first two distance
measures dm(1) and dm(2) denote the ratios δ/α and δ′/α′.
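The residual energies del and delp in Program P4.3 exploit the identity E = r(0)·ra(0) + 2 Σ_k r(k)·ra(k), where ra is the autocorrelation of the inverse-filter coefficients, so no filtering of the actual signal is needed. A minimal Python sketch of this identity (function names are illustrative, not from the FS-1016 sources):

```python
import numpy as np

def filter_autocorr(a):
    """Autocorrelation of the inverse-filter coefficients a (a[0] = 1):
    ra(k) = sum_i a(i) a(i+k)."""
    a = np.asarray(a, float)
    p = len(a) - 1
    return np.array([np.dot(a[:len(a) - k], a[k:]) for k in range(p + 1)])

def residual_energy(r_sig, a):
    """Residual energy when a signal with autocorrelation r_sig is passed
    through the inverse filter A(z): E = r(0) ra(0) + 2 sum_k r(k) ra(k)."""
    ra = filter_autocorr(a)
    p = len(a) - 1
    return r_sig[0] * ra[0] + 2.0 * float(np.dot(r_sig[1:p + 1], ra[1:p + 1]))

def likelihood_ratio(r_sig, a_quant, alpha):
    """delta/alpha: energy through the quantized filter over the minimum
    residual energy alpha of the unquantized filter."""
    return residual_energy(r_sig, a_quant) / alpha
```

For an AR(1) signal with r(k) = ρ^k and the matched filter A(z) = 1 − ρz^{−1}, the residual energy is the familiar minimum 1 − ρ², and the likelihood ratio of the matched filter against itself is 1.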
We denote,
ε = (1/2)(σ/σ′)²(δ/α) + (1/2)(σ′/σ)²(δ′/α′) − 1 ,  (4.5)
cosh(ω) − 1 = ε ,  (4.6)
ω = ln[1 + ε + √(ε(2 + ε))] ,  (4.7)
where ω is the cosh measure. The array element dm(3) is the distance measure 4.34294418ω, where
ω is calculated for σ = σ′ = 1 and the factor 4.34294418 is used to convert the cosh measure to
decibels. The program fin.m converts ε to the cosh measure in decibels. The parameters σ and σ′ used
above are the gain constants related to the cepstral gains by,
c_0 = ln[σ²] ,  (4.8)
c′_0 = ln[(σ′)²] ,  (4.9)
where the cepstral coefficients c_k and c′_k are the Fourier series coefficients of the log spectra given
by,
ln[σ²/|A(e^{jθ})|²] = Σ_{k=−∞}^{∞} c_k e^{−jkθ} ,  (4.10)
ln[(σ′)²/|A′(e^{jθ})|²] = Σ_{k=−∞}^{∞} c′_k e^{−jkθ} .  (4.11)
% P4.3 - dist.m
function [ dm ] = dist( m, l, r, rp )
% m  - Filter order
% l  - Number of terms used in cepstral distance
%      measure
% r  - Autocorrelation sequence 1 (undistorted)
% rp - Autocorrelation sequence 2 (distorted)
% dm - Distances array
% DEFINE LOCAL CONSTANTS
DBFAC = 4.342944819;
% INITIALIZE LOCAL VARIABLES AND RETURN VECTORS
dm = zeros( 9, 1 );
% COMPUTE CEPSTRAL AND FILTER COEFFICIENTS
[ c, ra, alp, a, rc ] = cfind( m, l, r );
[ cp, rap, alpp, ap, rcp ] = cfind( m, l, rp );
% COMPUTE DM(1), DM(2)
del = ( r(1) * rap(1) ) + sum( 2 * r(2:m+1) .* rap(2:m+1) );
delp = ( rp(1) * ra(1) ) + sum( 2 * rp(2:m+1) .* ra(2:m+1) );
dm(1) = del / alp;
dm(2) = delp / alpp;
% COMPUTE DM(3)
q = ( ( dm(1) + dm(2) ) / 2.0 ) - 1;
if q >= 0.00
    dm(3) = fin(q);
end
% COMPUTE DM(4)
q1 = ( alpp * r(1) ) / ( alp * rp(1) );
q = ( ( ( dm(1) / q1 ) + dm(2) * q1 ) * 0.5 ) - 1.0;
if q >= 0.00
    dm(4) = fin(q);
end
Program P4.3: Computation of distance measures. (Continues.)
% COMPUTE DM(5)
q2 = alpp / alp;
q = ( ( ( dm(1) / q2 ) + dm(2) * q2 ) * 0.5 ) - 1.0;
if q >= 0.00
    dm(5) = fin(q);
end
% COMPUTE DM(6)
q = sqrt( dm(1) * dm(2) ) - 1.0;
qflag = ( q >= 0.00 );
if qflag
    dm(6) = fin(q);
end
% COMPUTE DM(7), DM(8), DM(9)
cepsum = 2 * sum( ( c(1:l) - cp(1:l) ) .^ 2 );
if cepsum >= 0.0
    dm(7) = DBFAC * sqrt( cepsum );
    q = log(q1);
    dm(8) = DBFAC * sqrt( cepsum + (q * q) );
    q = log(q2);
    dm(9) = DBFAC * sqrt( cepsum + (q * q) );
else
    qflag = false;
end
Program P4.3: (Continued.) Computation of distance measures.
To obtain the fourth distance measure dm(4), we consider the following values for σ² and (σ′)²:
σ² = α/r(0) ,  (4.12)
(σ′)² = α′/r′(0) ,  (4.13)
where r′ indicates the autocorrelation lag corresponding to the quantized LSP frequencies. For
dm(5), we take σ² = α and (σ′)² = α′. For the sixth distance measure, dm(6), the gain constants
are adjusted to minimize ε and the minimum is obtained as,
ε_min = [(δ/α)(δ′/α′)]^{1/2} − 1 .  (4.14)
The cepstral distance measure u(L) is given by,
[u(L)]² = (c_0 − c′_0)² + 2 Σ_{k=1}^{L} (c_k − c′_k)² ,  (4.15)
where L is the number of cepstral coefficients. The distance measure dm(7) is the cepstral distance
calculated with σ = σ′ = 1, dm(8) is the cepstral measure obtained with (4.12) and (4.13) holding
true, and finally, dm(9) is the cepstral measure with the conditions σ² = α and (σ′)² = α′. The
cepstral distance is a computationally efficient way to obtain the root-mean-square (RMS) difference
between the log LPC spectra corresponding to the unquantized and quantized LSFs.
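The efficiency comes from the fact that, for a minimum-phase 1/A(z), the cepstral coefficients follow a simple recursion in the LP coefficients, so (4.15) needs no FFT. A sketch under the convention A(z) = 1 + Σ a(k)z^{−k} (the recursion is standard; cfind.m obtains the same quantities starting from the autocorrelation instead):

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients of 1/A(z), A(z) = 1 + sum_k a[k] z^-k,
    via the standard minimum-phase recursion; c[0] is left out (gain term)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c

def cepstral_distance_db(a1, a2, L=40):
    """RMS log-spectral difference approximation (4.15) with sigma = sigma' = 1,
    converted to decibels with the factor 4.342944819."""
    c1 = lpc_to_cepstrum(a1, L)
    c2 = lpc_to_cepstrum(a2, L)
    return 4.342944819 * np.sqrt(2.0 * np.sum((c1[1:] - c2[1:]) ** 2))
```

For a single-pole filter with a(1) = 0.5 this gives the expected series c_n = −(0.5)^n (−1)^{n+1}/n, i.e., c_1 = −0.5, c_2 = 0.125, c_3 = −1/24.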
Example 4.3 The autocorrelation lags from Example 4.2 are used in Program P4.3 to find the
distance measures. The order of the filter (m) is taken as 10 and the number of terms used to find the
cepstral distance measure (l) is taken as 40. The distance measure array output from the program is
given below.
dm = [1.0477; 1.0452; 1.3189; 1.7117; 1.7117; 1.3189; 1.2890; 1.6750; 1.6750] .
The cepstral distance dm(9) and the likelihood ratios dm(1) and dm(2) for all frames of the
speech file ‘cleanspeech.wav’ [76] are given in Figure 4.3. The frame energy and the spectrogram
of the speech are also given in the same figure. The three distance measures follow the same trend
over most of the frames, which is reasonable because all three of them measure
the distortion between the spectra corresponding to the quantized and unquantized LSFs. Note that no direct
relationship can be drawn between the distance measures and the frame energy or the spectrogram
of the speech signal.
4.4 SUMMARY
Procedures for computing three distinct types of spectral distance measures were presented in this
chapter. These distance measures have a meaningful frequency-domain interpretation [70]. The
significance of the likelihood distance measures when the data are assumed to be Gaussian, and their
efficient computation using autocorrelation sequences, are provided in a paper by Itakura [71]. The
properties of cepstral coefficients of minimum-phase polynomials [72] are exploited to compute
the cepstral distance measure, which provides a lower bound for the RMS log spectral distance. The
cosh distance measure closely approximates the RMS log spectral distance as well and provides an
upper bound for it [70].
Figure 4.3: (a) Cepstral distance for all the frames.
Figure 4.3: (b) Likelihood ratio distance measure at all the frames.
Figure 4.3: (c) Spectrogram at all the frames.
Figure 4.3: (d) Frame energy at all the frames.
C H A P T E R 5
The Codebook Search
The prediction residual from the LP filter of the CELP analysis stage is modeled by an LTP and
the Stochastic Codebook (SCB). The pitch delay of the LP residual is first predicted by the LTP,
and the SCB represents the random component in the residual. The LTP is also called the
Adaptive Codebook (ACB) because the memory of the LTP is treated as a codebook from which the
best matching pitch delay for a particular target is computed. Figure 5.1 provides an overview of the
codebook search procedure in the analysis stage, consisting of both the ACB and the SCB.
The codebook search procedure outlined in Figure 5.1 will be described in detail, with the necessary
theory and MATLAB programs, in the following sections. Some important
details will also be demonstrated with numerical examples and plots.
Figure 5.1: Codebook search procedure.
5.1 OVERVIEW OF CODEBOOK SEARCH PROCEDURE
The use of the closed-loop LTP was first proposed in a paper by Singhal and Atal [2] and led to a
significant improvement in the quality of speech. The vector excitation scheme using stochastic codevectors
for A-by-S LP coding was introduced in a paper by Schroeder and Atal [1]. In this section, we give
an overview of the structure of the ACB and the SCB and the search procedures. The codebook searches
are done once per subframe for both the ACB and the SCB, using the LP parameters obtained
from the interpolated LSPs of each subframe. Determining the best matching codevector for a
particular signal is referred to as Vector Quantization (VQ) [73, 74]. The idea behind both
codebook searches is to determine the best match in terms of the codevector and the optimal gain
(gain-shape VQ [75]) so that the scaled codevector is closest to the target signal in the MSE sense. For
this reason, the theory and most of the MATLAB implementation of the ACB and
SCB searches are similar. The major difference between the search procedures is that the target
signal used in the process of error minimization is different. The target for the ACB search is the LP
residual weighted by the perceptual weighting filter, A(z/γ), where γ is the bandwidth expansion
factor. The target for the SCB search is the target for the ACB search minus the filtered ACB
VQ excitation [20]. Therefore, it is clear that the combination of the ACB and the SCB is actually
attempting to minimize the perceptually weighted error. The entire search procedure is illustrated in
Figure 5.2. i_a and i_s are the indices into the ACB and SCB that correspond to the optimal codevectors;
g_a and g_s are the gains that correspond to the adaptive and stochastic codewords. In the MATLAB
implementation, the ACB search is performed first and then the SCB search is performed.
Figure 5.2: Searching the adaptive and the stochastic codebooks.
For the ACB, the codebook consists of codewords corresponding to 128 integer and 128 non-integer
delays ranging from 20 to 147 samples. The search involves a closed-loop analysis procedure.
Odd subframes are encoded with an exhaustive search requiring 8 bits, whereas even subframes
are encoded by a delta search procedure requiring 6 bits. Therefore, the average number of bits needed for
encoding the ACB index is 7 per subframe. Codebook search here means finding the optimal pitch delay and an
associated gain, coded between −1.0 and +2.0, using a modified Minimum Squared Prediction
Error (MSPE) criterion. Submultiples of the chosen delay are also checked, using a submultiple delay table, to
see whether any submultiple matches within 1 dB of MSPE. This favors a smooth pitch contour and
prevents pitch errors due to choosing multiples of the pitch delay. The adaptive codebook is updated
whenever the pitch memory of the LTP is updated, i.e., once every subframe.
For the fixed SCB, the codebook is overlapped, with a shift of 2 samples between consecutive
codewords. The codebook is 1082 samples long and consists of 512 codewords that are sparse and ternary
valued (−1, 0 and +1) [18, 46]. This allows for fast convolution and end correction of subsequent
codewords, which will be described later in detail. Two consecutive codewords from the stochastic codebook,
shifted by −2, are shown in Figure 5.3, where the overlap, sparsity and ternary
values can be seen.
Figure 5.3: Examples of ternary valued SCB vectors.
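The overlapped, ternary storage can be mimicked in a few lines of Python. The start-index convention below (codeword i begins 2i samples into the array) is an illustrative assumption, since the FS-1016 sources index the codebook from the opposite end, but the overlap property between neighboring codewords is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative overlapped codebook: 1082 samples, sparse and ternary valued.
scb = rng.choice([-1.0, 0.0, 1.0], size=1082, p=[0.1, 0.8, 0.1])

def scb_codeword(scb, i, L=60, shift=2):
    """i-th codeword (0 <= i < 512) of the overlapped codebook: consecutive
    codewords share L - shift samples."""
    start = i * shift
    return scb[start:start + L]
```

Because neighboring codewords share 58 of their 60 samples, the filtered version of codeword i + 1 can be obtained from that of codeword i by end-correcting only the two new samples, which is exactly what cgain.m exploits.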
As we mentioned earlier, the search procedures for the ACB and the SCB are essentially
the same, except that the targets are different. Assuming that we have the L × 1 target vector e^{(0)}
and codevectors x^{(i)}, where i is the index of the codevector in the codebook, the goal is to find the
index for which the match score m^{(i)} is maximized. The filtered codeword
is given by,
y^{(i)} = WHx^{(i)} ,  (5.1)
where W and H are the L × L lower triangular matrices that represent the truncated impulse
responses of the weighting filter and the LP synthesis filter, respectively. The overall product in (5.1)
represents the effect of passing the codeword through the LP and the weighting filters. Figure 5.4
(adapted from [20]) illustrates the procedure for finding the gain and the match scores. The scores are
given by,
g^{(i)} = (y^{(i)T} e^{(0)}) / (y^{(i)T} y^{(i)}) ,  (5.2)
m^{(i)} = (y^{(i)T} e^{(0)})² / (y^{(i)T} y^{(i)}) .  (5.3)
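Equations (5.2) and (5.3) amount to one correlation and one energy per codevector. A direct sketch, with the same division-by-zero guard used in the MATLAB routines:

```python
import numpy as np

def gain_and_match(y, e0):
    """Optimal gain g = (y.e0)/(y.y) and match score m = (y.e0)^2/(y.y),
    as in (5.2) and (5.3); the energy is floored to avoid division by zero."""
    cor = float(np.dot(y, e0))
    eng = float(np.dot(y, y))
    if eng <= 0.0:
        eng = 1.0
    g = cor / eng
    return g, cor * g       # match score m = cor^2 / eng
```

Maximizing m^{(i)} is equivalent to minimizing the squared error between g^{(i)}y^{(i)} and e^{(0)}, since ||e^{(0)} − g y||² = ||e^{(0)}||² − m when g is the optimal gain.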
Figure 5.4: Finding gain and match scores [From [20]].
5.2 ADAPTIVE CODEBOOK SEARCH
Having looked at the match scores and the overview of the codebook searches, we will now describe the
ACB search in detail with MATLAB code. We will provide an elaborate study of the target signal
for the ACB search, the integer delay search, and the submultiple/fractional delay search.
5.2.1 TARGET SIGNAL FOR ADAPTIVE CODEBOOK SEARCH
Denoting s and ŝ^{(0)} as the actual speech subframe vector and the zero input response of the LP
filter, from Figure 5.2 we can see that the target signal for the ACB search, e^{(0)}, should be,
e^{(0)} = W(s − ŝ^{(0)}) .  (5.4)
The filtered codeword y^{(i)} given in (5.1) is compared with the target e^{(0)} to get the gain and match
scores as given in (5.2) and (5.3). The best codebook index i is the one that maximizes m^{(i)}.
The MATLAB implementation of the codebook search described above takes into account
certain considerations in order to reduce the implementation complexity. From (5.1), we see that
the codeword has to be filtered through the LP filter and the weighting filter for comparison with
the target. One possible simplification is to do away with the LP filtering, so that we deal with the
excitation signal instead of the speech signal. Therefore, instead of using the weighting filter A(z/γ), we
use the inverse weighting filter 1/A(z/γ). The modified filtered codeword for comparison becomes,
ỹ^{(i)} = W^{−1} x^{(i)} .  (5.5)
This requires us to modify the target (converting it to a form of excitation) and the gain and match
scores as well. These are given by,
ẽ^{(0)} = W^{−1} H^{−1} (s − ŝ^{(0)}) ,  (5.6)
g̃_i = (ỹ^{(i)T} ẽ^{(0)}) / (ỹ^{(i)T} ỹ^{(i)}) ,  (5.7)
m̃_i = (ỹ^{(i)T} ẽ^{(0)})² / (ỹ^{(i)T} ỹ^{(i)}) .  (5.8)
% P5.1 - acbtarget.m
% s      - Speech or residual segment
% l      - Segment size
% d2     - Memory, 1/A(z)
% d3     - Memory, A(z)
% d4     - Memory, 1/A(z/gamma)
% no     - LPC predictor order
% e0     - Codebook search initial error (zero vector) and updated
%          error (target)
% fc     - LPC filter/weighting filter coefficients
% gamma2 - Weight factor, perceptual weighting filter
% LOAD THE SPEECH FRAME, FILTER MEMORIES AND OTHER PARAMETERS
load voic_fr
% INITIALIZE LOCALS
fctemp = zeros( no+1, 1 );
% COMPUTE INITIAL STATE, 1/A(z)
[ d2, e0 ] = polefilt( fc, no, d2, e0, l );
% RECOMPUTE ERROR
e0 = s - e0;
% COMPUTE INITIAL STATE, A(z)
[ d3, e0 ] = zerofilt( fc, no, d3, e0, l );
% COMPUTE INITIAL STATE, 1/A(z/gamma)
fctemp = bwexp( gamma2, fc, no );
[ d4, e0 ] = polefilt( fctemp, no, d4, e0, l );
Program P5.1: Target for the ACB.
The error is initialized to zero, and then it is passed as input to the LP synthesis filter 1/A(z).
This gives the Zero Input Response (ZIR), i.e., the initial condition, of the LP synthesis filter.
Then the ZIR is subtracted from the actual speech subframe and passed through the LP analysis
filter, which converts it back to the residual. Since we now work with prediction residuals, the inverse
weighting ﬁlter 1/A(z/γ ) is used to obtain the ﬁnal target vector for the ACB, i.e., ˜ e
(0)
. ProgramP5.1
performs this operation, but it needs supporting ﬁles and workspace [76].
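The whole chain (ZIR of 1/A(z), subtraction from s, A(z), then 1/A(z/γ)) rests on just two filter primitives. Hedged Python stand-ins for polefilt/zerofilt follow; the state handling is simplified and these are not the exact FS-1016 routines, but with zero memories the two filters are exact inverses of each other, which is what the target construction relies on:

```python
import numpy as np

def zero_filt(a, mem, x):
    """All-zero filter A(z) = 1 + sum_k a[k] z^-k; mem holds past inputs,
    most recent first."""
    y = np.empty(len(x))
    m = np.array(mem, float)
    for n, xn in enumerate(x):
        y[n] = xn + np.dot(a, m)
        m[1:] = m[:-1]
        m[0] = xn
    return y, m

def pole_filt(a, mem, x):
    """All-pole filter 1/A(z); mem holds past outputs, most recent first."""
    y = np.empty(len(x))
    m = np.array(mem, float)
    for n, xn in enumerate(x):
        yn = xn - np.dot(a, m)
        y[n] = yn
        m[1:] = m[:-1]
        m[0] = yn
    return y, m
```

The ZIR of 1/A(z) is then simply pole_filt(a, mem, zeros) with the stored filter memory and an all-zero input, mirroring the first step of Program P5.1.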
Example 5.1 A test speech segment (voiced) is taken and processed with Program P5.1. The segment
of speech and the target are shown in Figure 5.5. Note that this target vector will be used by the
ACB to predict the pitch period.
Figure 5.5: Speech segment and target for the ACB.
5.2.2 INTEGER DELAY SEARCH
Once the target signal is generated, the next step is to search for the integer pitch delay. In the
MATLAB implementation, the pitch is coded in two stages: the integer delays are searched first and
the neighboring non-integer delays next. In this section, we will look into the coding of the integer pitch
delays.
The MATLAB program that implements the integer pitch delay search (integerdelay.m) is available
under the directory P5_2 on the website [76]. Initially, the excitation vector (v0) is loaded with
a copy of the pitch memory (d1b), and the allowable search ranges are fixed in the variables minptr
and maxptr. If the subframe is odd, then a full search of the pitch delay table (pdelay) is done.
If the subframe is even, the pitch delay is searched within −31 to +32 of the pitch index in the
pitch delay table. The pitch delay table has both integer and fractional pitch values. In the integer
delay search, only the integer delays are searched, and the match score is set to zero if the pitch delay is
fractional.
The gain and match scores are computed as given in (5.7) and (5.8) in the MATLAB code
pgain.m, portions of which are provided in Program P5.3. The full MATLAB code for computing the
% RUN GAIN COMPUTATIONS
if first == 1
    % CALCULATE AND SAVE CONVOLUTION OF TRUNCATED (TO LEN)
    % IMPULSE RESPONSE FOR FIRST LAG OF T (=MMIN) SAMPLES:
    %
    %              min(i, len-1)
    %    y[i,t] =      SUM       h[j] * ex[i-j] ,   i = 0, ..., L-1 POINTS
    %                  j=0
    %
    %    h   0 1 ... len-1
    %    ex  L-1 . . . 1 0      = y[0]
    %    ex    L-1 . . . 1 0    = y[1]
    %    :                        :
    %    ex      L-1 . . . 1 0  = y[L-1]
    for i = 0:l-1
        jmax = min( i, len-1 );
        Ypg(i+1) = sum( h(1:jmax+1) .* ex(i+1:-1:i-jmax+1) );
    end
else
    % END CORRECT THE CONVOLUTION SUM ON SUBSEQUENT PITCH LAGS
    %    y[0,m] = ex[m] * h[0]
    %    y[i,m] = y[i-1,m-1] + ex[m] * h[i] ,   i = 1, ..., L POINTS
    %                                           m = t+1, ..., tmax LAGS
    Ypg(len-1:-1:1) = Ypg(len-1:-1:1) + ( ex(1) * h(len:-1:2) );
    Ypg(l:-1:2) = Ypg(l-1:-1:1);
    Ypg(1) = ex(1) * h(1);
end
% FOR LAGS (M) SHORTER THAN THE FRAME SIZE (L), REPLICATE THE SHORT
% ADAPTIVE CODEWORD TO THE FULL CODEWORD LENGTH BY OVERLAPPING
% AND ADDING THE CONVOLUTION
y2(1:l) = Ypg(1:l);
if m < l
    % ADD IN 2ND CONVOLUTION
    y2(m+1:l) = Ypg(m+1:l) + Ypg(1:l-m);
    if m < fix(l/2)
        % ADD IN 3RD CONVOLUTION
        imin = ( 2 * m ) + 1;
        imax = l;
        y2( imin:imax ) = y2( imin:imax ) + Ypg( 1:l-(2*m) );
    end
end
Program P5.3: Gain and match scores computation. (Continues.)
% CALCULATE CORRELATION AND ENERGY
cor = sum( y2(1:l) .* e0(1:l) );
eng = sum( y2(1:l) .* y2(1:l) );
% COMPUTE GAIN AND MATCH SCORES
if eng <= 0.0
eng = 1.0;
end
Pgain = cor / eng;
match = cor * Pgain;
Program P5.3: (Continued.) Gain and match scores computation.
gain and match scores is available on the website under the directory P5_3 [76]. In this program, Ypg
is the variable that stores the filtered codeword. The filtered codeword is obtained by convolving
the excitation (ex) with the truncated impulse response (h) of the inverse weighting filter. The
convolution is shown in the first if branch in Program P5.3. A full convolution is performed only
when the first delay is tested. Because the delays are integers, the excitation signal for the second delay
is simply the excitation signal for the first delay shifted by one sample. Exploiting this, we can perform
just end corrections from the second convolution onwards and reduce the number of computations.
This is illustrated in the else branch in Program P5.3. The newly arrived first sample of
the excitation is multiplied with the time-reversed impulse response and added to the previous
convolution result. The new result is restricted to the length of the excitation signal (60).
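The end correction rests on a simple identity: if the new codeword is the old one shifted down by one sample with one new leading sample s, its filtered version satisfies y′(0) = s·h(0) and y′(i) = y(i−1) + s·h(i). A Python sketch (array names are illustrative; with an impulse response truncated to fewer than L taps, only the first len(h) samples need the s·h term, as in pgain.m):

```python
import numpy as np

def filter_codeword(ex, h, L):
    """Truncated convolution y[i] = sum_j h[j] ex[i-j], i = 0..L-1."""
    return np.convolve(ex[:L], h)[:L]

def end_correct(y_prev, s, h, L):
    """Filtered version of the shifted codeword [s, ex[0], ..., ex[L-2]],
    given y_prev = filtered version of ex. Exact when len(h) >= L."""
    y = np.empty(L)
    y[1:] = y_prev[:L - 1] + s * h[1:L]
    y[0] = s * h[0]
    return y
```

One full convolution per subframe plus O(L) updates per additional lag replaces a full O(L·len) convolution per lag.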
If the pitch delay (M) is greater than the subframe length (60), then the codeword is constructed
by taking the previous excitation signal and delaying it by M samples. If M is less than
60, then the short vector obtained by delaying the previous excitation signal is replicated by periodic
extension. In Program P5.3, the effect of periodic extension is obtained by overlap-add of the
convolution result. Once the filtered codeword is obtained, the gain and match scores are computed
exactly as in (5.7) and (5.8). For even subframes, the same procedure is repeated
except that the search range is limited.
Example 5.2 The integer delay search is performed for the speech segment given in Example 5.1. The
subframe is odd; therefore, all the integer delays are searched. The gain and match scores with
respect to the indices of the pitch delay table are plotted in Figure 5.6. The index of the maximum
is 117 and the actual pitch period is 56. This is obtained by running the MATLAB program
integerdelay.m.
5.2.3 SUBMULTIPLE/FRACTIONAL DELAY SEARCH
After performing the integer delay search, the submultiples 2, 3, and 4 of the delay are searched according to
the submultiple delay table (submult), in order to ensure that the pitch contour is smooth. Note
Figure 5.6: Gain and match scores for ACB search in a voiced segment.
that only the submultiples specified in the submultiple delay table are searched. A submultiple pitch
delay is selected if its match is within 1 dB of MSPE of the pitch delay already chosen.
We will now describe fractional pitch estimation, which involves the interpolation of the
codevectors (corresponding to integer delays) chosen from the pitch memory (ACB). After the
interpolation is performed and the codevectors corresponding to the fractional delay are found, the
gain and match scores are determined as shown before (pgain.m). The interpolating function is
the product of a Hamming window (h_w) and a sinc function, and a total of 8 points are used for
interpolation. Computing D fractional samples between two integer samples is equivalent
to increasing the sampling frequency by a factor of D. Therefore, the signal is upsampled and lowpass filtered,
which increases the number of samples by a factor of D. Then, the oversampled signal is delayed
by an integer amount equivalent to the fractional delay of the original signal. Finally, the signal is
downsampled to the actual sampling rate. The entire process can be replicated using only windowed
sinc interpolation in the time domain [17]. The window and the windowed interpolation function
are given as,
h_w(k) = 0.54 + 0.46 cos(kπ/(6N)) ,  (5.9)
where N is the interpolation length (even) and k = −6N, −6N + 1, . . . , 6N, and
w_f(j) = h_w(12(j + f)) · sin((j + f)π) / ((j + f)π) ,  (5.10)
where j = −N/2, −N/2 + 1, . . . , N/2 − 1 and f = 1/4, 1/3, 1/2, 2/3, 3/4 (the fractional part).
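The windowed-sinc interpolator can be sketched directly in the time domain. The window below is a simplified Hamming taper over the 8 taps, an assumption made for brevity (the exact window indexing of (5.9) differs), so this illustrates the method rather than reproducing fracdelay.m:

```python
import numpy as np

def frac_delay_filter(f, n_taps=8):
    """Windowed-sinc taps w_f(j) = win(j+f) * sinc(j+f),
    j = -n_taps/2 .. n_taps/2 - 1, for a delay of f samples (0 < f < 1)."""
    x = np.arange(-n_taps // 2, n_taps // 2) + f
    win = 0.54 + 0.46 * np.cos(np.pi * x / (n_taps // 2))   # simplified window
    return win * np.sinc(x)                                 # sinc(x) = sin(pi x)/(pi x)

def frac_delay(sig, f, n_taps=8):
    """y(n) ~ x(n - f) = sum_j w_f(j) x(n + j); edges are zero-padded."""
    w = frac_delay_filter(f, n_taps)
    half = n_taps // 2
    padded = np.concatenate([np.zeros(half), np.asarray(sig, float), np.zeros(half)])
    return np.array([np.dot(w, padded[n:n + n_taps]) for n in range(len(sig))])
```

For f = 1/2 the taps are symmetric, as expected for a half-sample delay, and a slowly varying signal is delayed accurately away from the zero-padded edges.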
Program P5.4 (fracdelay.m) implements fractional delaying of an input signal and is
available under the directory P5_4 on the website [76]. After computing the windowed sinc function,
the input signal is multiplied with the windowed sinc and interpolated to find the signal delayed
by the fractional amount. The delayed excitation signal is used in the next step to find the gain and the
match score.
The integer and fractional parts of the delay are computed and the optimal excitation vector
with the highest match score is found. This excitation vector is used to update the pitch memory, so
that the codebook search for the next subframe uses the updated codebook.
Example 5.3 The windowed interpolating functions (8 points) for the fractional delays 1/4
and 3/4 are shown in Figure 5.7. A signal and its version delayed by 1/3 sample are shown in
Figure 5.8. These figures are generated by the program fracdelay.m.
5.3 STOCHASTIC CODEBOOK SEARCH
The SCB search is similar to the ACB search, except that the SCB is stored as overlapped codewords
and the target signal for the SCB search is different from that of the ACB search. The following sections
describe the generation of the target signal for the SCB search and the actual search procedure.
5.3.1 TARGET SIGNAL FOR STOCHASTIC CODEBOOK SEARCH
The SCB search is done on the residual signal after the ACB search identifies the periodic component
of the speech signal. If u is the codeword obtained from the ACB search, the target for the SCB
search is,
e^{(0)} = W(s − ŝ^{(0)}) − WHu .  (5.11)
The error target for the SCB is thus the error target for the ACB minus the filtered codeword
obtained from the ACB search. In the MATLAB implementation, instead of operating directly with
the error target, the target signal is converted into a form of excitation as done with the ACB search.
Therefore, the modified filtered codeword for comparison is the same as the one given in (5.5). In this
case, x^{(i)} is the i-th SCB codeword used for comparison. The modified target is given by,
ẽ^{(0)} = W^{−1} H^{−1} W^{−1} e^{(0)} ,  (5.12)
which is the same as,
ẽ^{(0)} = W^{−1} H^{−1} (s − ŝ^{(0)}) − W^{−1} u .  (5.13)
The gain and match scores are also the same as given in Equations (5.7) and (5.8). The pitch synthesis
routine that generates u from the pitch delay and gain parameters obtained from the ACB search is
given in Program P5.5 (pitchsynth.m), available under the directory P5_5 on the website [76].
Figure 5.7: Windowed sinc function for interpolation.
Figure 5.8: Input signal and its delayed version by 1/3 sample.
The variable buf is the memory that contains the past excitation. The array b contains the
pitch delay and the pitch predictor coefficients. The integer and fractional parts of the delay are
obtained from the pitch delay. If the integer part of the pitch delay (M_i) is greater than or equal to the
subframe size (60 samples), the current excitation in the pitch memory is updated with the M_i samples
of the past excitation stored in the pitch memory. If M_i is less than 60 samples, the pitch memory
is updated by periodically extending the past excitation to 60 samples. Once the integer delaying is performed,
the fractional delay is implemented by the program ldelay.m. The current ACB excitation is now
present in buf; it is scaled by the pitch gain to obtain the scaled ACB excitation vector u, which is
stored in rar. For M > 60, this procedure is equivalent to doing pitch synthesis using a long-term
predictor with delay M and the appropriate gain.
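The integer-delay part of this update is just a copy with periodic extension. A sketch (names illustrative, not from the FS-1016 sources):

```python
import numpy as np

def acb_excitation(pitch_mem, M, L=60):
    """Adaptive-codebook vector for integer delay M: the samples starting M
    back in the pitch memory, periodically extended to L samples when M < L."""
    seg = np.asarray(pitch_mem, float)[-M:]
    if M >= L:
        return seg[:L]
    reps = -(-L // M)                 # ceil(L / M)
    return np.tile(seg, reps)[:L]
```

For M < 60 the resulting vector is periodic with period M, which is exactly the overlap-add effect produced in Program P5.3 for short lags.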
% P5.6 - scbtarget.m
% s      - Speech or residual segment
% l      - Segment size
% d1     - Memory, 1/P(z)
% d2     - Memory, 1/A(z)
% d3     - Memory, A(z)
% d4     - Memory, 1/A(z/gamma)
% fctemp - Bandwidth expanded poles, 1/A(z/gamma)
% idb    - Dimension of d1a and d1b (209)
% no     - LPC predictor order (10)
% bb     - Pitch predictor coefficients (array of 4)
% e0     - Codebook search initial error and updated error
% fc     - LPC filter/weighting filter coefficients
% gamma2 - Weight factor, perceptual weighting filter
load scbvect;
% INITIALIZE LOCALS
fctemp = zeros( MAXNO+1, 1 );
% COMPUTE INITIAL STATE, 1/P(z)
[ e0, d1 ] = pitchvq( e0, l, d1, idb, bb, 'long' );
% COMPUTE INITIAL STATE, 1/A(z)
[ d2, e0 ] = polefilt( fc, no, d2, e0, l );
Program P5.6: Generating the SCB target vector.
After generating u, the target vector ẽ^{(0)} needs to be computed. This is performed in
Program P5.6 (scbtarget.m). The function pitchvq.m is very similar to Program P5.5 and generates
the vector u. Filtering u with 1/A(z) is the same as computing Hu + ŝ^{(0)}, where ŝ^{(0)} is the ZIR
of 1/A(z). This is subtracted from s to obtain the residual s − ŝ^{(0)} − Hu. This residual is passed
through the LP analysis filter A(z) and the inverse weighting filter 1/A(z/γ) to yield the vector
W^{−1}H^{−1}(s − ŝ^{(0)}) − W^{−1}u, which is the same as ẽ^{(0)}, the error target in (5.13).
Example 5.4 The scaled ACB excitation for the speech segment in Example 5.1 is
computed using Program P5.5. The workspace variable pitsyn contains the necessary pitch delay
and gain parameters needed by the program. The excitation is plotted in Figure 5.9.
Figure 5.9: The scaled ACB excitation.
Example 5.5 The SCB error target is generated using Program P5.6 and is shown in Figure 5.10.
Note that the target shown is ẽ^{(0)} and not e^{(0)}. Therefore, the displayed target has the form of
an excitation and needs to be filtered with 1/A(z) if we want to see the actual error target e^{(0)}. This
can easily be done using the function polefilt.m.
5.3.2 SEARCH FOR BEST CODEVECTOR
The SCB search is done with the modified error target signal. It is very similar to the ACB search,
except that the SCB is fixed and has some structure that can be exploited to perform
fast convolutions. Details on the structure of the SCB are given in Section 5.1.
Program P5.7 (cgain.m) computes the gain and match scores for a particular SCB excitation.
It is available in the directory P5_7 on the website [76]. For the first search alone, a full
convolution is performed to compute Ycg, the filtered codeword. Filtering is performed by convolving
the excitation (ex) with the truncated impulse response (h) of the inverse weighting filter. The
Figure 5.10: Example of a modiﬁed SCB target signal.
full convolution is performed in the first if structure of the program. Since the subsequent codewords
are shifted by −2, the else structure corrects the previous convolution result individually
for the two shifts. The gain and match scores are then computed using the filtered codeword. This
process is repeated for all possible codewords, and the one with the best match score is chosen. The
gain and match scores are recomputed with the chosen codeword to correct any errors accumulated
during the process. The selected codeword is scaled with the quantized gain in order to get the SCB
excitation.
After determining the SCB excitation, the filter states are updated using a procedure similar
to the one followed in Program P5.6. The difference is that the error ẽ^{(0)} is initialized not to a zero
vector but to the SCB excitation itself. This results in a complete update of the states of the filters
given in Figure 5.2. The final update of the filter states is performed in the synthesis part of the A-by-S
procedure.
Example 5.6 The gain and the match scores are computed for the SCB target given in Example 5.5.
The resulting scores are plotted against the indices of the SCB codewords in Figure 5.11.
The gain scores are quantized gain values, and they are constrained to the limits imposed by the
modified excitation gain scaling procedure [20]. In this case, the best codebook index is 135 and the
corresponding quantized gain score is −1330.
Figure 5.11: Gain and match scores for SCB search in a voiced segment.
5.4 BITSTREAMGENERATION
The bitstream generation process for the FS-1016 encoder consists of packing the quantized LSF values, the pitch delay, the pitch gain, the stochastic codeword and the stochastic gain indices into a bitstream array and adding bit error protection to the bitstream. The total of 144 bits per frame is allocated as shown in Table 5.1.
The total number of bits needed to encode the quantized LSF values per frame is 34. Though the LSFs of subframes are used for codebook searches, LSFs are transmitted only on a per-frame basis. This is because the subframe LSFs can easily be recovered from the LSFs of a frame using linear interpolation. The ACB pitch delay index is encoded with 8 bits for odd subframes and 6 bits for even subframes. Recall that the ACB pitch delay for odd subframes is absolute, and the pitch delay for even subframes is relative with respect to the previous odd subframe. Only the absolute pitch delays have 3 bits protected by the FEC, since coding the absolute pitch delays correctly is crucial for decoding the delta delays. The ACB gains are encoded using 5 bits per subframe, and therefore 20 bits are needed to encode the ACB gain values per frame. The ACB gain has its most perceptually sensitive bit protected. The SCB indices need 9 bits per subframe and the SCB gains need 5 bits per subframe; therefore, 56 bits are used per frame for the SCB.
One bit per frame is used for synchronization, 4 bits per frame for Forward Error Correction (FEC) and 1 bit is reserved for future expansion. The FEC consists of encoding selected bits using a Hamming code. The Hamming code used is a (15,11) code, which means that 11 bits are protected using 4 parity bits to give a total of 15 bits. This Hamming code can detect 2 errors and correct 1 error [77].
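The mechanics of the (15,11) Hamming code described above can be sketched in a few lines. The Python below is only an illustration of the scheme, not the FS-1016 reference code (which is MATLAB/C), and the function names are invented for this example; parity bits sit at positions 1, 2, 4 and 8, and a nonzero syndrome directly names the errored bit position.

```python
def hamming15_11_encode(data):
    """Encode 11 data bits into a (15,11) Hamming codeword (positions 1..15)."""
    code = [0] * 16  # index 0 unused; parity bits at positions 1, 2, 4, 8
    data_pos = [p for p in range(1, 16) if p not in (1, 2, 4, 8)]
    for p, bit in zip(data_pos, data):
        code[p] = bit
    for par in (1, 2, 4, 8):
        # even parity over all positions whose binary index contains `par`
        code[par] = sum(code[p] for p in range(1, 16) if p & par) % 2
    return code[1:]

def hamming15_11_decode(codeword):
    """Return (data bits, syndrome); a nonzero ("bad") syndrome locates the error."""
    c = [0] + list(codeword)
    syndrome = 0
    for par in (1, 2, 4, 8):
        if sum(c[p] for p in range(1, 16) if p & par) % 2:
            syndrome |= par
    if syndrome:
        c[syndrome] ^= 1  # correct the single-bit error at the syndrome position
    data = [c[p] for p in range(1, 16) if p not in (1, 2, 4, 8)]
    return data, syndrome
```

A single flipped bit is corrected and reported through the syndrome, which is exactly the quantity the decoder averages to estimate the bit error rate in Section 6.1.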
Table 5.1: Bit allocation for a frame.

                          Subframe
Parameter              1    2    3    4    Frame
ACB index              8    6    8    6      28
ACB gain               5    5    5    5      20
SCB index              9    9    9    9      36
SCB gain               5    5    5    5      20
LSF 1                 (subframes not          3
LSF 2                  applicable here)       4
LSF 3                                         4
LSF 4                                         4
LSF 5                                         4
LSF 6                                         3
LSF 7                                         3
LSF 8                                         3
LSF 9                                         3
LSF 10                                        3
Future Expansion                              1
Hamming Parity                                4
Synchronization                               1
Total                                       144
Three bits of the pitch delay index for the odd subframes and 1 bit of the pitch gain for all the subframes are protected using the Hamming code. The 11th bit protected by the Hamming code is the future expansion bit.
5.5 SUMMARY
In this chapter, we discussed the structure of the ACB and the SCB and the search methods used to find an optimal codevector. Further improvements in the codebook structure were made with the use of algebraic codes for the SCB, and the modified CELP algorithm came to be known as ACELP. The reduction in complexity was obtained using backward filtering and algebraic codes such as the Reed-Muller code and the Nordstrom-Robinson code [33]. Most of the modern CELP algorithms, such as the G.729 and the SMV, are categorized as Conjugate Structure ACELP (CS-ACELP) [4] algorithms. They use Conjugate Structure VQ (CS-VQ) to perform joint quantization of the adaptive and the stochastic excitation gains [49]. In AMR-WB, multiple bit rates are achieved by using algebraic codebooks of different sizes and encoding the LP parameters at different rates. Furthermore, the pitch and algebraic codebook gains are jointly quantized in AMR-WB [5].
CHAPTER 6
The FS-1016 Decoder
The parameters of the encoded bitstream are decoded in the FS-1016 CELP receiver. The decoded parameters include the ACB and the SCB gains and indices for each subframe. Also included in the decoded parameters are the LSFs of a frame, which are then interpolated to obtain subframe LSFs. The SCB gain, the ACB gain and the pitch delays are smoothed in case transmission errors are detected. A composite excitation with both periodic and stochastic components is generated for the LP synthesis filter. The raw synthesized speech is scaled and clamped to the 16-bit integer range, and the distance measures between the input and synthesized speech are computed. The synthesized speech is postfiltered to enhance the formant structure of speech, and it is also filtered using a highpass filter. Therefore, there are three speech outputs from the CELP receiver: raw synthesized (non-postfiltered), postfiltered and highpass-filtered speech.
The block diagram of the CELP receiver is illustrated in Figure 6.1, and the key functions will be described in the following sections. Demonstrations of the functions will be provided using MATLAB programs and plots.
Figure 6.1: The CELP receiver. (Blocks: extract and decode CELP synthesis parameters, interpolate LSFs for subframes, generate composite excitation, LP synthesis, postfilter and highpass-filter output speech, compute distance measures; functions: celpsyn.m, intsynth.m, pitchvq.m, polefilt.m, postfilt.m, disto.m.)
6.1 EXTRACTING PARAMETERS FOR SYNTHESIS
The CELP synthesis parameters of a particular frame are extracted and decoded from the bitstream as the first step in the CELP synthesis process. This step also involves decoding the LSFs using the quantization table and interpolating them to create subframe LSFs. Smoothing of the ACB and SCB gains and the pitch delays is also performed as a part of this process.
The codewords are extracted from the received bitstream, and the Hamming error-protected bitstream is decoded. The bit error rate is computed as a running average of bad syndromes. In a Hamming code, a nonzero syndrome indicates the position of an error and hence is called a bad syndrome, whereas a zero syndrome indicates no error [77]. The LSFs of the particular frame are extracted, decoded and interpolated to create subframe LSFs. The LSFs are decoded by indexing the quantization table using the transmitted indices to create quantized LSFs.
The quantized values of the LSFs are then used to create subframe LSFs using linear interpolation, as shown in Program P6.1. The quantized LSFs are tested for monotonicity again in the synthesis stage. If they are nonmonotonic, corrections similar to those in the analysis stage are applied to make the LSFs monotonic. Subframe LSFs are computed by interpolating the LSFs of the current and the previous frames.
% P6.1 - synthLSP.m
% nn     - Number of subframes per frame
% lspnew - New frequency array (LSPs)
% no     - Number of LSPs
% lsp    - Interpolated frequency matrix
load synLSP_data;
% CHECK LSPs; TRY TO FIX NONMONOTONIC LSPs BY REPEATING PAIR
if any( lspnew(2:no) <= lspnew(1:no-1) )
    for i = 2:no
        if lspnew(i) <= lspnew(i-1)
            lspnew(i) = lspoldIS(i);
            lspnew(i-1) = lspoldIS(i-1);
        end
    end
end
% RECHECK FIXED LSPs
if any( lspnew(2:no) <= lspnew(1:no-1) )
    % REPEAT ENTIRE LSP VECTOR IF NONMONOTONICITY HAS PERSISTED
    lspnew = lspoldIS;
end
% INTERPOLATE LSPs AND TEST FOR MONOTONICITY
for i = 1:nn
    lsp( i, 1:no ) = ( ( wIS(1,i) * lspoldIS( 1:no ) ) + ...
                       ( wIS(2,i) * lspnew( 1:no ) ) )';
    % CHECK FOR MONOTONICALLY INCREASING LSPs
    if any( lsp(i,2:no) <= lsp(i,1:no-1) )
        fprintf( 'intsynth: Nonmonotonic LSPs @ the current frame \n');
    end
end
Program P6.1: Interpolating LSFs for subframes.
The weights used for the interpolation are given in Table 6.1. Denoting the LSF of the previous frame as f_o and the LSF of the current frame as f_n, the LSFs of the i-th subframe are computed as follows,

    f_{i,s} = w_{i,o} f_o + w_{i,n} f_n,    (6.1)

where w_{i,o} is the weight corresponding to the i-th subframe LSF that multiplies the previous frame LSF, and w_{i,n} is the weight corresponding to the i-th subframe LSF that multiplies the current frame LSF.
Table 6.1: Interpolation weights for subframe LSFs.

                  Subframe 1   Subframe 2   Subframe 3   Subframe 4
Previous Frame       0.8750       0.6250       0.3750       0.1250
Current Frame        0.1250       0.3750       0.6250       0.8750
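The interpolation of (6.1) with the weights of Table 6.1 is only a weighted sum per subframe. As a minimal sketch, the Python below re-implements the core step of Program P6.1 (the book's code is MATLAB) without the monotonicity checks; the function name is invented for this illustration.

```python
# Interpolation weights from Table 6.1 (previous frame / current frame)
W_OLD = [0.8750, 0.6250, 0.3750, 0.1250]
W_NEW = [0.1250, 0.3750, 0.6250, 0.8750]

def interpolate_lsfs(lsf_old, lsf_new):
    """Return the four subframe LSF vectors for one frame, per Eq. (6.1)."""
    return [[wo * fo + wn * fn for fo, fn in zip(lsf_old, lsf_new)]
            for wo, wn in zip(W_OLD, W_NEW)]
```

For example, with the first LSF of Table 6.2 (old 0.0213, new 0.0350), subframe 1 evaluates to about 0.0230 and subframe 4 to about 0.0333, matching the table.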
The ACB and the SCB parameters are decoded from the bitstream. Smoothing is performed on the SCB gain, the ACB gain and the pitch delay if bit protection is enabled. Smoothing is essentially the process of reducing the variations in the received parameters across subframes. Here we illustrate the case of gain smoothing; the smoothing of the pitch delay is done in a similar manner. Gain smoothing is performed only when errors are detected, because when the received gain is corrupted, a smoothed gain derived from the past and future subframes is a better estimate. If two errors are detected, or if the bit error rate exceeds the permissible level, gain smoothing is initiated. This is done by creating a vector of past and future gain values. The mean and variance of the gain values in the vector are estimated. If the variance is within the prescribed limit and the gain value of the current subframe falls outside the permissible range, the gain of the current subframe is set to the average gain value, retaining the sign of the received gain. For the fourth subframe in any frame, gain smoothing is disabled.
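The smoothing rule above can be sketched as follows. This Python fragment is illustrative only: the function name, the variance limit and the permissible gain range are hypothetical placeholders, not the FS-1016 constants.

```python
import math
from statistics import mean, pvariance

def smooth_gain(gains, idx, var_limit, gain_range):
    """Replace gains[idx] by the signed average of its past/future neighbors
    when it is an outlier. var_limit and gain_range are hypothetical limits."""
    neighbors = [abs(g) for g in gains[:idx] + gains[idx + 1:]]
    avg = mean(neighbors)
    var = pvariance(neighbors)
    g = gains[idx]
    lo, hi = gain_range
    # smooth only if the neighbors agree (low variance) and g is out of range
    if var <= var_limit and not (lo <= abs(g) <= hi):
        return math.copysign(avg, g)  # keep the sign of the received gain
    return g
```

An implausible gain surrounded by consistent neighbors is pulled back to their average, while an in-range gain passes through untouched.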
Example 6.1 This example illustrates the generation of subframe LSFs from the LSFs of a frame. The subframe LSFs are obtained from the LSFs of the previous frame and the current frame using Program P6.1. Table 6.2 shows the values of the LSFs for the current frame, the previous frame and the four subframes in the subframe analysis buffer. The pole-zero plots of the LP synthesis filters corresponding to the LSFs of the previous frame, the LSFs of the current frame, and the interpolated LSFs of the subframes are given in Figure 6.2. The effect of interpolating the LSFs of the previous and the current frames is clearly visible in the figure: the pole-zero plot corresponding to the LSFs of the first subframe is closest to that of the previous frame, whereas the pole-zero plot corresponding to the LSFs of the fourth subframe is closest to that of the current frame.
Figure 6.2: Pole-zero plots from LSFs (a) for the previous frame, (b) for the current frame, (c) for subframe 1, (d) for subframe 2, (e) for subframe 3 and (f) for subframe 4.
6.2 LP SYNTHESIS
The SCB excitation is generated by scaling the selected SCB codeword by the SCB gain value. The composite excitation is generated by passing the SCB excitation through the LTP, and Program P5.5 is used to perform this. This excitation is used in the LP synthesis filter to synthesize the speech signal. The LP synthesis filter is an all-pole filter, and its denominator polynomial is computed by converting the subframe LSFs to the coefficients of the LP polynomial A(z) using (3.3). Program P4.1 is used to convert the subframe LSFs to the corresponding LP coefficients. The LP coefficients and the composite excitation are used in the all-pole synthesis filter to synthesize the speech signal.
The synthesized speech is scaled and clamped to the 16-bit range. Program P6.2 generates the raw synthesized (non-postfiltered) speech from the LSFs of a subframe and the excitation. It internally uses Programs P5.5 and P4.1 to generate the composite excitation and convert the LSFs to direct-form LP coefficients.
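The all-pole synthesis step at the heart of this section can be sketched compactly. The Python function below is an illustrative stand-in for the decoder's polefilt.m routine (the actual code is MATLAB): it implements the recursion y[n] = x[n] − Σ a_k y[n−k] with an explicit filter state, which is the state carried between subframes in the decoder.

```python
def polefilt(a, excitation, state=None):
    """All-pole synthesis filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    a = [1, a_1, ..., a_p]; state holds the previous p outputs, newest first.
    Returns (output samples, updated state)."""
    p = len(a) - 1
    mem = list(state) if state is not None else [0.0] * p
    out = []
    for x in excitation:
        # y[n] = x[n] - sum_{k=1..p} a_k * y[n-k]
        y = x - sum(a[k] * mem[k - 1] for k in range(1, p + 1))
        out.append(y)
        mem = [y] + mem[:-1]  # shift the state
    return out, mem
```

Driving a one-pole example A(z) = 1 − 0.5 z^{-1} with an impulse produces the expected decaying impulse response 1, 0.5, 0.25, ...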
Table 6.2: Interpolating LSFs of a frame to create subframe LSFs.

Frame LSF           Subframe LSF
Old      New        1        2        3        4
0.0213 0.0350 0.0230 0.0264 0.0298 0.0333
0.0600 0.0700 0.0612 0.0638 0.0663 0.0688
0.1187 0.0969 0.1160 0.1105 0.1051 0.0996
0.1350 0.1213 0.1333 0.1298 0.1264 0.1230
0.1988 0.1412 0.1916 0.1772 0.1628 0.1484
0.2500 0.2288 0.2473 0.2420 0.2367 0.2314
0.3100 0.3100 0.3100 0.3100 0.3100 0.3100
0.3312 0.3156 0.3293 0.3254 0.3215 0.3176
0.4288 0.4288 0.4288 0.4288 0.4288 0.4288
0.4363 0.4363 0.4363 0.4363 0.4363 0.4363
Example 6.2 The composite excitation is generated and the LSFs are converted to LP coefficients for subframe 3 of Example 6.1. The excitation, given in Figure 6.3, and the LP coefficients are used to synthesize the speech shown in Figure 6.4.
Figure 6.3: Composite excitation with both periodic and stochastic components.
% P6.2 - synth_speech.m
% subframe - Subframe number in the given frame
% scaling  - Scaling factor for output speech
% stoch_ex - Stochastic excitation
% comp_ex  - Composite excitation with adaptive and stochastic
%            components
% lsp      - Matrix with LSPs of 4 subframes, each being a single row
% pc       - LP coefficients for a particular subframe
% npf      - Non-postfiltered speech subframe
load excit_param;
subframe = 3;
scaling = 1;
maxv = ones(1,60)*32767;
minv = ones(1,60)*(-32768);
% GENERATE PITCH EXCITATION VECTOR AND COMBINE WITH STOCHASTIC
% EXCITATION TO PRODUCE COMPOSITE LPC EXCITATION VECTOR
[comp_ex, dps] = pitchsynth( stoch_ex, l, dps, idb, bb, 'long' );
% CONVERT THE LSP OF A SUBFRAME TO CORRESPONDING PC
pc = lsptopc( lsp(subframe,:), no );
% SYNTHESIS OF NON-POSTFILTERED SPEECH
[ dss, npf ] = polefilt( pc, no, dss, comp_ex, l );
% SCALE AND CLAMP SYNTHESIZED SPEECH TO 16-BIT INTEGER RANGE
npf = npf * descale;
npf = min( [ npf'; maxv ] )';
npf = round( max( [ npf'; minv ] )' );
Program P6.2: Synthesizing speech from excitation and LSF.
6.3 POSTFILTERING OUTPUT SPEECH
Postfiltering of the output speech is generally performed in order to improve its perceptual quality [47]. Noise levels are kept as low as possible in the formant regions during encoding, as it is quite difficult to force the noise below the masking threshold at all frequencies. The major purpose of postfiltering is to attenuate the noise components in the spectral valleys, which were not attenuated during encoding.
Postfilters can generally be designed by moving the poles of the all-pole LP synthesis filter, resulting in filters of the form 1/A(z/α). But using this filter to achieve sufficient noise reduction results in severe muffling of speech [47], because the LP synthesis filter has a lowpass spectral tilt for voiced speech. Therefore, the spectral tilt of the postfilter must be compensated to
Figure 6.4: Synthesized (non-postfiltered) speech using the composite excitation and LP coefficients.
0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 1
10
5
0
5
10
15
Norma ized Frequency (x
π
rad/samp e)
M
a
g
n
i
t
u
d
e
(
d
B
)
1/A(z/0 5)
1/A(z/0 8)
A(z/0 5)/A(z/0 8)
Figure 6.5: Frequency response of the polezero postﬁlter.
avoid muffling. Figure 6.5 illustrates the frequency response magnitude of 1/A(z/α) for α = 0.5 and α = 0.8. A(z) is computed for a voiced frame, and it is instructive to note that α = 0.5 yields the spectral tilt alone, while α = 0.8 yields a filter with both the spectral tilt and the formant structure. Therefore, the spectral tilt can be partially removed using a filter of the form A(z/0.5)/A(z/0.8), the pole-zero postfilter, which is also shown in Figure 6.5.
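Forming A(z/α) amounts to scaling the k-th LP coefficient by α^k, which pulls the poles toward the origin and broadens the formant peaks. The Python below is a minimal illustrative analogue of the decoder's bandwidth-expansion step (the MATLAB routine is bwexp in Program P6.3); `postfilter_coeffs` is an invented helper name.

```python
def bwexp(alpha, a):
    """Bandwidth expansion: map A(z) -> A(z/alpha) by scaling a_k by alpha**k.
    a = [1, a_1, ..., a_p] are direct-form LP coefficients."""
    return [coeff * alpha**k for k, coeff in enumerate(a)]

def postfilter_coeffs(a, alpha=0.8, beta=0.5):
    """Numerator and denominator of the pole-zero postfilter A(z/beta)/A(z/alpha)."""
    return bwexp(beta, a), bwexp(alpha, a)
```

With a one-pole A(z) = 1 − 0.9 z^{-1}, the numerator pole moves to radius 0.45 and the denominator pole to radius 0.72, so the combined filter keeps a softened version of the formant peak while shedding most of the tilt.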
Further compensation for the low-frequency tilt and muffling can be provided by estimating the spectral tilt (first-order fit) of the postfilter denominator polynomial. This is done by computing the reflection coefficients of the denominator polynomial. The first reflection coefficient gives the LP coefficient a_1(1) of the first-order polynomial that fits the spectrum. An all-zero filter of the form 1 + μ a_1(1) z^{-1} with μ = 0.5 is used for the adaptive compensation of the spectral tilt. The response of this filter and the overall response of the pole-zero postfilter with adaptive spectral tilt compensation are given in Figure 6.6.
Figure 6.6: Pole-zero postfilter with and without adaptive spectral tilt compensation.
The power of the synthesized speech before postfiltering (input power) and after postfiltering (output power) is computed by filtering the squared signal using an exponential filter. Exponential filters are generally used in time series analysis for smoothing signals and are described by the transfer function

    H_s(z) = τ / (1 − (1 − τ) z^{-1}),    (6.2)
Figure 6.7: Impulse response of the exponential smoothing ﬁlter.
Figure 6.8: Magnitude frequency response of the exponential smoothing ﬁlter.
where τ is the parameter that controls the degree of smoothing; the smaller the parameter τ, the greater the degree of smoothing. The smoothed average estimates of the input and the output power of a subframe are used for Automatic Gain Control (AGC). The impulse and frequency responses of the exponential filter with τ = 0.01 are shown in Figures 6.7 and 6.8, respectively. Note that since the filter is IIR, only a truncated portion of the impulse response is shown. The magnitude response of this one-pole filter decays monotonically with frequency, giving the lowpass characteristic needed for smoothing.
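In the time domain, the smoother of (6.2) is the familiar one-pole recursion y[n] = τ x[n] + (1 − τ) y[n−1]. A small Python sketch (illustrative, not the decoder's MATLAB code):

```python
def exp_smooth(x, tau, state=0.0):
    """One-pole exponential smoother H_s(z) = tau / (1 - (1 - tau) z^-1).
    Returns the smoothed sequence and the final filter state."""
    out = []
    y = state
    for v in x:
        y = tau * v + (1.0 - tau) * y  # y[n] = tau*x[n] + (1-tau)*y[n-1]
        out.append(y)
    return out, y
```

Applied to the squared speech samples, this tracks the short-term power used by the AGC; the DC gain is 1, so a constant input converges to itself.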
The purpose of the AGC is to scale the output speech such that its power is roughly the same as that of the unfiltered (noisy) synthesized output speech. The scaling factor is computed as

    S = sqrt( P_unf / P_out ),    (6.3)

where P_unf and P_out are the unfiltered speech power and the output speech power, respectively. The output speech is then scaled using this scaling factor. The MATLAB code for adaptive postfiltering and AGC is given in Program P6.3.
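Equation (6.3), together with the unity-gain guard that Program P6.3 applies when the output power estimate is non-positive, can be condensed into a few lines. This Python fragment is an illustrative per-value sketch (the decoder applies the factor sample by sample), and both function names are invented here.

```python
import math

def agc_scale(p_unf, p_out):
    """AGC scaling factor S = sqrt(P_unf / P_out), per Eq. (6.3); forced to
    unity when the output power estimate is non-positive (the agcUnity case)."""
    if p_out <= 0.0:
        return 1.0
    return math.sqrt(p_unf / p_out)

def apply_agc(samples, p_unf, p_out):
    """Scale postfiltered samples so their power matches the unfiltered speech."""
    s = agc_scale(p_unf, p_out)
    return [v * s for v in samples]
```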
The two different values of α used in the pole-zero postfilter are stored in the variables alpha and beta in Program P6.3. The filter memories of the pole and zero filters are also taken into consideration during postfiltering. All-zero filtering is performed to provide further compensation for the spectral tilt. The negative of the first reflection coefficient is taken as a_1(1) because of the sign convention used in the program. The input and output power estimates are computed using exponential filters, again taking the memory of the filters into account.
Example 6.3 Postfiltering is illustrated for subframe 3 of the speech segment given in Example 6.1. Figure 6.9 shows the postfiltered and scaled output speech subframe, and Figure 6.10 shows the scaling factors from the AGC module for the same subframe. For this particular subframe, the scaling factors are less than 1, so the postfiltered speech is scaled down.
6.4 COMPUTING DISTANCE MEASURES
The distance measures are computed between the actual input speech in the subframe buffer and the synthesized (non-postfiltered) speech segment. The procedure for computing the distance measures is exactly the same as the one described in Chapter 4, except that the autocorrelation lags of the input and synthesized speech segments need to be computed before calculating the distance measures.
6.5 SUMMARY
The FS-1016 CELP decoder synthesizes output speech from the transmitted parameters. The LSFs of the frame are decoded and interpolated to form subframe LSFs, and the ACB and SCB parameters are extracted from the bitstream. The extracted parameters are then used to generate a
% P6.3 - adaptive_postfilt.m
% s                 - Non-postfiltered and postfiltered speech
% alpha, beta       - Filter parameters (0.8, 0.5)
% powerin, powerout - Input and output power estimates
% dp1, dp2, dp3     - Filter memories
% fci               - LPC predictor poles
% no                - Predictor order
% agcScale          - Automatic gain control scaling vector
% TC                - Input/output power estimate time constant
load postfilt_dat;
% ALLOCATE LOCAL FILTER MEMORY (SPECTRAL TILT COMPENSATOR)
ast = zeros( 2, 1 );
% ESTIMATE INPUT POWER
[ newpowerin, ipZ ] = filter( [TC 0], [1 -1+TC], ( s .* s ), ipZ );
powerin = newpowerin(l);
% BANDWIDTH EXPAND PREDICTOR POLES TO FORM PERCEPTUAL POSTFILTER
pcexp1 = bwexp( beta, fci, no );
pcexp2 = bwexp( alpha, fci, no );
% APPLY POLE-ZERO POSTFILTER
[ dp1, s ] = zerofilt( pcexp1, no, dp1, s, l );
[ dp2, s ] = polefilt( pcexp2, no, dp2, s, l );
% ESTIMATE SPECTRAL TILT (1ST-ORDER FIT) OF POSTFILTER
% DENOMINATOR (POLES DOMINATE)
rcexp2 = pctorc( pcexp2, no );
% ADD TILT COMPENSATION BY A SCALED ZERO (DON'T ALLOW HF ROLLOFF)
ast(1) = 1.0;
if rcexp2(1) > 0.0
    ast(2) = -0.5 * rcexp2(1);
else
    ast(2) = 0.0;
end
[ dp3, s ] = zerofilt( ast, 1, dp3, s, l );
% ESTIMATE OUTPUT POWER
[ newpowerout, opZ ] = filter( [TC 0], [1 -1+TC], ( s .* s ), opZ );
powerout = newpowerout(l);
Program P6.3: Postfiltering output speech. (Continues.)
% INCORPORATE SAMPLE-BY-SAMPLE AUTOMATIC GAIN CONTROL
agcUnity = find( newpowerout <= 0 );
newpowerout( agcUnity ) = ones( length( agcUnity ), 1 );
agcScale = sqrt( newpowerin ./ newpowerout );
agcScale( agcUnity ) = ones( length( agcUnity ), 1 );
s = s .* agcScale;
Program P6.3: (Continued.) Postfiltering output speech.
Figure 6.9: Postﬁltered and scaled output speech segment.
composite excitation, which is used along with the LP parameters to synthesize speech on a subframe basis. Since CELP uses an A-by-S procedure, the CELP encoder contains a replica of the CELP decoder except for the postfilter. The FS-1016 speech coder provides a good compromise between speech quality and bit rate, and the subjective quality of speech is maintained in the presence of moderate background noise [20]. Modern speech coders have incorporated variable bit rate coding schemes to optimize speech quality under different channel noise conditions [5, 42]. The Selectable Mode Vocoder (SMV) classifies input speech into different categories and selects the best rate in order to ensure speech quality at a given operating mode [78].
Figure 6.10: Scaling factor for postﬁltered speech provided by AGC.
Bibliography
[1] M.R. Schroeder and B. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," Proc. ICASSP-85, p. 937, Apr. 1985. 7, 9, 63
[2] S. Singhal and B. Atal, "Improving the Performance of Multi-Pulse Coders at Low Bit Rates," Proc. ICASSP-84, p. 1.3.1, 1984. 7, 63
[3] B.S. Atal and J. Remde, "A New Model for LPC Excitation for Producing Natural Sounding Speech at Low Bit Rates," Proc. IEEE ICASSP-82, pp. 614–617, Apr. 1982. 7
[4] R. Salami et al., "Design and Description of CS-ACELP: A Toll Quality 8 kb/s Speech Coder," IEEE Trans. on Speech and Audio Proc., vol. 6, no. 2, pp. 116–130, Mar. 1998. DOI: 10.1109/89.661471 7, 10, 78
[5] B. Bessette et al., "The Adaptive Multi-Rate Wideband Speech Codec (AMR-WB)," IEEE Trans. on Speech and Audio Proc., vol. 10, no. 8, pp. 620–636, Nov. 2002. DOI: 10.1109/TSA.2002.804299 7, 11, 49, 78, 90
[6] M.R. Schroeder, B.S. Atal, and J.L. Hall, "Optimising Digital Speech Coders by Exploiting Masking Properties of the Human Ear," J. Acoust. Soc. Am., vol. 66, 1979. DOI: 10.1121/1.383662 7
[7] P. Kroon, E. Deprettere, and R.J. Sluyter, "Regular-Pulse Excitation - A Novel Approach to Effective and Efficient Multipulse Coding of Speech," IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. 34, no. 5, pp. 1054–1063, Oct. 1986. DOI: 10.1109/TASSP.1986.1164946 7
[8] I. Boyd and C. Southcott, "A Speech Codec for the Skyphone Service," Br. Telecom Technical J., vol. 6(2), pp. 51–55, Apr. 1988. DOI: 10.1007/s1055000700700 7
[9] GSM 06.10, "GSM Full-Rate Transcoding," Technical Report Version 3.2, ETSI/GSM, Jul. 1989. 7
[10] B.S. Atal, "Predictive Coding of Speech at Low Bit Rates," IEEE Trans. on Comm., vol. 30, no. 4, p. 600, Apr. 1982. 7
[11] B. Atal and M.R. Schroeder, "Stochastic Coding of Speech Signals at Very Low Bit Rates," Proc. Int. Conf. Comm., pp. 1610–1613, May 1984. 7
[12] G. Davidson and A. Gersho, "Complexity Reduction Methods for Vector Excitation Coding," Proc. IEEE ICASSP-86, p. 3055, 1986. 9
[13] W.B. Kleijn et al., "Fast Methods for the CELP Speech Coding Algorithm," IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. 38, no. 8, p. 1330, Aug. 1990. DOI: 10.1109/29.57568 9
[14] I. Trancoso and B. Atal, "Efficient Search Procedures for Selecting the Optimum Innovation in Stochastic Coders," IEEE Trans. ASSP-38(3), p. 385, Mar. 1990. DOI: 10.1109/29.106858 9
[15] I. Gerson and M.A. Jasiuk, "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kb/s," Proc. IEEE ICASSP-90, vol. 1, pp. 461–464, Apr. 1990. DOI: 10.1109/ICASSP.1990.115749 9, 10
[16] I. Gerson and M. Jasiuk, "Techniques for Improving the Performance of CELP-type Speech Coders," Proc. IEEE ICASSP-91, pp. 205–208, May 1991. DOI: 10.1109/49.138990 9
[17] P. Kroon and B. Atal, "Pitch Predictors with High Temporal Resolution," Proc. IEEE ICASSP-90, pp. 661–664, Apr. 1990. DOI: 10.1109/ICASSP.1990.115832 9, 71
[18] W.B. Kleijn, "Source-Dependent Channel Coding and its Application to CELP," Advances in Speech Coding, Eds. B. Atal, V. Cuperman, and A. Gersho, pp. 257–266, Kluwer Ac. Publ., 1990. 9, 12, 65
[19] A. Spanias, M. Deisher, P. Loizou and G. Lim, "Fixed-Point Implementation of the VSELP Algorithm," ASU TRC Technical Report, TRCSPASP9201, Jul. 1992. 9
[20] J.P. Campbell Jr., T.E. Tremain and V.C. Welch, "The Federal Standard 1016 4800 bps CELP Voice Coder," Digital Signal Processing, Academic Press, vol. 1, no. 3, pp. 145–155, 1991. 10, 64, 65, 66, 76, 90
[21] Federal Standard 1016, Telecommunications: Analog to Digital Conversion of Radio Voice by 4800 bit/second Code Excited Linear Prediction (CELP), National Communication System - Office of Technology and Standards, Feb. 1991. 10, 12
[22] TIA/EIA-PN 2398 (IS-54), "The 8 kbit/s VSELP Algorithm," 1989. 10
[23] GSM 06.60, "GSM Digital Cellular Communication Standards: Enhanced Full-Rate Transcoding," ETSI/GSM, 1996. 10
[24] GSM 06.20, "GSM Digital Cellular Communication Standards: Half Rate Speech; Half Rate Speech Transcoding," ETSI/GSM, 1996. 10
[25] ITU Draft Recommendation G.728, "Coding of Speech at 16 kbit/s using Low-Delay Code Excited Linear Prediction (LD-CELP)," 1992. 10
[26] J. Chen, R. Cox, Y. Lin, N. Jayant, and M. Melchner, "A Low-Delay CELP Coder for the CCITT 16 kb/s Speech Coding Standard," IEEE Trans. on Sel. Areas in Comm., vol. 10, no. 5, pp. 830–849, Jun. 1992. DOI: 10.1109/49.138988 10
[27] TIA/EIA/IS-96, "QCELP," Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, TIA, 1992. 10
[28] ITU Recommendation G.723.1, "Dual Rate Speech Coder for Multimedia Communications transmitting at 5.3 and 6.3 kb/s," Draft 1995. 10
[29] TIA/EIA/IS-641, "Cellular/PCS Radio Interface - Enhanced Full-Rate Speech Codec," TIA, 1996. 10
[30] TIA/EIA/IS-127, "Enhanced Variable Rate Codec," Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, TIA, 1997. 10
[31] W.B. Kleijn et al., "Generalized Analysis-by-Synthesis Coding and its Application to Pitch Prediction," Proc. IEEE ICASSP-92, vol. 1, pp. 337–340, Mar. 1992. DOI: 10.1109/ICASSP.1992.225903 10
[32] ITU Study Group 15 Draft Recommendation G.729, "Coding of Speech at 8 kb/s using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP)," 1995. 10
[33] J.P. Adoul et al., "Fast CELP Coding Based on Algebraic Codes," Proc. IEEE ICASSP-87, vol. 12, pp. 1957–1960, Apr. 1987. 10, 78
[34] J.I. Lee et al., "On Reducing Computational Complexity of Codebook Search in CELP Coding," IEEE Trans. on Comm., vol. 38, no. 11, pp. 1935–1937, Nov. 1990. DOI: 10.1109/26.61473 10
[35] C. Laflamme et al., "On Reducing Computational Complexity of Codebook Search in CELP Coder through the Use of Algebraic Codes," Proc. IEEE ICASSP-90, vol. 1, pp. 177–180, Apr. 1990. DOI: 10.1109/ICASSP.1990.115567 10
[36] D.N. Knisely, S. Kumar, S. Laha, and S. Navda, "Evolution of Wireless Data Services: IS-95 to CDMA2000," IEEE Comm. Mag., vol. 36, pp. 140–149, Oct. 1998. DOI: 10.1109/35.722150 11
[37] ETSI AMR Qualification Phase Documentation, 1998. 11
[38] R. Ekudden, R. Hagen, I. Johansson, and J. Svedburg, "The Adaptive Multi-Rate Speech Coder," Proc. IEEE Workshop on Speech Coding, pp. 117–119, Jun. 1999. 11
[39] Y. Gao et al., "The SMV Algorithm Selected by TIA and 3GPP2 for CDMA Applications," Proc. IEEE ICASSP-01, vol. 2, pp. 709–712, May 2001. DOI: 10.1109/ICASSP.2001.941013 11
[40] Y. Gao et al., "EX-CELP: A Speech Coding Paradigm," Proc. IEEE ICASSP-01, vol. 2, pp. 689–692, May 2001. DOI: 10.1109/ICASSP.2001.941008 11
[41] TIA/EIA/IS-893, "Selectable Mode Vocoder," Service Option for Wideband Spread Spectrum Communications Systems, ver. 2.0, Dec. 2001. 11
[42] M. Jelinek and R. Salami, "Wideband Speech Coding Advances in VMR-WB Standard," IEEE Trans. on Audio, Speech and Language Processing, vol. 15, iss. 4, pp. 1167–1179, May 2007. DOI: 10.1109/TASL.2007.894514 11, 49, 90
[43] R. Salami, R. Lefebvre, A. Lakaniemi, K. Kontola, S. Bruhn and A. Taleb, "Extended AMR-WB for High-Quality Audio on Mobile Devices," IEEE Communications Magazine, vol. 44, no. 5, May 2006. DOI: 10.1109/MCOM.2006.1637952 12
[44] I. Varga, S. Proust and H. Taddei, "ITU-T G.729.1 Scalable Codec for New Wideband Services," IEEE Communications Magazine, Oct. 2009. 12
[45] D. Kemp et al., "An Evaluation of 4800 bits/s Voice Coders," Proc. ICASSP-89, p. 200, Apr. 1989. 12
[46] D. Lin, "New Approaches to Stochastic Coding of Speech Sources at Very Low Bit Rates," Proc. EUSIPCO-86, p. 445, 1986. 12, 65
[47] J. Chen and A. Gersho, "Real Time Vector APC Speech Coding at 4800 bps with Adaptive Postfiltering," Proc. ICASSP-87, pp. 2185–2188, 1987. 12, 84
[48] A. Spanias, "Speech Coding: A Tutorial Review," Proceedings of the IEEE, vol. 82, no. 10, pp. 1541–1582, Oct. 1994. 13
[49] A. Spanias, T. Painter and V. Atti, Audio Signal Processing and Coding, Wiley-Interscience, New Jersey, 2007. 13, 78
[50] S. Haykin, Adaptive Filter Theory, Prentice-Hall, New Jersey, 1996. 18, 26, 53
[51] S.L. Marple, Digital Spectral Analysis with Applications, Prentice Hall, New Jersey, 1987. 21
[52] T. Parsons, Voice and Speech Processing, McGraw-Hill, 1987. 21
[53] P. Kabal, "Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech," Proc. IEEE ICASSP, vol. 1, pp. I-824–I-827, 2003. 23
[54] P.P. Vaidyanathan, The Theory of Linear Prediction, Morgan & Claypool, 2008. DOI: 10.2200/S00086ED1V01Y200712SPR003 27
[55] J. Makhoul, "Linear Prediction: A Tutorial Review," Proc. IEEE, vol. 63, no. 4, pp. 561–580, Apr. 1975. DOI: 10.1109/PROC.1975.9792 28
[56] J. Markel and A. Gray, Jr., Linear Prediction of Speech, Springer-Verlag, New York, 1976. 28
[57] T.E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10," Speech Technology, pp. 40–49, Apr. 1982. 28
[58] G. Box and G. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, Inc., San Francisco, 1970. 28
[59] A. Gray and D. Wong, "The Burg Algorithm for LPC Speech Analysis/Synthesis," IEEE Trans. ASSP-28, no. 6, pp. 609–615, Dec. 1980. 28
[60] V. Viswanathan and J. Makhoul, "Quantization Properties of Transmission Parameters in Linear Predictive Systems," IEEE Trans. ASSP-23, pp. 309–321, Jun. 1975. 28
[61] N. Sugamura and F. Itakura, "Speech Data Compression by LSP Analysis/Synthesis Technique," Trans. IEICE, vol. J64, pp. 599–606, 1981. 29
[62] F.K. Soong and B.-H. Juang, "Line Spectrum Pair (LSP) and Speech Data Compression," Proc. IEEE ICASSP, pp. 1.10.1–1.10.4, Mar. 1984. 29
[63] P. Kabal and R. Ramachandran, "The Computation of Line Spectral Frequencies using Chebyshev Polynomials," IEEE Trans. ASSP-34, no. 6, pp. 1419–1426, Dec. 1986. 49, 52
[64] G. Kang and L. Fransen, "Application of Line-Spectrum Pairs to Low-Bit-Rate Speech Encoders," Proc. IEEE ICASSP, vol. 10, pp. 244–247, 1985. 49
[65] J.S. Collura and T.E. Tremain, "Vector Quantizer Design for the Coding of LSF Parameters," Proc. IEEE ICASSP, vol. 2, pp. 29–32, Apr. 1993. 49
[66] N. Farvardin and R. Laroia, "Efficient Encoding of Speech LSP Parameters using the Discrete Cosine Transformation," Proc. IEEE ICASSP, vol. 1, pp. 168–171, May 1989. DOI: 10.1109/ICASSP.1989.266390 49
[67] K.K. Paliwal and B.S. Atal, "Efficient Vector Quantization of LPC Parameters at 24 bits/frame," IEEE Trans. on Speech and Audio Processing, vol. 1, iss. 1, pp. 3–14, Jan. 1993. DOI: 10.1109/89.221363 49
[68] J. Pan and T.R. Fischer, "Vector Quantization of Speech Line Spectrum Pair Parameters and Reflection Coefficients," IEEE Trans. on Speech and Audio Processing, vol. 6, iss. 2, Mar. 1998. DOI: 10.1109/89.661470 49
[69] Y. Bistritz and S. Peller, "Immittance Spectral Pairs (ISP) for Speech Encoding," Proc. IEEE ICASSP, vol. 2, pp. 9–12, Apr. 1993. DOI: 10.1109/ICASSP.1993.319215 49
[70] A. Gray Jr. and J. Markel, "Distance Measures for Speech Processing," IEEE Trans. ASSP, vol. 24, pp. 380–391, Oct. 1976. 57, 60
[71] F. Itakura, "Minimum Prediction Residual Principle Applied to Speech Recognition," IEEE Trans. ASSP, vol. 23, pp. 67–72, Feb. 1975. DOI: 10.1109/TASSP.1975.1162641 60
[72] A.V. Oppenheim, R.W. Schafer and T.G. Stockham, "Nonlinear Filtering of Multiplied and Convolved Signals," Proc. IEEE, vol. 56, pp. 1264–1291, Aug. 1968. DOI: 10.1109/PROC.1968.6570 60
[73] A. Gersho and V. Cuperman, "Vector Quantization: A Pattern Matching Technique for Speech Coding," IEEE Comm. Mag., vol. 21, p. 15, Dec. 1983. 63
[74] R. Gray, "Vector Quantization," IEEE ASSP Mag., vol. 1, p. 4, Apr. 1984. 63
[75] M. Sabin and R. Gray, "Product Code Vector Quantizers for Speech Waveform Coding," Proc. Globecom-82, p. E6.5.1, 1982. 64
[76] Website for the book. Available online at: http://www.morganclaypool.com/page/fs1016 60, 68, 70, 72, 75
[77] S. Lin and D.J. Costello, Jr., Error Control Coding, Prentice Hall, New Jersey, 2004. 77, 80
[78] S.C. Greer and A. DeJaco, "Standardization of the Selectable Mode Vocoder," Proc. IEEE ICASSP, vol. 2, pp. 953–956, May 2001. DOI: 10.1109/ICASSP.2001.941074 90
Authors' Biographies

KARTHIKEYAN N. RAMAMURTHY

Karthikeyan N. Ramamurthy completed his MS in Electrical Engineering and is currently a PhD student in the School of Electrical, Computer, and Energy Engineering at Arizona State University. His research interests include DSP, speech processing, sparse representations, and compressive sensing. He has worked extensively with the CELP algorithm and has created MATLAB modules for the various functions in the algorithm. He also works in signal processing and spectral analysis for Earth and geology systems and has created several functions for these applications in the Java-DSP software.
ANDREAS SPANIAS
Andreas Spanias is a Professor in the School of Electrical, Computer, and Energy Engineering at Arizona State University (ASU). He is also the founder and director of the SenSIP industry consortium. His research interests are in the areas of adaptive signal processing, speech processing, and audio sensing. He and his student team developed the computer simulation software Java-DSP (J-DSP, ISBN 0-9724984-0-0). He is the author of two textbooks: Audio Processing and Coding by Wiley and DSP: An Interactive Approach. He served as Associate Editor of the IEEE Transactions on Signal Processing and as General Co-chair of IEEE ICASSP-99. He also served as the IEEE Signal Processing Vice-President for Conferences. Andreas Spanias is co-recipient of the 2002 IEEE Donald G. Fink paper prize award and was elected Fellow of the IEEE in 2003. He served as Distinguished Lecturer for the IEEE Signal Processing Society in 2004.
MATLAB® Software for the Code Excited Linear Prediction Algorithm
The Federal Standard-1016

Karthikeyan N. Ramamurthy and Andreas S. Spanias
Arizona State University

SYNTHESIS LECTURES ON ALGORITHMS AND SOFTWARE IN ENGINEERING #3

Morgan & Claypool Publishers
ABSTRACT
This book describes several modules of the Code Excited Linear Prediction (CELP) algorithm. The authors use the Federal Standard-1016 CELP MATLAB® software to describe in detail several functions and parameter computations associated with analysis-by-synthesis linear prediction. The book begins with a description of the basics of linear prediction, followed by an overview of the FS-1016 CELP algorithm. Subsequent chapters describe the various modules of the CELP algorithm in detail. In each chapter, an overall functional description of CELP modules is provided along with detailed illustrations of their MATLAB® implementation. Several code examples and plots are provided to highlight some of the key CELP concepts. The MATLAB® code within this book can be found at http://www.morganclaypool.com/page/fs1016

KEYWORDS
speech coding, linear prediction, CELP, vocoder, secure communications, ITU, G.7XX standards
Contents

Preface ix

1 Introduction to Linear Predictive Coding 1
  1.1 Linear Predictive Coding 2
    1.1.1 Vocal Tract Parameter Estimation 3
    1.1.2 Excitation, Gain and Pitch Period 5
    1.1.3 Linear Prediction Parameter Transformations 6
    1.1.4 Long-Term Prediction 6
  1.2 Analysis-by-Synthesis Linear Prediction 7
    1.2.1 Code Excited Linear Prediction Algorithms 9
    1.2.2 The Federal Standard-1016 CELP 12
  1.3 Summary 13

2 Autocorrelation Analysis and Linear Prediction 15
  2.1 Framing and Windowing the Input Speech 16
  2.2 Computation of Autocorrelation Lags 18
  2.3 The Levinson-Durbin Recursion 20
  2.4 Bandwidth Expansion 23
  2.5 Inverse Levinson-Durbin Recursion 26
  2.6 Summary 28

3 Line Spectral Frequency Computation 29
  3.1 Construction of LSP Polynomials 29
  3.2 Computing the Zeros of the Symmetric Polynomial 31
  3.3 Computing the Zeros of the Anti-Symmetric Polynomial 35
  3.4 Testing Ill-Conditioned Cases 40
  3.5 Quantizing the Line Spectral Frequencies 41
  3.6 Adjusting Quantization to Preserve Monotonicity 45
  3.7 Summary 49

4 Spectral Distortion 51
  4.1 Conversion of LSP to Direct-Form Coefficients 52
  4.2 Computation of Autocorrelation Lags from Reflection Coefficients 53
  4.3 Calculation of Distance Measures 57
  4.4 Summary 60

5 The Codebook Search 63
  5.1 Overview of Codebook Search Procedure 63
  5.2 Adaptive Codebook Search 66
    5.2.1 Target Signal for Adaptive Codebook Search 66
    5.2.2 Integer Delay Search 68
    5.2.3 Sub-Multiple/Fractional Delay Search 70
  5.3 Stochastic Codebook Search 72
    5.3.1 Target Signal for Stochastic Codebook Search 72
    5.3.2 Search for Best Codevector 75
  5.4 Bitstream Generation 77
  5.5 Summary 78

6 The FS-1016 Decoder 79
  6.1 Extracting Parameters for Synthesis 79
  6.2 LP Synthesis 82
  6.3 Postfiltering Output Speech 84
  6.4 Computing Distance Measures 88
  6.5 Summary 88

Bibliography 93

Authors' Biographies 99
Preface

The CELP algorithm, proposed in the mid 1980's for analysis-by-synthesis linear predictive coding, had a significant impact in speech coding applications. The CELP algorithm still forms the core of many speech coding standards that exist nowadays. Federal Standard-1016 is an early standardized version of the CELP algorithm and is based on the enhanced version of the originally proposed AT&T Bell Labs CELP coder [1]. We note that although this is an existing algorithm, and perhaps the very first that has been standardized, it is a great starting point for learning the core concepts associated with CELP.

This book presents MATLAB® software that simulates the FS-1016 CELP algorithm. We describe the theory and implementation of the algorithm along with several detailed illustrations using MATLAB® functions. In the first chapter, we introduce the theory behind speech analysis and linear prediction and also provide an overview of speech coding standards. From Chapter 2 and on, we describe the various modules of the FS-1016 CELP algorithm. The theory sections in most chapters are self-sufficient in that they explain the necessary concepts required to understand the algorithm. The implementation details and illustrations with MATLAB® functions complement the theoretical underpinnings of CELP. The extensive list of references allows the reader to carry out a more detailed study on specific aspects of the algorithm, and the references cover some of the recent advancements in speech coding.

Students will find it easy to understand the CELP functions by using the software and the associated simulations. Practitioners and algorithm developers will be able to understand the basic concepts and develop several new functions to improve the algorithm or to adapt it for certain new applications. We hope this book will be useful in understanding the CELP algorithm as well as the more recent speech coding standards that are based on CELP principles.

Furthermore, the MATLAB® programs and the supporting files are available for download on the companion website of the book. The MATLAB® code within this book can be found at http://www.morganclaypool.com/page/fs1016

The authors acknowledge Ted Painter, who wrote the MATLAB® code for the FS-1016 CELP algorithm, and the DoD for developing the C code for the algorithm.

Karthikeyan N. Ramamurthy and Andreas S. Spanias
February 2010
CHAPTER 1

Introduction to Linear Predictive Coding

This book introduces linear predictive coding and describes several modules of the Code Excited Linear Prediction (CELP) algorithm in detail. The MATLAB® program for the Federal Standard-1016 (FS-1016) CELP algorithm is used to illustrate the components of the algorithm. Theoretical explanations and mathematical details, along with relevant examples that reinforce the concepts, are also provided in the chapters.

In this chapter, some of the basics of linear prediction are introduced, starting from the principles of speech analysis. Linear prediction analysis of speech signals is at the core of most narrowband speech coding standards. A simple engineering synthesis model that has been used in several speech processing and coding applications is shown in Figure 1.1. This source-system model is inspired by the human speech production mechanism. Voiced speech is produced by exciting the vocal tract filter with quasi-periodic glottal pulses; the periodicity of voiced speech is due to the vibrating vocal chords. Unvoiced speech is produced by forcing air through a constriction in the vocal tract or by using other mechanisms that do not engage the vocal chords. As shown in Figure 1.1, voiced speech is produced by exciting the vocal tract filter with periodic impulses, and unvoiced speech is generated using random pseudo-white noise excitation. The vocal tract is usually represented by a tenth-order digital all-pole filter. Filter coefficients and excitation parameters are typically determined every 20 ms or less for speech sampled at 8 kHz.

Figure 1.1: Engineering model for speech synthesis.

The filter frequency response models the formant structure of speech and captures the resonant modes of the vocal tract.
Filter coefficients are programmable and can be made adaptive (time-varying). A direct-form realization of the digital filter is shown in Figure 1.2.

Figure 1.2: Linear prediction synthesis filter.

1.1 LINEAR PREDICTIVE CODING

Digital filters for linear predictive coding applications are characterized by the difference equation

    y(n) = b(0) x(n) − Σ_{i=1}^{M} a(i) y(n − i) .   (1.1)

In the input-output difference equation above, the output y(n) is given as the sum of the input minus a linear combination of past outputs (feedback term). The parameters a(i) and b(i) are the filter coefficients or filter taps, and they control the frequency response characteristics of the filter. When the tenth-order all-pole filter representing the vocal tract is excited by white noise, the signal model corresponds to an autoregressive (AR) time-series representation. The coefficients of the AR model can be determined using linear prediction techniques.

The application of linear prediction in speech processing, and specifically in speech coding, is often referred to as Linear Predictive Coding (LPC). The LPC parameterization is a central component of many compression algorithms that are used in cellular telephony for bandwidth compression and enhanced privacy. Bandwidth is conserved by reducing the data rate required to represent the speech signal. This data rate reduction is achieved by parameterizing speech in terms of the AR or all-pole filter coefficients and a small set of excitation parameters. The two excitation parameters for the synthesis configuration shown
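The recursion in (1.1) translates almost directly into code. The book's examples are in MATLAB®; the sketch below is an illustrative plain-Python version with our own function and variable names (not from the FS-1016 software), assuming a zero initial filter state.

```python
def synthesis_filter(x, a, b0=1.0):
    """All-pole synthesis filter: y(n) = b0*x(n) - sum_i a[i]*y(n-i).

    a is the list [a(1), ..., a(M)] from Eq. (1.1); past outputs
    before the start of the input are assumed to be zero.
    """
    y = []
    for n, xn in enumerate(x):
        # Feedback term: linear combination of past outputs.
        feedback = sum(a[i] * y[n - 1 - i]
                       for i in range(len(a)) if n - 1 - i >= 0)
        y.append(b0 * xn - feedback)
    return y

# A unit impulse through a one-pole filter yields a decaying exponential.
h = synthesis_filter([1.0, 0.0, 0.0, 0.0], [-0.5])
# h == [1.0, 0.5, 0.25, 0.125]
```

With a(1) = −0.5, the recursion is y(n) = x(n) + 0.5 y(n − 1), so the impulse response halves at each step, illustrating how the feedback taps shape the output spectrum.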
in Figure 1.1 are the following: (a) the voicing decision (voiced/unvoiced) and (b) the pitch period. In the simplest case, 160 samples (20 ms for 8 kHz sampling) of speech can be represented with ten all-pole vocal tract parameters and two parameters that specify the excitation signal. Therefore, in this case, 160 speech samples can be represented by only twelve parameters, which results in a data compression ratio of more than 13 to 1 in terms of the number of parameters that will be encoded and transmitted. In new standardized algorithms, more elaborate forms of parameterization exploit further the redundancy in the signal and yield better compression and much improved speech quality.

1.1.1 VOCAL TRACT PARAMETER ESTIMATION

Linear prediction is used to estimate the vocal tract parameters. As the name implies, the linear predictor estimates the current sample of the speech signal using a linear combination of the past samples. For example, in a tenth-order linear predictor, an estimate of the current speech sample s(n) is produced using a linear combination of the ten previous samples, i.e., s(n − 1), s(n − 2), ..., s(n − 10). This is done by forming a prediction error,

    e(n) = s(n) − Σ_{i=1}^{10} a(i) s(n − i) .   (1.2)

The prediction parameters a(i) are unknown and are determined by minimizing the Mean-Square-Error (MSE) E[e²(n)]. The minimization of the MSE yields a set of autocorrelation equations that can be represented in terms of the matrix equation,

    [ r(1)  ]   [ r(0)  r(1)  r(2)  ...  r(9) ] [ a(1)  ]
    [ r(2)  ]   [ r(1)  r(0)  r(1)  ...  r(8) ] [ a(2)  ]
    [ r(3)  ] = [ r(2)  r(1)  r(0)  ...  r(7) ] [ a(3)  ]   (1.3)
    [  ...   ]   [  ...   ...   ...   ...   ...  ] [  ...   ]
    [ r(10) ]   [ r(9)  r(8)  r(7)  ...  r(0) ] [ a(10) ]

The autocorrelation sequence can be estimated using the equation

    r(m) = (1/N) Σ_{n=0}^{N−m−1} s(n + m) s(n) .   (1.4)

The integer N is the number of speech samples in the frame (typically N = 160). The autocorrelations are computed once per speech frame, and the linear prediction coefficients are computed by inverting the autocorrelation matrix, i.e.,

    [ a(1)  ]   [ r(0)  r(1)  r(2)  ...  r(9) ]^{-1} [ r(1)  ]
    [ a(2)  ]   [ r(1)  r(0)  r(1)  ...  r(8) ]      [ r(2)  ]
    [ a(3)  ] = [ r(2)  r(1)  r(0)  ...  r(7) ]      [ r(3)  ]   (1.5)
    [  ...   ]   [  ...   ...   ...   ...   ...  ]      [  ...   ]
    [ a(10) ]   [ r(9)  r(8)  r(7)  ...  r(0) ]      [ r(10) ]

The prediction parameters a(i) are also used to form the all-pole digital filter for speech synthesis. The coefficients a(1), a(2), ..., a(10) form the transfer function

    H(z) = 1 / (1 − Σ_{i=1}^{10} a(i) z^{−i})   (1.6)

of the filter which is used in the speech synthesis system shown in Figure 1.1. This all-pole filter reproduces speech segments using either random (unvoiced) or periodic (voiced) excitation. The matrix in (1.3) can be inverted efficiently using the Levinson-Durbin order-recursive algorithm given in (1.7 a-d):

    initialize: ε_0^f = r(0)
    for m = 1 to 10:
        a_m(m) = [ r(m) − Σ_{i=1}^{m−1} a_{m−1}(i) r(m − i) ] / ε_{m−1}^f
        a_m(i) = a_{m−1}(i) − a_m(m) a_{m−1}(m − i) ,  1 ≤ i ≤ m − 1
        ε_m^f = (1 − (a_m(m))²) ε_{m−1}^f
    end   (1.7 a-d)

The subscript m of the prediction parameter a_m(i) in the recursive expression (1.7 a-d) is the order of prediction during the recursion, while the integer in the parenthesis represents the coefficient index. The symbol ε_m^f represents the mean-square estimate of the prediction residual during the recursion. The coefficients a_m(m) are called reflection coefficients and are represented by the symbol k_m = a_m(m). In the speech processing literature, the negated reflection coefficients are also known as Partial Correlation (PARCOR) coefficients. Reflection coefficients correspond to lattice filter structures that have been shown to be electrical equivalents of acoustical models of the vocal tract. Reflection coefficients have good quantization properties and have been used in several speech compression systems for encoding vocal tract parameters.
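Equations (1.4) and (1.7 a-d) map almost line-for-line into code. The following plain-Python sketch (our own naming, not the book's MATLAB® functions) estimates the autocorrelation lags and runs the Levinson-Durbin recursion.

```python
def autocorr(s, max_lag):
    """Biased autocorrelation estimate r(m) of Eq. (1.4)."""
    N = len(s)
    return [sum(s[n + m] * s[n] for n in range(N - m)) / N
            for m in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Levinson-Durbin recursion, Eq. (1.7 a-d).

    Returns (a, k, err): prediction coefficients a(1..order),
    reflection coefficients k_m, and the final residual energy.
    """
    a = []          # a_m(1..m); the list grows with the order m
    k = []
    err = r[0]      # epsilon_0 = r(0)
    for m in range(1, order + 1):
        # k_m = [r(m) - sum_{i=1}^{m-1} a_{m-1}(i) r(m-i)] / eps_{m-1}
        km = (r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))) / err
        # a_m(i) = a_{m-1}(i) - k_m * a_{m-1}(m-i), then append a_m(m) = k_m
        a = [a[i] - km * a[m - 2 - i] for i in range(m - 1)] + [km]
        k.append(km)
        err *= (1.0 - km * km)
    return a, k, err

# For an AR(1)-like lag sequence r(m) = 0.9**m, the recursion recovers
# a(1) close to 0.9 and the higher reflection coefficients vanish
# (up to rounding).
r = [0.9 ** m for m in range(4)]
a, k, err = levinson_durbin(r, 3)
```

The residual energy `err` shrinks by the factor (1 − k_m²) at each order, which is why |k_m| < 1 for a valid autocorrelation sequence.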
A lattice structure is shown in Figure 1.3.

Figure 1.3: Lattice structure for linear prediction and reflection coefficients.

1.1.2 EXCITATION, GAIN AND PITCH PERIOD

The residual signal e(n) given in (1.2) forms the optimal excitation for the linear predictor. For low-rate representations of speech, the residual signal is typically replaced by a parametric excitation signal model. Many of the early algorithms for linear prediction used the two-state excitation model (impulses/noise) shown in Figure 1.1. This model is parameterized in terms of the gain, the binary voicing parameter, and the pitch period. For unvoiced speech, the excitation is produced by a random number generator. Since unvoiced segments are associated with small energy and a large number of zero crossings, voicing can be determined by energy and zero-crossing measurements. The gain of voiced and unvoiced segments is generally determined such that the short-term energy of the synthetic speech segment matches that of the analysis segment.

Pitch estimation and tracking is a difficult problem. Some of the well-known algorithms for pitch detection were developed in the late sixties and seventies. A straightforward approach to pitch detection is based on selecting the peak of the autocorrelation sequence (excluding r(0)). The Simplified Inverse Filter Tracking (SIFT) algorithm is based on peak-picking the autocorrelation sequence of the prediction residual associated with downsampled speech. A more expensive but also more robust pitch detector relies on peak-picking the cepstrum. Post-processing algorithms for pitch smoothing are also used to provide frame-to-frame pitch continuity. Current pitch detection algorithms yield high-resolution (sub-sample) estimates for the pitch period and are often specific to the analysis-synthesis system. In many Analysis-by-Synthesis (AbyS) linear predictive coders, the pitch is measured by a closed-loop process which accounts for the impact of the pitch on the overall quality of the reconstructed speech.
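The straightforward autocorrelation pitch detector described above can be sketched directly. This plain-Python illustration (our own code, not part of the FS-1016 package) picks the autocorrelation peak over an assumed pitch-lag range of 20 to 147 samples, i.e., roughly 54-400 Hz at 8 kHz sampling.

```python
def pitch_by_autocorr(s, min_lag=20, max_lag=147):
    """Estimate the pitch period as the lag of the autocorrelation
    peak, excluding r(0) by starting the search at min_lag."""
    N = len(s)
    best_lag, best_r = min_lag, float("-inf")
    for m in range(min_lag, min(max_lag, N - 1) + 1):
        r_m = sum(s[n + m] * s[n] for n in range(N - m))
        if r_m > best_r:
            best_lag, best_r = m, r_m
    return best_lag

# A pulse train with period 40 samples (200 Hz at 8 kHz) produces an
# autocorrelation peak at lag 40.
frame = [1.0 if n % 40 == 0 else 0.0 for n in range(160)]
# pitch_by_autocorr(frame) == 40
```

On real speech this simple peak-picker can lock onto pitch multiples or sub-multiples, which is exactly the weakness the SIFT, cepstral, and closed-loop methods mentioned above address.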
1.1.3 LINEAR PREDICTION PARAMETER TRANSFORMATIONS

One of the major issues in LPC is the quantization of the linear prediction parameters. Quantization of the direct-form coefficients a(i) is generally avoided. The reflection coefficients k_m are by-products of the Levinson algorithm and are more robust for quantization. The reflection coefficients can also be quantized in an ordered manner, i.e., the first few reflection coefficients can be encoded with a higher precision. Transformations of the reflection coefficients, such as the Log Area Ratio (LAR),

    LAR(m) = log[ (1 + k_m) / (1 − k_m) ] ,   (1.8)

have also been used in several LPC algorithms.

Another representation of Linear Prediction (LP) parameters that is being used in several standardized algorithms consists of Line Spectrum Pairs (LSPs). In LSP representations, a typical tenth-order polynomial,

    A(z) = 1 + a1 z^{−1} + · · · + a10 z^{−10} ,   (1.9)

is represented by the two auxiliary polynomials P(z) and Q(z), where

    P(z) = A(z) + z^{−11} A(z^{−1}) ,   (1.10)
    Q(z) = A(z) − z^{−11} A(z^{−1}) .   (1.11)

P(z) and Q(z) each have a set of five complex conjugate pairs of zeros that typically lie on the unit circle. If the polynomial A(z) is minimum phase, the roots of the polynomials P(z) and Q(z) alternate on the unit circle. Hence, each polynomial can be represented by the five frequencies of their zeros (the other five frequencies are their negatives). These frequencies are called Line Spectral Frequencies (LSFs).

1.1.4 LONG-TERM PREDICTION

Almost all modern LP algorithms include long-term prediction in addition to the tenth-order short-term linear predictor. Long-Term Prediction (LTP) captures the long-term correlation in the speech signal and provides a mechanism for representing the periodicity of speech. As such, it represents the fine harmonic structure in the short-term speech spectrum. The LTP requires estimation of two parameters: a delay τ and a gain parameter a(τ). For strongly voiced segments, the delay is usually an integer that approximates the pitch period. The transfer function of a simple LTP synthesis filter is given by

    A_τ(z) = 1 / (1 − a(τ) z^{−τ}) .   (1.12)

The gain is obtained by the equation a(τ) = r(τ)/r(0). Estimates of the LTP parameters can be obtained by searching the autocorrelation sequence or by using closed-loop searches where the LTP lag that produces the best speech waveform matching is chosen.
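The LAR mapping in (1.8) and its inverse are simple enough to state in code. Below is a plain-Python sketch with illustrative names (natural logarithm assumed); the inverse follows from solving (1.8) for k_m, which gives the hyperbolic tangent form.

```python
import math

def lar(k):
    """Log Area Ratio of a reflection coefficient, Eq. (1.8); |k| < 1."""
    return math.log((1.0 + k) / (1.0 - k))

def lar_inverse(g):
    """Invert Eq. (1.8): k = (e^g - 1)/(e^g + 1) = tanh(g/2)."""
    return math.tanh(g / 2.0)

# Round trip: a reflection coefficient near the unit circle maps to a
# large LAR value and back, illustrating how the mapping expands the
# sensitive region |k| -> 1 before uniform quantization.
k = 0.95
assert abs(lar_inverse(lar(k)) - k) < 1e-12
```

The expansion near |k| = 1 is the point of the transformation: uniform quantization of LAR(m) allots finer effective resolution to reflection coefficients close to the unit circle, where the synthesis filter is most sensitive.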
1.2 ANALYSIS-BY-SYNTHESIS LINEAR PREDICTION

In closed-loop source-system coders, the excitation source is determined by closed-loop or AbyS optimization. The optimization process determines an excitation sequence that minimizes the perceptually-weighted MSE between the input speech and reconstructed speech [1, 2, 3]. The closed-loop LP combines the spectral modeling properties of vocoders with the waveform matching attributes of waveform coders; hence, the AbyS LP coders are also called hybrid LP coders. The term hybrid is used because of the fact that AbyS integrates vocoder and waveform coder principles. The system consists of a short-term LP synthesis filter 1/A(z) and a LTP synthesis filter 1/A_L(z), as shown in Figure 1.4. From Figure 1.4, note that a gain factor g scales the excitation vector x, and the excitation samples are filtered by the long-term and short-term synthesis filters.

The Perceptual Weighting Filter (PWF) W(z) shapes the error such that quantization noise is masked by the high-energy formants. The PWF is given by

    W(z) = A(z/γ1) / A(z/γ2)
         = [ 1 − Σ_{i=1}^{m} γ1^i a(i) z^{−i} ] / [ 1 − Σ_{i=1}^{m} γ2^i a(i) z^{−i} ] ,  0 < γ2 < γ1 < 1 ,   (1.13)

where γ1 and γ2 are the adaptive weights and m is the order of the linear predictor. Typically, γ1 ranges from 0.94 to 0.98, and γ2 varies between 0.4 and 0.7 depending upon the tilt or the flatness characteristics associated with the LPC spectral envelope [4, 5]. The role of W(z) is to deemphasize the error energy in the formant regions [6]. This deemphasis strategy is based on the fact that, in the formant regions, quantization noise is partially masked by speech.

The three most common excitation models typically embedded in the excitation generator module (Figure 1.4) in the AbyS LP schemes include: the Multi-Pulse Excitation (MPE) [2, 3], the Regular Pulse Excitation (RPE) [7], and the vector or Code Excited Linear Prediction (CELP) [1]. A 9.6 kb/s Multi-Pulse Excited Linear Prediction (MPELP) algorithm is used in Skyphone airline applications [8]. A 13 kb/s coding scheme that uses RPE [7] was adopted for the full-rate ETSI GSM Pan-European digital cellular standard [9]. The standard was eventually replaced by the GSM Enhanced Full-Rate (EFR), described briefly later.

The aforementioned MPELP and RPE schemes achieve high-quality speech at medium rates. For low-rate high-quality speech coding, a more efficient representation of the excitation sequence is required. In the mid-eighties, Atal [10] suggested that high-quality speech at low rates may be produced by using non-instantaneous (delayed decision) coding of Gaussian excitation sequences in conjunction with AbyS linear prediction and perceptual weighting. Atal and Schroeder [1, 11] proposed a CELP algorithm for AbyS linear predictive coding.
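Since A(z/γ) simply scales the i-th LP coefficient by γ^i, the numerator and denominator coefficient sets of the PWF in (1.13) are easy to form. A plain-Python sketch with our own helper name follows.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the i-th LP coefficient a(i) is
    scaled by gamma**i, per the PWF definition in Eq. (1.13)."""
    return [a[i] * gamma ** (i + 1) for i in range(len(a))]

# Numerator and denominator coefficient sets of W(z) = A(z/g1)/A(z/g2)
# for a toy second-order predictor with illustrative weights.
a = [1.2, -0.5]                   # a(1), a(2)
num = bandwidth_expand(a, 0.95)   # [1.2*0.95, -0.5*0.95**2]
den = bandwidth_expand(a, 0.55)   # [1.2*0.55, -0.5*0.55**2]
```

Because γ2 < γ1, the denominator poles are pulled further inside the unit circle than the numerator zeros, which flattens (deemphasizes) the response around the formant peaks.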
Figure 1.4: A typical source-system model employed in the analysis-by-synthesis LP.

The excitation codebook search process in CELP can be explained by considering the AbyS scheme shown in Figure 1.5. Note that the perceptual weighting, W(z), is applied directly on the input speech, s, and synthetic speech, ŝ, in order to facilitate the CELP analysis that follows. The N × 1 error vector e associated with the i-th candidate excitation vector, x(i), can be written as

    e(i) = s̄w − g(i) ŝw(i) ,  where s̄w = sw − s0 ,   (1.14)

where sw is the N × 1 vector that contains the perceptually-weighted speech samples, s0 is the vector that contains the output due to the initial filter state, ŝw(i) is the filtered synthetic speech vector associated with the i-th excitation vector, g(i) is the gain factor, and T represents the transpose of a vector. Minimizing ζ(i) = e(i)T e(i) with respect to g(i), we obtain

    g(i) = ( s̄wT ŝw(i) ) / ( ŝw(i)T ŝw(i) ) .   (1.15)

From (1.15), ζ(i) can be written as

    ζ(i) = s̄wT s̄w − ( s̄wT ŝw(i) )² / ( ŝw(i)T ŝw(i) ) .   (1.16)

The i-th excitation vector x(i) that minimizes (1.16) is selected, and the corresponding gain factor g(i) is obtained from (1.15). The codebook index, i, and the gain, g(i), are encoded and transmitted along with the short-term and long-term prediction filter parameters.
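Equations (1.15) and (1.16) reduce the codebook search to a few dot products per candidate. The plain-Python sketch below (illustrative, not the FS-1016 MATLAB® code) assumes the candidate codevectors have already been passed through the weighted synthesis filters.

```python
def codebook_search(sw_bar, filtered_candidates):
    """Pick the filtered codevector minimizing Eq. (1.16) and return
    (best_index, best_gain), with the gain given by Eq. (1.15)."""
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    best_i, best_gain, best_err = -1, 0.0, float("inf")
    for i, shat in enumerate(filtered_candidates):
        energy = dot(shat, shat)
        if energy == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        corr = dot(sw_bar, shat)
        err = dot(sw_bar, sw_bar) - corr * corr / energy  # zeta(i)
        if err < best_err:
            best_i, best_gain, best_err = i, corr / energy, err
    return best_i, best_gain

# The target itself (scaled by 2) is in the codebook, so that entry
# wins with zero residual error and gain 2.
target = [2.0, -4.0, 6.0]
cands = [[1.0, 0.0, 0.0], [1.0, -2.0, 3.0]]
# codebook_search(target, cands) == (1, 2.0)
```

Note that the target energy term s̄wT s̄w is constant across candidates, so practical implementations only compare the correlation-squared-over-energy term; it is kept here to match (1.16) literally.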
secondgeneration CELP (19931998).1 CODE EXCITED LINEAR PREDICTION ALGORITHMS In this section.2. A study on the effects of parameter quantization on the performance of CELP was presented in [17].1. In addition.. ﬁrstgeneration CELP (19861992). . …. we taxonomize CELP algorithms into three categories that are consistent with the chronology of their development.. and thirdgeneration CELP (1999present). i. Some of the problems associated with the ﬁxedpoint implementation of CELP algorithms were presented in [19].e.. Davidson and Gersho [12] proposed sparse codebooks and Kleijn et al. In particular. ANALYSISBYSYNTHESIS LINEAR PREDICTION Input speech 9 s Excitation vectors Codebook PWF W(z) sw …. x(i ) g(i) LTP synthesis filter LP synthesis filter ˆ s(i ) Synthetic speech 1/AL(z) 1/A(z) PWF W(z) ˆ s (wi ) + _ Σ Residual error e(i ) MSE minimization Figure 1.2. This problem motivated a great deal of work focused upon developing structured codebooks [12. [13] proposed a fast algorithm for searching stochastic codebooks with overlapping vectors. …. 16] proposed a Vector Sum Excited Linear Predictive (VSELP) coder which is associated with fast codebook search and robustness to channel errors. and the issues associated with the channel coding of the CELP parameters were discussed by Kleijn in [18]. and the operation of the algorithm on ﬁniteprecision and ﬁxedpoint machines.. the effects of channel errors on CELP coders. Gerson and Jasiuk [15. One of the disadvantages of the original CELP algorithm is the large computational complexity required for the codebook search [1].5: A generic block diagram for the AbyS Code Excited Linear Predictive (CELP) coding. Other implementation issues associated with CELP include the quantization of the CELP parameters. 1. 13] and fast search procedures [14].
First-Generation CELP Coders

The first-generation CELP algorithms operate at bit rates between 4.8 kb/s and 16 kb/s. These are generally high-complexity and non-toll-quality algorithms. Some of the first-generation CELP algorithms include the following: the FS-1016 CELP, the IS-54 VSELP, the ITU-T G.728 Low-Delay (LD) CELP, and the IS-96 Qualcomm CELP. The FS-1016 4.8 kb/s CELP standard [20, 21] was jointly developed by the Department of Defense (DoD) and the Bell Labs for possible use in the third-generation Secure Telephone Unit (STU-III). The IS-54 VSELP algorithm [15, 22] and its variants are embedded in three digital cellular standards, i.e., the 8 kb/s TIA IS-54 [22], the 6.3 kb/s Japanese standard [23], and the 5.6 kb/s half-rate GSM [24]. The VSELP algorithm uses highly structured codebooks that are tailored for reduced computational complexity and increased robustness to channel errors. The ITU-T G.728 Low-Delay (LD) CELP coder [25, 26] achieves low one-way delay by using very short frames, a backward-adaptive predictor, and short excitation vectors (5 samples). The IS-96 Qualcomm CELP (QCELP) [27] is a variable bit rate algorithm and is part of the Code Division Multiple Access (CDMA) standard for cellular communications. Most of these standardized CELP algorithms were eventually replaced by newer second- and third-generation AbyS coders.

Second-Generation Near-Toll-Quality CELP Coders

The second-generation CELP algorithms are targeted for Internet audio streaming, Voice-over-Internet-Protocol (VoIP), teleconferencing applications, and secure communications. Some of the second-generation CELP algorithms include the following: the ITU-T G.723.1 dual-rate speech codec [28], the GSM EFR [23, 29], the IS-127 Relaxed CELP (RCELP) [30, 31], and the ITU-T G.729 Conjugate Structured-Algebraic CELP (CS-ACELP) [4, 32].

The coding gain improvements in second-generation CELP coders can be attributed partly to the use of algebraic codebooks in excitation coding [4, 32, 33, 34]. The term Algebraic CELP (ACELP) refers to the structure of the codebooks used to select the excitation codevector. Various algebraic codebook structures have been proposed [33, 35], but the most popular is the interleaved pulse permutation code. In this codebook, the codevector consists of a set of interleaved permutation codes containing only a few non-zero elements. This is given by,

p_i = i + jd,  j = 0, 1, ..., 2^M − 1,    (1.17)

where p_i is the pulse position, i is the pulse number, and d is the interleaving depth. The integer M is the number of bits describing the pulse positions. Table 1.1 shows an example ACELP codebook structure, where the interleaving depth is d = 5, the number of pulses or tracks is equal to 5, and the number of bits representing the pulse positions is M = 3. From (1.17), p_i = i + j5, where i = 0, 1, ..., 4 and j = 0, 1, 2, ..., 7.
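The track structure of (1.17), and the way a codevector is later formed by placing signed unit pulses on the chosen positions, can be sketched in a few lines. This is a pure-Python illustration (the book's programs are in MATLAB, and the function names here are ours, not from any standard):

```python
def acelp_tracks(num_tracks=5, d=5, M=3):
    # Track i holds the candidate positions p_i = i + j*d for j = 0..2**M - 1,
    # as in (1.17); the defaults reproduce the 5-track layout of Table 1.1.
    return [[i + j * d for j in range(2 ** M)] for i in range(num_tracks)]

def build_codevector(positions, signs, length=40):
    # Place one signed unit pulse per track; all other samples stay zero,
    # which is why an algebraic codebook needs no storage.
    x = [0] * length
    for p, a in zip(positions, signs):
        x[p] += a
    return x

tracks = acelp_tracks()
x = build_codevector([0, 6, 12, 18, 24], [1, -1, 1, 1, -1])
```

Only the five position indices and the five signs would be transmitted; the decoder rebuilds x identically.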
Table 1.1: An example algebraic codebook structure.

Track (i)   Pulse positions (P_i)
0           P0: 0, 5, 10, 15, 20, 25, 30, 35
1           P1: 1, 6, 11, 16, 21, 26, 31, 36
2           P2: 2, 7, 12, 17, 22, 27, 32, 37
3           P3: 3, 8, 13, 18, 23, 28, 33, 38
4           P4: 4, 9, 14, 19, 24, 29, 34, 39

For a given value of i, the set defined by (1.17) is known as a 'track,' and the value of j defines the pulse position. From the codebook structure shown in Table 1.1, the codevector, x(n), is computed by placing the 5 unit pulses at the determined locations, p_i, multiplied with their signs (±1). The codevector is given by,

x(n) = sum over i = 0, ..., 4 of α_i δ(n − p_i),  n = 0, 1, ..., 39,    (1.18)

where δ(n) is the unit impulse, α_i are the pulse amplitudes (±1), and p_i are the pulse positions. The pulse position indices and the signs are encoded and transmitted. Note that the algebraic codebooks do not require any storage.

Third-Generation (3G) CELP for 3G Cellular Standards

The 3G CELP algorithms are multimodal and accommodate several different bit rates. This is consistent with the vision on wideband wireless standards [36] that will operate in different modes: low-mobility, high-mobility, indoor, etc. There are at least two algorithms that have been developed and standardized for these applications. In Europe, GSM standardized the Adaptive Multi-Rate (AMR) coder [37, 38], and in the U.S., the Telecommunications Industry Association (TIA) has tested the Selectable Mode Vocoder (SMV) [39, 40, 41]. In particular, the adaptive GSM multi-rate coder [37, 38] has been adopted by the European Telecommunications Standards Institute (ETSI) for GSM telephony. This is an ACELP algorithm that operates at multiple rates: 12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15, and 4.75 kb/s. The bit rate is adjusted according to the traffic conditions, and the higher rates provide better quality where background noise is stronger.

The Adaptive Multi-Rate WideBand (AMR-WB) [5] is an ITU-T wideband standard that has been jointly standardized with 3GPP, and it operates at rates 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85, and 6.6 kb/s. The other two important speech coding standards are the Variable Rate Multimode WideBand (VMR-WB) and the extended AMR-WB (AMR-WB+), which target streaming and multimedia messaging systems [42]. The VMR-WB has been adopted as the new 3GPP2 standard for wideband speech telephony. VMR-WB is a Variable Bit Rate (VBR) coder with the source controlling the bit rate of operation: Full-Rate (FR), Half-Rate (HR), Quarter-Rate (QR), or Eighth-Rate (ER) encoding.
The encoding mode is selected to provide the best subjective quality at a particular Average Bit Rate (ABR). The source-coding bit rates are 13.3, 6.2, 2.7, and 1.0 kb/s for the FR, HR, QR, and ER encoding schemes. The AMR-WB+ [43] can address mixed speech and audio content and can consistently deliver high-quality audio even at low bit rates. It has been selected as an audio coding standard in 2004 by ETSI and the 3rd Generation Partnership Project (3GPP).

G.729.1 is a recent speech codec adopted by ITU-T that can handle both narrowband and wideband speech and is compatible with the widely used G.729 codec [44]. The codec can operate from 8 kb/s to 32 kb/s, with 8 kb/s and 12 kb/s in the narrowband and 14 kb/s to 32 kb/s at 2 kb/s intervals in the wideband. The codec incorporates bandwidth extension and provides bandwidth and bit rate scalability at the same time.

The SMV algorithm (IS-893) was developed to provide higher quality, flexibility, and capacity over the existing IS-96 QCELP and IS-127 Enhanced Variable Rate Coding (EVRC) CDMA algorithms. The SMV is based on four codecs: full-rate at 8.5 kb/s, half-rate at 4 kb/s, quarter-rate at 2 kb/s, and eighth-rate at 0.8 kb/s. The rate and mode selections in SMV are based on the frame voicing characteristics and the network conditions. The candidate algorithms and the selection process for the standard are described in [45].

1.2.2 THE FEDERAL STANDARD-1016 CELP

A 4.8 kb/s CELP algorithm has been adopted in the late 1980s by the DoD for use in the STU-III. This algorithm is described in the Federal Standard-1016 (FS-1016) [21] and was jointly developed by the DoD and the AT&T Bell Labs. Although new algorithms for use with the STU emerged, such as the Mixed Excitation Linear Predictor (MELP), the CELP FS-1016 remains interesting for our study as it contains core elements of AbyS algorithms that are still very useful.

Speech in the FS-1016 CELP is sampled at 8 kHz and segmented in frames of 30 ms duration. Each frame is segmented into subframes of 7.5 ms duration. The excitation vectors are selected in every subframe by minimizing the perceptually-weighted error measure. Ten short-term prediction parameters are encoded as LSPs on a frame-by-frame basis, and subframe LSPs are obtained by applying linear interpolation of the frame LSPs. The excitation in this CELP is formed by combining vectors from an adaptive and a stochastic codebook with gains ga and gs, respectively (gain-shape vector quantization). The codebooks are searched sequentially, starting with the adaptive codebook. The term "adaptive codebook" is used because the LTP lag search can be viewed as an adaptive codebook search, where the codebook is defined by previous excitation sequences (LTP state) and the lag τ determines the specific vector. The adaptive codebook contains the history of past excitation signals, and the LTP lag search is carried over 128 integer (20 to 147) and 128 non-integer delays. A subset of lags is searched in even subframes to reduce the computational complexity. The stochastic codebook contains 512 sparse and overlapping codevectors [18]. Each codevector consists of sixty samples, and each sample is ternary-valued (1, 0, −1) [46] to allow for fast convolution. A short-term pole-zero postfilter (similar to that proposed in [47]) is also part of the standard. The details on the bit allocations are given in the standard. The synthesis configuration for the FS-1016 CELP is shown in Figure 1.6.
The computational complexity of the FS-1016 CELP was estimated at 16 Million Instructions per Second (MIPS) for partially searched codebooks, and the Diagnostic Rhyme Test (DRT) and Mean Opinion Scores (MOS) were reported to be 91.5 and 3.2, respectively.

Figure 1.6: FS-1016 CELP synthesis. (The stochastic codebook vector, selected by the VQ index and scaled by gs, is added to the adaptive codebook vector, selected by the lag index and scaled by ga; the sum excites the synthesis filter with A(z) in the feedback path, and the output is postfiltered to produce speech.)

From Chapter 2 and on, we will describe the details of the FS-1016 standard and provide MATLAB programs for all the pertinent functions. Chapter 2 will describe the autocorrelation analysis and linear prediction module, starting from the framing of the speech signal up to the computation of reflection coefficients from the bandwidth-expanded LP coefficients. The construction of LSP polynomials, the computation of LSFs, and their quantization are described in Chapter 3. Various distance measures that are used to compute spectral distortion are detailed in Chapter 4. Chapter 5 illustrates the adaptive and stochastic codebook search procedures, and the final chapter describes the components in the FS-1016 decoder module.

1.3 SUMMARY

This chapter introduced the basics of linear predictive coding and AbyS linear prediction, and provided a review of the speech coding algorithms based on CELP. A detailed review of the basics of speech coding, algorithms, and standards until the early 1990's can be found in [48]. A more recent and short review of the modern speech coding algorithms is available in the book by Spanias et al. [49].
CHAPTER 2

Autocorrelation Analysis and Linear Prediction

Linear Prediction (LP) analysis is performed on the framed and windowed input speech to compute the direct-form linear prediction coefficients. It involves performing autocorrelation analysis of speech and using the Levinson-Durbin algorithm to compute the linear prediction coefficients. The block diagram in Figure 2.1 illustrates the autocorrelation analysis and linear prediction module of the FS-1016 CELP transmitter. The autocorrelation coefficients are calculated, and high frequency correction is done to prevent ill-conditioning of the autocorrelation matrix. The autocorrelation lags are converted to LP coefficients using the Levinson-Durbin recursion. Then, the direct-form LP coefficients are subjected to bandwidth expansion. The bandwidth-expanded LP coefficients are converted back to Reflection Coefficients (RCs) using the inverse Levinson-Durbin recursion.

Each block in the diagram corresponds to a particular script in the FS-1016 MATLAB code, whose name is provided in the block itself. Every chapter starting from this one will have a similar schematic block diagram that illustrates the overall function of the module described in that chapter.

Figure 2.1: Autocorrelation analysis and linear prediction.
The following sections of this chapter demonstrate the computation of autocorrelation lags, the conversion of the autocorrelation lags to LP coefficients, and the recursive conversion of the LP coefficients to RCs using MATLAB programs.

2.1 FRAMING AND WINDOWING THE INPUT SPEECH

The speech frame used in the FS-1016 transmitter is of size 240 samples and corresponds to 30 ms of speech at the standardized sampling rate of 8000 Hz. There are three speech buffers used in the encoder: snew, sold, and ssub. Two of the speech buffers, sold and snew, are one frame behind and one frame ahead, respectively, while the subframe analysis buffer, ssub, is half a frame behind snew and half a frame ahead of sold. In the later part of the FS-1016 analysis stage, ssub will be divided into four subframes, each of size 60 samples, corresponding to 7.5 ms of speech. The input speech buffer, snew, is scaled and clamped to the 16-bit integer range before processing; clamping is performed by thresholding the entries of the speech buffer. The speech is then filtered using a highpass filter to remove low frequency noise and windowed with a Hamming window.

Example 2.1
Program P2.1 loads an input speech frame and creates the three speech frame buffers in the variables snew, sold, and ssub. The Hamming window used to window the speech signal is shown in Figure 2.2, and it is generated using Program P2.1. The input speech (snew) and the windowed speech (s) frames are shown in Figures 2.3 and 2.4, respectively. The plots of the Fourier transforms of the input and the windowed speech frames are provided in Figures 2.5 and 2.6, respectively. The Fourier transform of the windowed speech frame is smoother because of the convolution with the frequency response of the Hamming window in the frequency domain.

Figure 2.2: The 240-point Hamming window.
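The clamping and windowing steps can be sketched as follows. This is a pure-Python illustration, not the FS-1016 reference code; it assumes the common 0.54/0.46 Hamming window definition, and the standard's exact scaling details may differ:

```python
import math

def hamming(N):
    # 0.54 - 0.46*cos(2*pi*n/(N-1)): the usual N-point Hamming window.
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def clamp16(x):
    # Threshold a sample to the 16-bit integer range.
    return max(-32768, min(32767, x))

def window_frame(frame):
    # Clamp each sample, then apply the Hamming window of matching length.
    w = hamming(len(frame))
    return [clamp16(s) * wn for s, wn in zip(frame, w)]
```

Applied to a 240-sample frame, this tapers the frame edges toward 0.08 of their value, which is what smooths the Fourier transform in Figure 2.6.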
Program P2.1: Framing and windowing. (The program runs a highpass filter to eliminate hum and low-frequency noise.)
Figure 2.3: Input speech frame.

Figure 2.4: Windowed speech frame.

2.2 COMPUTATION OF AUTOCORRELATION LAGS

The biased autocorrelation estimates are calculated using the modified time average formula [50],

r(i) = sum over n = 1+i, ..., N of u(n) u(n − i),  i = 0, 1, 2, ..., M,    (2.1)

where u is the input signal, N is the length of the input signal, i is the lag, M is the maximum lag, and r is the autocorrelation value that is computed.
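Formula (2.1) translates directly into code. The sketch below is pure Python with 0-based indexing (the text's formula is 1-based, following MATLAB); the book's own implementation is Program P2.2:

```python
def autocorr(u, M):
    # Biased autocorrelation estimate of (2.1):
    # r[i] = sum_{n=i}^{N-1} u[n] * u[n-i], for lags i = 0..M.
    N = len(u)
    return [sum(u[n] * u[n - i] for n in range(i, N)) for i in range(M + 1)]

r = autocorr([1.0, 2.0, 3.0], 2)
# r[0] is the signal energy; higher lags use fewer products, hence "biased".
```

A periodic input produces a peak at the lag equal to its period, which is the effect illustrated with the sinusoid of Example 2.2.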
Figure 2.5: Fourier transform of input speech frame.

Figure 2.6: Fourier transform of windowed speech frame.

Program P2.2 computes the autocorrelation lags of a test signal u for the maximum lag specified by M.

Example 2.2
A sinusoidal test signal (u), shown in Figure 2.7, is used to estimate the autocorrelation using Program P2.2, and the autocorrelation estimates are computed for lags 0 to 20, as shown in Figure 2.8.
Program P2.2: Computation of autocorrelation lags.

Figure 2.7: Sinusoidal test signal.

The sinusoidal test signal has a period of 16 samples, and this can be readily observed from the autocorrelation function, which has a peak at a lag of 16.

2.3 THE LEVINSON-DURBIN RECURSION

Linear predictor coefficients are the coefficients of the LP polynomial A(z). Recall that LP analysis is carried out using the all-zero filter A(z), whose input is the preprocessed speech frame and whose output is the LP residual. During synthesis, however, the filter used has the transfer function 1/A(z), which models the vocal tract and reproduces the speech signal when excited by the LP residual.
Figure 2.8: The autocorrelation estimate for the sinusoidal test signal.

The Levinson-Durbin recursion is used to compute the coefficients of the LP analysis filter from the autocorrelation estimates [51]. It exploits the Toeplitz structure of the autocorrelation matrix to recursively compute the coefficients of a tenth-order filter. The algorithm [52] initializes the MSE as follows,

ε_0 = r(0),    (2.2)

where ε_0 is the initial MSE and r(0) is the autocorrelation at lag zero. For a first-order predictor,

a_1(1) = −r(1)/r(0),    (2.3)
k_1 = a_1(1),    (2.4)
q = r(1),    (2.5)

where a_m(i) is the i-th predictor coefficient of order m and k_m is the reflection coefficient. We then find the linear predictor coefficients recursively, for i = 1 to m − 1, using (2.6) to (2.10). Note that, in the MATLAB code given in Program P2.3, the index of the linear predictor coefficients starts from 1, whereas in the equations the index starts from 0.
q = r(i + 1) + sum over j = 1, ..., i of a_i(j) r(i + 1 − j),    (2.6)
k_{i+1} = −q / ε_i,    (2.7)
a_{i+1}(j) = a_i(j) + k_{i+1} a_i(i + 1 − j),  j = 1, ..., i,  with a_{i+1}(i + 1) = k_{i+1},    (2.8)
ε_{i+1} = ε_i (1 − k_{i+1}²),    (2.9)
ε_{i+1} = ε_i + k_{i+1} q.    (2.10)

Program P2.3: Levinson-Durbin recursion.

Example 2.3
The autocorrelation estimates (r) of the speech signal (s) shown in Figure 2.4 are used to compute the coefficients of A(z). The frequency response magnitude of the LP synthesis filter 1/A(z), where the coefficients of A(z) are calculated using Program P2.3, is shown in Figure 2.9.
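The recursion (2.2)-(2.10) can be written compactly. Below is a pure-Python sketch of the same algorithm (the book's version is the MATLAB Program P2.3), returning the predictor coefficients, the reflection coefficients, and the final MSE:

```python
def levinson_durbin(r, m):
    """Levinson-Durbin recursion on autocorrelation lags r[0..m].

    Returns (a, k, err): predictor coefficients a = [1, a(1), ..., a(m)],
    reflection coefficients k[0..m-1], and the final prediction MSE.
    """
    a = [1.0] + [0.0] * m
    k = [0.0] * (m + 1)
    err = r[0]                                   # eps_0 = r(0), as in (2.2)
    for i in range(1, m + 1):
        q = r[i] + sum(a[j] * r[i - j] for j in range(1, i))   # as in (2.6)
        ki = -q / err                            # as in (2.7)
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + ki * a[i - j]      # order update, as in (2.8)
        a_new[i] = ki
        a = a_new
        k[i] = ki
        err *= 1.0 - ki * ki                     # MSE update, as in (2.9)
    return a, k[1:], err
```

For lags of an AR(1)-like sequence such as [1, 0.5, 0.25], the second reflection coefficient comes out zero, as expected for a first-order process.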
2.4 BANDWIDTH EXPANSION

Bandwidth expansion by a factor of γ changes the linear predictor coefficients from a(i) to γ^i a(i) and moves the poles of the LP synthesis filter inwards. Bandwidth expansion expands the bandwidth of the formants and makes the transfer function somewhat less sensitive to round-off errors. The bandwidth expansion in radians, BW, can be computed by [53],

BW = 2 cos⁻¹( 1 − (1 − γ)² / (2γ) ),    (2.11)

where 3 − 2√2 ≤ γ ≤ 1. At a sampling frequency of 8000 Hz, a bandwidth expansion factor of γ = 0.994127 produces a bandwidth expansion of 15 Hz. This bandwidth expansion factor is used in the implementation of the FS-1016 algorithm.

Figure 2.9: Frequency response of the linear prediction synthesis filter.

Program P2.4: Bandwidth expansion.
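Both the coefficient scaling and formula (2.11) are short enough to verify numerically. A pure-Python sketch (the book's version is the MATLAB Program P2.4):

```python
import math

def expand_bandwidth(a, gamma):
    # a(i) -> gamma**i * a(i): scales each pole radius by gamma.
    return [gamma ** i * ai for i, ai in enumerate(a)]

def expansion_radians(gamma):
    # Bandwidth expansion in radians, as in (2.11).
    return 2.0 * math.acos(1.0 - (1.0 - gamma) ** 2 / (2.0 * gamma))

gamma = 0.994127
bw_hz = expansion_radians(gamma) * 8000.0 / (2.0 * math.pi)
# bw_hz evaluates to approximately 15 Hz at the 8000 Hz sampling rate.
```

Running this confirms the 15 Hz figure quoted for the FS-1016 expansion factor.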
Example 2.4
The linear prediction filter coefficients are calculated for the speech signal given in Figure 2.4. The pole-zero plots of the original and the bandwidth-expanded filter coefficients for a tenth-order linear predictor are shown in Figure 2.10. The filter coefficients, before and after bandwidth expansion by a factor of 0.994127, are given in Table 2.1. The corresponding frequency response is shown in Figure 2.11. Though it is not quite evident in this case, the poles have moved inwards in the pole-zero plot after bandwidth expansion.

Figure 2.10: Pole-zero plots of linear prediction filter.

Figure 2.11: Frequency response before and after bandwidth expansion.
Table 2.1: Linear prediction filter coefficients, a0 through a10, before and after bandwidth expansion.

In order to demonstrate bandwidth expansion more clearly, let us take the example of a Long Term Predictor (LTP), where the filter expression is given by 1/(1 − 0.9 z⁻¹⁰). The bandwidth expansion factor γ is taken to be 0.9. The pole-zero plot is given in Figure 2.12 and the frequency response is given in Figure 2.14. In the pole-zero plot, it is quite evident that the poles have moved towards the origin, and the frequency response also clearly shows the expanded bandwidth. Please note that this is only an instructive example, and bandwidth expansion is generally not applied to the LTP.

Figure 2.12: Pole-zero plots of linear prediction filter.

The plots given in Figure 2.13 show that the impulse response after bandwidth expansion decays faster than the one before bandwidth expansion.
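The faster decay is easy to check numerically: expanding 1/(1 − 0.9 z⁻¹⁰) by γ = 0.9 scales the single coefficient to 0.9 · 0.9¹⁰. A pure-Python sketch (illustrative only, like the LTP example itself):

```python
def ltp_impulse_response(b10, n_samples):
    # Impulse response of 1/(1 - b10*z^-10): y[n] = x[n] + b10*y[n-10],
    # i.e., a decaying pulse train with one pulse every 10 samples.
    y = [0.0] * n_samples
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        y[n] = x + (b10 * y[n - 10] if n >= 10 else 0.0)
    return y

h_orig = ltp_impulse_response(0.9, 60)           # before expansion
h_exp = ltp_impulse_response(0.9 * 0.9 ** 10, 60)  # after expansion by gamma = 0.9
```

At sample 50 the original response is still 0.9⁵ ≈ 0.59, while the expanded one has fallen to about 0.003, which is the behavior Figure 2.13 illustrates.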
Figure 2.13: Impulse response before and after bandwidth expansion.

Figure 2.14: Frequency response before and after bandwidth expansion.

2.5 INVERSE LEVINSON-DURBIN RECURSION

The inverse Levinson-Durbin recursion is used to compute the RCs from the bandwidth-expanded linear predictor coefficients [50]. Here, the coefficients are real-valued; therefore, the order-update equations for the forward and backward prediction filters are given in matrix form by,

[ a_i(j)     ]   [ 1    k_i ] [ a_{i−1}(j)     ]
[ a_i(i − j) ] = [ k_i  1   ] [ a_{i−1}(i − j) ],  j = 0, 1, ..., i,    (2.12)
where the order i varies from 1 to m, and k_i is the reflection coefficient of order i. Solving for a_{i−1}(j) from the above equation, we get,

a_{i−1}(j) = ( a_i(j) − k_i a_i(i − j) ) / ( 1 − k_i² ),  j = 0, 1, ..., i.    (2.13)

Starting from the highest order m, we determine the RCs of orders m down through 1, at each step obtaining the reflection coefficient using the fact that k_i = a_i(i), and then computing the linear predictor coefficients of order i − 1 from those of order i using (2.13). If the absolute values of all the RCs are less than 1, then the LP polynomial is minimum phase, resulting in a stable LP synthesis filter. This can be intuitively understood from (2.9), where, if |k_i| < 1, then ε_i < ε_{i−1}, which means that the MSE decreases with the order of recursion, as we expect for a stable filter. If |k_i| = 1 or |k_i| > 1, the MSE is either zero or it increases with the order of recursion, which is not a characteristic of a stable filter. A detailed explanation of the relationship between the stability of the LP synthesis filter and the magnitude of the reflection coefficients is given in [54].

In the MATLAB code given in Program P2.5, there are certain variable and indexing modifications that are specific to our implementation. First, the indices of the arrays start from 1, and vectorized operations are used to compute the coefficient array in a single statement. Second, we take k_i = −a_i(i) because of the sign convention that forces k_1 to be positive for voiced speech. Because of this, when (2.13) is implemented in the program, the numerator is a summation instead of a difference.

Program P2.5: Inverse Levinson-Durbin recursion.
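The step-down recursion (2.13) can be sketched in a few lines. This pure-Python version uses the convention k_i = a_i(i) of the equations (the FS-1016 MATLAB code negates the result, as noted above):

```python
def lpc_to_rc(a):
    """Inverse Levinson-Durbin: reflection coefficients from a = [1, a(1), ..., a(m)].

    Raises ValueError when |k_i| >= 1, i.e., for a non-minimum-phase polynomial.
    """
    a = list(a)
    m = len(a) - 1
    k = [0.0] * (m + 1)
    for i in range(m, 0, -1):
        k[i] = a[i]                              # k_i = a_i(i)
        if abs(k[i]) >= 1.0:
            raise ValueError("non-minimum-phase polynomial: |k| >= 1")
        denom = 1.0 - k[i] * k[i]
        # Step down one order, as in (2.13); a[0] stays equal to 1.
        a = [(a[j] - k[i] * a[i - j]) / denom for j in range(i)] + a[i:]
    return k[1:]
```

For example, the second-order polynomial [1, 0.6, 0.2] steps down to the reflection coefficients [0.5, 0.2], and both magnitudes below 1 confirm a stable synthesis filter.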
Example 2.5
In order to keep the examples simple, we work with a reduced-order LP polynomial. This is achieved by taking only the first four autocorrelation lags (r0_4, r_4) of the speech frame (s) and forming a fourth-order LP polynomial using the Levinson-Durbin recursion. The coefficients of the fourth-order LP polynomial are bandwidth-expanded (a_4exp) and provided as input to Program P2.5 to yield the RCs (k_4). The coefficients are given as vectors below; note that the vectors are given in MATLAB notation, where separation using semicolons indicates a column vector.

a_4exp = [1.0000; −1.5419; 1.3523; −0.7197; 0.3206],
k_4 = [0.7555; −0.7012; 0.2511; −0.3206].

Note that the stability of the LP synthesis filter can be verified by checking the magnitudes of the RCs. In this case, the LP synthesis filter is stable because the absolute values of all the reflection coefficients are less than 1. The negatives of the RCs (−k_i) are also known as PARCOR coefficients [58], and they can be determined using the harmonic analysis algorithm proposed by Burg [59]. Because of their good quantization properties, RCs have been used in several of the first-generation speech coding algorithms; they are less sensitive to round-off noise and quantization errors than the direct-form LP coefficients [60]. The government standard FS-1015 algorithm is primarily based on open-loop linear prediction and is a precursor of the AbyS FS-1016 [57]. Second- and third-generation coders, however, use the Line Spectrum Pairs (LSPs) that are described in the next chapter. Further information on linear prediction is available in the tutorial on linear prediction by Makhoul [55] and the book by Markel and Gray [56].

2.6 SUMMARY

LP analysis is performed on the speech frames, and the direct-form LP coefficients are obtained using the autocorrelation analysis and linear prediction module. The LP coefficients are bandwidth-expanded and then converted to RCs.
CHAPTER 3

Line Spectral Frequency Computation

Line Spectrum Pairs (LSPs) are an alternative LP spectral representation of speech frames that have been found to be perceptually meaningful in coding systems. LSPs can be quantized using perceptual criteria and have good interpolation properties. The LSP frequencies are also called Line Spectral Frequencies (LSFs). Each zero of an LSP polynomial corresponds to an LSP frequency, and instead of quantizing the LP coefficients, the corresponding LSP frequencies are quantized. The construction of LSP polynomials and the computation of their zeros [62] by applying Descartes' rule will be illustrated using MATLAB programs in the following sections. The LSFs are quantized using an independent, non-uniform scalar quantization procedure. Scalar quantization may result in non-monotonicity of the LSFs, and in that case, after quantizing the LSP frequencies, adjustments are performed in order to restore their monotonicity. Correction of ill-conditioned cases that occur due to non-minimum phase LP polynomials, and the quantization of LSFs, are also illustrated. The block diagram in Figure 3.1 shows the steps involved in generating the LSP polynomials, computing the LSFs, and quantizing them.

3.1 CONSTRUCTION OF LSP POLYNOMIALS

The LSP polynomials are computed from the linear predictor polynomial A(z) as follows. Two LSP polynomials can be formed from the LP polynomial A(z),

P(z) = A(z) + z^−(m+1) A(z⁻¹),    (3.1)
Q(z) = A(z) − z^−(m+1) A(z⁻¹),    (3.2)

where P(z) is called the symmetric polynomial (palindromic) and Q(z) is called the antisymmetric polynomial (antipalindromic). When A(z) is minimum phase, the zeros of the LSP polynomials have two interesting properties: (1) they lie on the unit circle, and (2) the zeros of the two polynomials are interlaced [61]. Evidently, if the properties of the LSP polynomials are preserved,
the reconstructed LPC filter retains the minimum phase property [62]. P(z) has a root at (−1, 0) and Q(z) has a root at (1, 0) in the z-plane; this is because P(z) has symmetric coefficients and Q(z) has antisymmetric coefficients.
Figure 3.1: Computation of line spectral frequencies.

Example 3.1
The linear predictor coefficients for the speech signal given in Figure 2.4 are shown in Figure 3.2. Using Program P3.1, the coefficients of P(z) and Q(z) are generated. The resulting LSP polynomial coefficients are shown in Figure 3.3, which shows that the coefficients of P(z) are symmetric and those of Q(z) are antisymmetric.

In Program P3.1, all the coefficients of both P(z) and Q(z) are stored in the output variables pf and qf for illustration purposes. In practice, only half the coefficients of the LSP polynomials, excluding the first element, are stored. The reason is that the polynomials have symmetry and their first coefficient is equal to 1. To regenerate A(z) from P(z) and Q(z), in addition to storing a part of the coefficients as described above, we use,

A(z) = 0.5 [P(z) + Q(z)].    (3.3)

As another example, a fourth-order LP polynomial is considered. The coefficients of the LP polynomial (a_4exp), the coefficients of the symmetric LSP polynomial (pf_4), and the coefficients of the antisymmetric LSP polynomial (qf_4) are listed below.

a_4exp = [1.0000; −1.5419; 1.3523; −0.7197; 0.3206],
pf_4 = [1.0000; −1.2212; 0.6326; 0.6326; −1.2212; 1.0000],
qf_4 = [1.0000; −1.8625; 2.0719; −2.0719; 1.8625; −1.0000].

Program P3.1: LSP polynomial construction.
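The construction (3.1)-(3.2) amounts to padding A(z) with one zero and adding or subtracting its reversal, and averaging the two polynomials recovers A(z) as in (3.3). A pure-Python sketch (the book's version is the MATLAB Program P3.1):

```python
def lsp_polynomials(a):
    """P(z) and Q(z) of (3.1)-(3.2) from A(z) = [1, a(1), ..., a(m)].

    Returns two coefficient lists of length m + 2; P is palindromic and
    Q is antipalindromic.
    """
    ap = list(a) + [0.0]                 # pad A(z) to degree m + 1
    n = len(ap)
    p = [ap[j] + ap[n - 1 - j] for j in range(n)]   # A(z) + reversed A(z)
    q = [ap[j] - ap[n - 1 - j] for j in range(n)]   # A(z) - reversed A(z)
    return p, q

p, q = lsp_polynomials([1.0, -0.5, 0.25])
```

The symmetry of p and antisymmetry of q guarantee the roots at z = −1 and z = +1 discussed above, and averaging p and q element-wise gives back the padded A(z).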
3.2 COMPUTING THE ZEROS OF THE SYMMETRIC POLYNOMIAL

It is not possible to find the roots of the symmetric polynomial in closed form, but there are some properties of the polynomial that can be exploited to find its roots numerically in a tractable manner. The symmetric polynomial is given by,

P(z) = 1 + p_1 z⁻¹ + p_2 z⁻² + ... + p_2 z^−(m−1) + p_1 z^−m + z^−(m+1),    (3.4)

where p_i is the i-th coefficient of P(z). Assuming that the symmetric polynomial has roots lying on the unit circle, we evaluate the zeros of P(z) by solving for ω in,

P(z), for z = e^{jω}, = 2 e^{−j(m+1)ω/2} [ cos((m+1)ω/2) + p_1 cos((m−1)ω/2) + ... + p_{(m+1)/2} ].    (3.5)
Figure 3.2: Linear predictor coefficients.

Figure 3.3: Symmetric and antisymmetric polynomial coefficients.

Let us call the bracketed term of (3.5) as P'(ω). In order to compute the roots of the symmetric polynomial numerically using Descartes' rule, the following procedure is adopted. The upper half of the unit circle is divided into 128 intervals, and each interval is examined for zeros of the polynomial. Since the zeros occur in complex conjugate pairs, we compute only the zeros on the upper half of the unit circle. Also, the zero at the point (−1, 0) on the unit circle is not computed. The MATLAB code to compute the zeros of the symmetric polynomial is given in Program P3.2. The fundamental assumption is that the interval is so small that it contains at most a single zero.
Program P3.2: Computation of zeros for the symmetric LSP polynomial.
The polynomial P'(ω) is evaluated at the frequencies fr and fl, which are the end frequencies of the interval considered, and the two values are stored in the program variables pxr and pxl, respectively. Initially, the value of P'(ω) at frequency 0 is the sum of its coefficients. Then P'(ω) is evaluated at the radian frequency ω, where ω is represented as πi/128. The program variable jc contains the argument of the cosine function excluding 0.5ω, and the program variable ang is the complete argument of the cosine function. Upon computation of pxr and pxl, their signs are compared. If there is a sign change, then by Descartes' rule there is an odd number of zeros in the interval, and, as our interval is small enough, there can be no more than a single zero. If there is no sign change, there are no zeros, and therefore the program proceeds to the next interval. If there is a zero, the next step is the computation of the mid-frequency fm using the bisection method. The value of pxm is subsequently calculated using fm, and the signs of pxm and pxl are compared. If there is a sign change, then there is a zero between fm and fl, and the interval is changed accordingly. This is repeated until the value of pxm gets close to zero, at which point the frequency interval has been bisected down to the zero. Note that finding one zero is actually equivalent to finding two LSFs, because each zero on the upper half of the unit circle is the complex conjugate of another in the lower half; every time a zero is calculated, the number of frequencies computed (nf) is therefore incremented by two. After finding a zero, we move on to the next interval to find the next zero. This process is repeated until all the zeros on the upper half of the unit circle are determined.

Example 3.2
The roots of the symmetric polynomial whose coefficients are shown in Figure 3.3 are given in Figure 3.4. As can be seen, the roots lie on the unit circle and appear as complex conjugate pairs.

Figure 3.4: Roots of the symmetric LSP polynomial.

3.3 COMPUTING THE ZEROS OF THE ANTISYMMETRIC POLYNOMIAL

Finding the roots of the antisymmetric polynomial is similar to finding the roots of the symmetric polynomial. However, it is simpler because of the property that the roots of the symmetric and antisymmetric polynomials are interlaced. The antisymmetric polynomial is given by,

Q(z) = 1 + q_1 z⁻¹ + q_2 z⁻² + ... − q_2 z^−(m−1) − q_1 z^−m − z^−(m+1),    (3.6)

where the q_i's are the coefficients of Q(z). Since we need to evaluate the zeros of Q(z) only on the unit circle, we can write Q(z) as,

Q(z), for z = e^{jω}, = 2j e^{−j(m+1)ω/2} [ sin((m+1)ω/2) + q_1 sin((m−1)ω/2) + ... + q_{(m+1)/2} ].    (3.7)
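The grid-plus-bisection search just described can be sketched in pure Python. This is a simplified illustration, not the FS-1016 reference code: on the unit circle a symmetric polynomial of degree d reduces to the real function G(ω) = Σ_j p_j cos((d/2 − j)ω), so bracketing its sign changes on a 128-interval grid and bisecting each bracket locates the zeros in (0, π):

```python
import math

def on_circle(coeffs, w):
    # G(w) = sum_j p_j cos((d/2 - j) w), real-valued when coeffs are palindromic.
    d = len(coeffs) - 1
    return sum(c * math.cos((d / 2.0 - j) * w) for j, c in enumerate(coeffs))

def find_zeros(coeffs, grid=128, iters=40):
    zeros = []
    step = math.pi / grid
    left = on_circle(coeffs, 0.0)
    for i in range(1, grid + 1):
        right = on_circle(coeffs, i * step)
        if left * right < 0.0:                   # sign change: one zero inside
            lo, hi = (i - 1) * step, i * step
            for _ in range(iters):               # bisection refinement
                mid = 0.5 * (lo + hi)
                if on_circle(coeffs, lo) * on_circle(coeffs, mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            zeros.append(0.5 * (lo + hi))
        left = right
    return zeros
```

For the palindromic test polynomial 1 − z⁻¹ + z⁻², G(ω) = 2cos ω − 1, and the search returns the single zero ω = π/3. For the antisymmetric polynomial the same scheme applies with sine in place of cosine, searching between consecutive zeros of P(z).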
The procedure for computing the zeros of Q(z) is very similar to the procedure for computing the zeros of P(z), except for a few differences. Firstly, instead of using the cosine function, we use the sine function, because of the difference between (3.5) and (3.7), where we call the bracketed term Q'(ω). Secondly, since we know that the zeros of P(z) and Q(z) are interlaced, the start and end points of each search interval are taken as consecutive zeros of the symmetric LSP polynomial. Again, all the roots lie on the unit circle and appear as complex conjugate pairs, and the zero at the point (1, 0) on the unit circle is not computed. The procedure to compute the roots of Q'(ω), and hence those of Q(z), is given in Program P3.3.

Example 3.3
The roots of the antisymmetric polynomial whose coefficients are shown in Figure 3.3 are given in Figure 3.5. Another example is to consider a fourth-order LP polynomial and compute the LSP frequencies. Considering the same LP polynomial as in Example 3.1, we get the symmetric LSFs (sym_freq_4) from Program P3.2 and the antisymmetric LSFs (asym_freq_4) from Program P3.3 as,

sym_freq_4 = [0.0842; 0.2102],
asym_freq_4 = [0.1244; 0.2960],

and the array freq_4 stores all the LSFs,

freq_4 = [0.0842; 0.1244; 0.2102; 0.2960].
Program P3.3: Computation of zeros for the antisymmetric LSP polynomial. (Continues.)
Program P3.3: (Continued.) Computation of zeros for the antisymmetric LSP polynomial.
Figure 3.5: Roots of the antisymmetric LSP polynomial.

The LSFs for the LP polynomial given in Example 3.1 are shown in Figure 3.6, where stars indicate the LSFs from P(z) and circles indicate the LSFs from Q(z), and where the array freq_4 stores all the LSFs. Note that the LSFs are normalized with respect to the sampling frequency, and hence they range between 0 and 0.5. The relation between an LSF and the corresponding zero of the LSP polynomial is given as

z = cos(2πf) + i sin(2πf) ,   (3.8)

where f is the LSF and z is the corresponding zero of the LSP polynomial. The LSFs 0 and 0.5 are not included, as it is evident that the zero of Q(z) at the point (1, 0) on the unit circle corresponds to a normalized frequency of 0 and the zero of P(z) at the point (−1, 0) corresponds to a normalized frequency of 0.5.

So far, we have examined LP polynomials that are minimum phase to generate LSP frequencies. We study here the effect of LP polynomials with nonminimum phase by considering the following coefficients of A(z):

a_nm = [1.0000, 0.4187, −1.5898, 0.3947, 0.4470] .

The pole-zero plot of 1/A(z) is given in Figure 3.7, where we can see poles lying outside the unit circle, making A(z) a nonminimum phase polynomial. Figure 3.8 shows the LSFs of the symmetric LSP polynomial (shown as stars) and the LSFs of the antisymmetric LSP polynomial (shown as circles); these are not obtained using Programs P3.2 and P3.3, but rather are calculated directly using MATLAB. We can see that not all the roots of Q(z) lie on the unit circle, and hence they do not satisfy the assumption that all the roots of the LSP polynomials lie on the unit circle. Therefore, the LSFs obtained from Programs P3.2 and P3.3 will not be
accurate, because of the assumption of a minimum phase LP polynomial. The attempt to detect and correct this ill-conditioned case is described in the next section.

Figure 3.6: Roots of the LSP polynomials.

Figure 3.7: Pole-zero plot of the nonminimum phase filter.

3.4 TESTING ILL-CONDITIONED CASES

Ill-conditioned cases in finding the roots of the LSP polynomials occur if the LP polynomial used to generate them, A(z), is nonminimum phase. In a well-conditioned case, the LP polynomial is minimum phase and our assumption that the zeros of the LSP polynomials lie on the unit circle is valid. If the zeros of the symmetric and antisymmetric LSP polynomials do not alternate on the unit circle, the
LP polynomial may not be minimum phase. Note that ill-conditioned cases are indicated by the nonmonotonicity of the LSFs. Program P3.4 tests ill-conditioned cases that occur with LSPs and tries to correct them. If the LSFs are nonmonotonic, an attempt to correct them is made by reordering. If more than one LSF is 0 or 0.5, then the program variable lspflag is set to 1, indicating that a case of ill-conditioning has been encountered; if the attempt at correction fails, the LSFs of the previous frame are used. By this means, we can detect and correct the ill-conditioned case.

Figure 3.8: Roots of the LSP polynomials for the nonminimum phase filter.

Example 3.4 Consider the nonminimum phase LP polynomial above. Using Programs P3.1, P3.2 and P3.3, we obtain the LSFs as

freq = [0.0064, 0.1987, 0.1139, 0.3619] .

We can see that the LSFs are nonmonotonic; Programs P3.2 and P3.3 do not output monotonic LSFs here because of the nonminimum phase LP polynomial. After using Program P3.4, the LSFs reordered for monotonicity are obtained as

freq = [0.0064, 0.1139, 0.1987, 0.3619] .

3.5 QUANTIZING THE LINE SPECTRAL FREQUENCIES

The LSFs are quantized using the quantization matrix given in Table 3.1. Six frequencies (LSF 1 and LSFs 6 to 10) will be quantized using 3 bits, and the remaining four (LSFs 2 to 5) will be quantized
Table 3.1: LSF quantization matrix.
using 4 bits, resulting in a total of 34 bits per frame. The first LSF is quantized using the quantization levels given in the first row of the matrix in Table 3.1, the second LSF using the second row, and so on. It can be seen from Table 3.1 that the number of levels used for quantizing the second, third, fourth and fifth LSFs is 16, thereby needing 4 bits each, whereas for the other LSFs there are 8 quantization levels, hence needing 3 bits each. Also, the nonzero values in each of the rows of Table 3.1 are not always in increasing order. The LSFs are quantized such that the absolute distance of each unquantized frequency from the nearest quantization level is minimal. Note that in some cases, even though the second LSF is higher than the first, it may be quantized to a lower value; this is because the quantization tables overlap. Nevertheless, quantization is performed using the quantization table, and the issue of quantized LSFs possibly becoming nonmonotonic is handled later. In Program P3.5, quantization is performed using this quantization table.

Program P3.4: Testing cases of ill-condition.
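The detection-and-correction logic of Program P3.4 can be summarized in a few lines. The following Python sketch (the names freq and lspflag follow the text; everything else is illustrative, and the fallback policy is a simplified rendering) flags a nonmonotonic set, attempts a reordering, and falls back to the previous frame's LSFs when the correction fails.

```python
import numpy as np

def check_ill_conditioned(freq, prev_freq):
    """Detect an ill-conditioned LSF set and try to repair it.
    Returns the (possibly corrected) LSFs and an ill-condition flag."""
    freq = np.asarray(freq, dtype=float)
    lspflag = False
    if np.any(np.diff(freq) <= 0):        # nonmonotonic => ill-conditioned
        lspflag = True
        freq = np.sort(freq)              # attempt a correction by reordering
    # if values still pile up at the band edges (0 or 0.5) or remain
    # nonmonotonic, reuse the previous frame's LSFs
    if np.sum((freq <= 0.0) | (freq >= 0.5)) > 1 or np.any(np.diff(freq) <= 0):
        freq = np.asarray(prev_freq, dtype=float)
    return freq, lspflag
```

Applied to the nonmonotonic set of Example 3.4, the sketch reorders the frequencies and raises the flag, mirroring the behavior described for Program P3.4.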
In Program P3.5, the program variable bits is initialized as

bits = [3, 4, 4, 4, 4, 3, 3, 3, 3, 3] ,

where the number of bits used for quantization determines the number of levels. The quantization matrix given in Table 3.1 is stored in the program variable lspQ. FSCALE is the sampling frequency, which is used to scale the LSFs to the appropriate range. Quantization of each LSF is performed such that the distance between the unquantized LSF and the corresponding quantization level in the table is minimized, and the index of the quantized LSF is stored in the program variable findex.

Example 3.5 The LSFs are quantized using Program P3.5 to obtain the indices of the quantized LSFs (findex) in the quantization table. Table 3.2 gives the input unquantized LSFs, the output indices (findex) and the quantized frequencies corresponding to those indices. The spectral plots of the LP synthesis filters corresponding to the unquantized and quantized LSFs are given in Figure 3.9, and the corresponding pole-zero plots are given in Figure 3.10. Note that the LSFs must be converted to LP coefficients using Program P4.1 in order to visualize the spectra and pole-zero plots.

Program P3.5: Quantizing the LSFs.
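The nearest-level rule used by Program P3.5 is simple to state in code. A minimal Python sketch (lspQ and findex mirror the program variable names; the table rows here are placeholders, not the FS-1016 table of Table 3.1):

```python
import numpy as np

def quantize_lsfs(freqs, lspQ):
    """Quantize each LSF to the nearest level in its own table row,
    returning the chosen indices (findex) and the quantized values."""
    findex, fq = [], []
    for f, levels in zip(freqs, lspQ):
        levels = np.asarray(levels, dtype=float)
        i = int(np.argmin(np.abs(levels - f)))   # minimal absolute distance
        findex.append(i)
        fq.append(float(levels[i]))
    return findex, fq
```

Because each LSF is quantized against its own (overlapping) row, nothing in this step prevents the quantized values from becoming nonmonotonic; that is exactly the problem Section 3.6 addresses.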
Table 3.2: Example results for Program P3.5.

Unquantized LSF | Unquantized Frequency (Hz) | Index to Quantization Table | Quantized Frequency (Hz)
0.0617 | 493.6 | 8 | 500
0.0802 | 641.6 | 13 | 670
0.0999 | 799.2 | 8 | 775
0.1612 | 1289.6 | 9 | 1270
0.1785 | 1428.0 | 7 | 1430
0.2128 | 1702.4 | 3 | 1690
0.2957 | 2365.6 | 5 | 2300
0.3141 | 2512.8 | 3 | 2525
0.3761 | 3008.8 | 3 | 3000
0.4049 | 3239.2 | 2 | 3270

Figure 3.9: Spectra from unquantized and quantized LSFs.

3.6 ADJUSTING QUANTIZATION TO PRESERVE MONOTONICITY

Quantization of the LSFs may lead to nonmonotonicity because of the overlapped quantization tables. It is important to preserve the monotonicity of the LSFs because, if they become nonmonotonic, the zeros of the two LSP polynomials will no longer be interlaced; this will lead to the loss of the minimum phase property of the LP polynomial. Therefore, the quantized LSFs are checked for monotonicity, and if there is a case where the previous quantized LSF has a higher value than the
current LSF, then the problem is corrected by either quantizing the current LSF to the next higher level, with the previous LSF remaining at the same quantized level, or quantizing the previous LSF to the next lower level, with the current LSF remaining at the same quantized level.

Figure 3.10: Pole-zero plots from unquantized and quantized LSFs.

Table 3.3: Example results for Program P3.6 (columns: unquantized LSFs, index to quantization table, nonmonotonic quantized LSFs, corrected index to quantization table, and monotonic quantized LSFs).

Program P3.6 takes the unquantized LSFs (freq) and the quantized LSF indices (findex) as its first two inputs. For each LSF, the program checks whether the current quantized LSF is less than or equal to the previous quantized LSF. If this is true, then the corresponding upward and downward errors are calculated. The upward error is the error that results if the current LSF were quantized to the next higher value; the downward error is the error that results if the previous LSF were quantized to the next lower value. If the upward error is lower than the downward error, then the current LSF is quantized to the next higher level. Else, the previous LSF is quantized to the next
findex(i) = min( findex(i)+1, levels(i) );

Program P3.6: Adjusting quantization to preserve monotonicity.
lower level. If this operation introduces nonmonotonicity, then the current LSF is quantized to the next higher value, regardless of the higher upward error.

Example 3.6 In this example, we consider a case where the quantized LSFs are nonmonotonic, as observed in Table 3.3. If we provide the indices of the nonmonotonically quantized LSFs (findex), shown in the second column of Table 3.3, as an input to Program P3.6, we get the indices shown in the fourth column as the output. The indices to the quantization table output from the program correspond to the monotonic quantized LSFs given in the last column of Table 3.3. The pole-zero plots of the LP synthesis filters corresponding to the unquantized LSFs, the quantized LSFs, and the quantized LSFs corrected to be monotonic are given in Figure 3.11. The pole that crosses the unit circle is zoomed in, and it is clear that the nonmonotonicity of the LSFs that occurs due to quantization results in the pole crossing to the outside of the unit circle. When the LSFs are corrected to be monotonic, the pole comes back inside the unit circle, making the system minimum phase.

Figure 3.11: Pole-zero plots from unquantized, quantized and monotonically quantized LSFs.
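The up/down correction strategy can be sketched as follows (Python; a simplified rendering of the Program P3.6 logic described above, not the listing itself — all names are illustrative):

```python
import numpy as np

def enforce_monotonicity(freq, findex, lspQ):
    """Adjust quantization indices so the quantized LSFs stay monotonic.
    freq: unquantized LSFs; findex: chosen indices; lspQ: table rows."""
    findex = list(findex)
    for i in range(1, len(findex)):
        cur = lspQ[i][findex[i]]
        prev = lspQ[i - 1][findex[i - 1]]
        if cur <= prev:
            up_ok = findex[i] + 1 < len(lspQ[i])
            down_ok = findex[i - 1] - 1 >= 0
            # error if the current LSF moved to the next higher level
            up_err = abs(lspQ[i][findex[i] + 1] - freq[i]) if up_ok else np.inf
            # error if the previous LSF moved to the next lower level
            down_err = abs(lspQ[i - 1][findex[i - 1] - 1] - freq[i - 1]) if down_ok else np.inf
            if down_err < up_err:
                findex[i - 1] -= 1
                # moving the previous LSF down must not break monotonicity
                # with the one before it; if it does, move the current LSF up
                if i >= 2 and lspQ[i - 1][findex[i - 1]] <= lspQ[i - 2][findex[i - 2]]:
                    findex[i - 1] += 1
                    findex[i] = min(findex[i] + 1, len(lspQ[i]) - 1)
            else:
                findex[i] = min(findex[i] + 1, len(lspQ[i]) - 1)
    return findex
```

The final clipping with min() mirrors the `findex(i) = min(findex(i)+1, levels(i))` line visible in the Program P3.6 listing.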
3.7 SUMMARY

The LSP polynomials P(z) and Q(z) are generated from the LP coefficients: P(z) is a polynomial with even-symmetric coefficients and Q(z) is a polynomial with odd-symmetric coefficients. The zeros of the polynomials, the LSFs, are computed using Descartes' method. Note that ill-conditioned cases are indicated by the nonmonotonicity of the LSFs; they could arise either due to a nonminimum phase LP polynomial or due to quantization of the LSFs. LSFs can also be efficiently computed using Chebyshev polynomials [63], and details on the spectral error characteristics and perceptual sensitivity of LSFs are given in a paper by Kang and Fransen [64]. Developments in enhancing the coding of the LSFs include using vector quantization [65] and the discrete cosine transform [66] to encode the LSFs. In third-generation standards, vector quantization of the LSFs [67, 68] is common practice. Immittance Spectral Pairs (ISPs) are another form of spectral representation for the LP filter [69], similar to LSPs, and they are considered to represent the immittance of the vocal tract. Some of the most modern speech coding standards, such as the AMR-WB codec and the source-controlled VMR-WB codec, use ISPs as parameters for quantization and transmission [5, 42].
CHAPTER 4

Spectral Distortion

Spectral distortion in the FS-1016 encoder occurs due to the quantization of the LSFs. There are many distance measures that could be used to characterize the distortion due to the quantization of the LSFs. The following distance measures are considered in this chapter: (1) the cepstral distance measure, (2) the likelihood ratio and (3) the cosh measure. The block diagram in Figure 4.1 shows the steps involved in the computation of the spectral distortion measures. The quantized and unquantized LSFs are converted to their respective LP coefficients. The LP coefficients are then converted to the RCs using the inverse Levinson-Durbin recursion, and the RCs are finally converted back to autocorrelation lags, between which the distances are computed. Computation of the RCs from LP coefficients using the inverse Levinson-Durbin recursion has already been explained in Section 2.5 (Program P2.5). Conversion of LSPs to LP coefficients, conversion of RCs to autocorrelation lags, and calculation of distances between autocorrelation lags will be illustrated in the following sections using MATLAB programs.

Figure 4.1: Spectral distortion computation.
4.1 CONVERSION OF LSP TO DIRECT-FORM COEFFICIENTS

Conversion of LSP frequencies to LP coefficients is much less complex than finding the LSP frequencies from the LP polynomial. Each LSP frequency ωi represents a polynomial of the form 1 − 2 cos ωi z^−1 + z^−2 [63]. Upon multiplying the second-order factors, we get the symmetric and antisymmetric polynomials P(z) and Q(z), and the LP polynomial can then be reconstructed using (3.3). Here, however, we adopt a different approach by realizing that each second-order factor of the polynomials P(z) and Q(z) can be considered a second-order filter of the type shown in Figure 4.2; the total filter polynomial is a cascade of many such second-order stages. Because the filter is Finite Impulse Response (FIR), the impulse response samples are the coefficients of the filter. Therefore, the coefficients of the LP polynomials are found by computing the impulse response of the cascade of second-order stages.

Figure 4.2: Second-order section of P(z) and Q(z).

In Program P4.1, p and q are the arrays that contain the cosines of the LSP frequencies of the polynomials P(z) and Q(z). The variables a and b are the arrays that contain the outputs of the current stage and, consequently, the inputs for the next stage. The variables a1 and b1 are the first-level memory elements that correspond to first-order delays, and a2 and b2 are the second-level memory elements that correspond to second-order delays. The outer for loop cycles through the samples, and the inner for loop cycles through the stages of the cascaded filter for each input sample; the latter calculates the output samples for each stage and populates the memory elements. Each LP
polynomial coefficient, except the first coefficient, which is always one, is computed as the average of the coefficients of the two filters as per (3.3). The LP coefficients are stored in the array pc. Program P4.1 takes the LSP frequencies as the input and generates the LP coefficients as the output.

Example 4.1 The quantized and unquantized LSP frequencies (f and fq) supplied as input to the program, and the corresponding LP coefficients (pc and pcq) output from the program, are given below:

f = [0.0617, 0.0802, 0.0999, 0.1612, 0.1785, 0.2128, 0.2957, 0.3141, 0.3761, 0.4049] ,
fq = [0.0625, 0.0838, 0.0969, 0.1588, 0.1787, 0.2112, 0.2875, 0.3156, 0.3750, 0.4088] ,
pc = [1.0000, …] ,
pcq = [1.0000, …] .

The LP coefficients (pc and pcq) are also converted to RCs using Program P2.5, and the corresponding RCs (rc and rcq) are given below:

rc = […, 0.1708] ,
rcq = […, 0.2440] .

Note that the absolute values of the elements of the arrays rc and rcq are less than one, which implies that the LP synthesis filter is stable. The spectra and pole-zero plots of the LP synthesis filters corresponding to the unquantized and quantized LSFs are given in Figures 3.9 and 3.10, respectively.
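Multiplying out the second-order factors gives the same result as the filter cascade of Program P4.1. The sketch below (Python/NumPy; an equivalent polynomial-product formulation rather than the book's impulse-response method, with illustrative names) rebuilds A(z) from the P- and Q-frequencies:

```python
import numpy as np

def lsf_to_lp(pf, qf):
    """Rebuild A(z) from the P-zeros (pf) and Q-zeros (qf), given as
    normalized frequencies. Each frequency contributes a factor
    1 - 2 cos(2 pi f) z^-1 + z^-2; P(z) also carries the trivial factor
    (1 + z^-1) and Q(z) carries (1 - z^-1); A(z) = (P(z) + Q(z)) / 2."""
    def build(freqs, edge):
        poly = np.array([1.0, edge])             # (1 + z^-1) or (1 - z^-1)
        for f in freqs:
            poly = np.convolve(poly, [1.0, -2.0 * np.cos(2 * np.pi * f), 1.0])
        return poly
    p = build(pf, 1.0)     # trivial zero at z = -1
    q = build(qf, -1.0)    # trivial zero at z = +1
    a = 0.5 * (p + q)
    return a[:-1]          # the degree-(m+1) terms of P and Q cancel
```

As a sanity check, the order-2 polynomial A(z) = 1 has LSFs 1/6 (from P) and 1/3 (from Q), and the round trip returns [1, 0, 0] exactly.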
4.2 COMPUTATION OF AUTOCORRELATION LAGS FROM REFLECTION COEFFICIENTS

To calculate the Autocorrelation (AC) lags from the RCs [50], we first obtain the LP polynomials of increasing orders from the reflection coefficients using the forward and backward predictor recurrence relations given in (4.1) and (4.2):

a_{i+1}(j) = a_i(j) + k_{i+1} a_i(i + 1 − j) ,   (4.1)
a_{i+1}(i + 1 − j) = a_i(i + 1 − j) + k_{i+1} a_i(j) ,   (4.2)

where a_i(j) is the jth predictor coefficient of order i and k_i is the ith reflection coefficient. In Program P4.2, the autocorrelation array r is initially loaded with the reflection coefficients rc.
Program P4.1: Conversion of LSP frequencies to LP coefficients. (Continues.)
Program P4.1: (Continued.) Conversion of LSP frequencies to LP coefficients.

In Program P4.2, a temporary array t is created to store the intermediate LP polynomial coefficients of increasing orders. The LP polynomial coefficients are computed for increasing orders of the intermediate predictor polynomials, from 2 to m, where m is the order of the actual predictor polynomial; the order of the intermediate LP polynomial is indicated by the variable i in the program. For each intermediate predictor polynomial, the first half of the polynomial coefficients is computed using the forward predictor recursion given in (4.1), and the second half is computed using the reverse predictor recursion given in (4.2). If we have a predictor polynomial of order m, then we know that

a_m(m) = −k_m ,   (4.3)

where the negative sign is because of our sign convention: the last coefficient of any intermediate polynomial is equal to the negative of the reflection coefficient. We negate the reflection coefficient and assign it to the intermediate LP polynomial coefficient because we want to follow the sign convention in which the first reflection coefficient equals the normalized first-order autocorrelation lag; this follows from (4.3). The autocorrelation lag of order i + 1 is then computed by

r(i + 1) = Σ_{k=1}^{i+1} a_{i+1}(k) r(i + 1 − k) ,   (4.4)

where r(i) is the autocorrelation at lag i. In Program P4.2, all the coefficients of the intermediate LP polynomial except the first coefficient are negatives of their actual values because of our sign
convention. Therefore, when (4.4) is implemented in the program, the final sum is negated to get the autocorrelation lag with the proper sign.

Program P4.2: Computation of autocorrelation lags from reflection coefficients.

Example 4.2 The autocorrelation lags that are output when the RCs in Example 4.1 are input to Program P4.2 are given below. The array ac contains the autocorrelation lags corresponding to rc, and acq contains the autocorrelation lags corresponding to rcq:

ac = [1.0000, …] ,
acq = [1.0000, …] .

We will be using these autocorrelation lags to calculate various distance measures that quantify the amount of distortion that has occurred due to quantization.
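The step-up recursion of Program P4.2 is compact enough to restate. The Python sketch below follows the same convention (the first reflection coefficient equals the normalized first-order lag) but keeps the predictor coefficients in prediction form, so no negation step is needed; the function name is illustrative:

```python
import numpy as np

def rc_to_autocorr(rc):
    """Normalized autocorrelation lags r(0..m) from reflection
    coefficients, via the step-up (Levinson) recursion."""
    m = len(rc)
    r = np.zeros(m + 1)
    r[0] = 1.0
    a = np.zeros(m + 1)                  # a[j]: order-i predictor coefficients
    for i in range(1, m + 1):
        k = rc[i - 1]
        a_new = a.copy()
        a_new[i] = k                     # last coefficient equals k_i
        for j in range(1, i):
            a_new[j] = a[j] - k * a[i - j]
        a = a_new
        # Yule-Walker: r(i) = sum_j a_i(j) r(i - j)
        r[i] = float(np.dot(a[1:i + 1], r[i - 1::-1]))
    return r
```

For rc = [0.5, −0.3] this gives r = [1, 0.5, 0.025], matching the closed form r(2) = k2(1 − k1²) + k1².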
4.3 CALCULATION OF DISTANCE MEASURES

The distance measures used to measure the distortion due to quantization of the LSP frequencies are the likelihood ratio, the cosh measure and the cepstral distance [70]. Let us define A(z) as the LP polynomial that corresponds to the unquantized LSFs; the LP polynomial that corresponds to the quantized LSFs is denoted by A′(z). The minimum residual energy α would be obtained if the speech samples of the current frame, say {x(n)}, used to generate the LP polynomial A(z), were passed through the same filter. The residual energy δ′ is obtained when {x(n)} is passed through the filter A′(z). Similarly, if we assume that some sequence of samples {x′(n)} was used to generate A′(z), then α′ would be the minimum residual energy found when {x′(n)} is passed through A′(z), and the residual energy δ is obtained when {x′(n)} is passed through A(z). The ratios δ′/α and δ/α′ are defined as the likelihood ratios and indicate the differences between the LPC spectra before and after quantization. The method of finding the likelihood ratios, using the autocorrelation sequences, is described in [70].

The cosh measure ω is defined through

ε = (1/2)(σ/σ′)²(δ/α) + (1/2)(σ′/σ)²(δ′/α′) − 1 ,   (4.5)
cosh(ω) − 1 = ε ,   (4.6)
ω = ln[1 + ε + √(ε(2 + ε))] .   (4.7)

The parameters σ and σ′ used above are the gain constants, related to the cepstral gains by

ln[σ²/|A(e^{jθ})|²] = Σ_{k=−∞}^{∞} c_k e^{−jkθ} ,   (4.8)
ln[(σ′)²/|A′(e^{jθ})|²] = Σ_{k=−∞}^{∞} c′_k e^{−jkθ} ,   (4.9)

where the cepstral coefficients c_k and c′_k are the Fourier series coefficients of the log spectra, and

c_0 = ln[σ²] ,   (4.10)
c′_0 = ln[(σ′)²] .   (4.11)

In Program P4.3 (dist.m), the array of distance measures is indicated by dm; nine different distances are stored in this array for each frame. The program fin.m is used by dist.m to compute the filter autocorrelation lags, and the program cfind.m is used by dist.m to compute the cepstral coefficients. In Program P4.3, δ, δ′, α and α′ are denoted by del, delp, alp and alpp. The first two distance measures dm(1) and dm(2) denote the likelihood ratios δ′/α and δ/α′. The array element dm(3) is the distance measure 4.34294418ω, where ω is calculated for σ = σ′ = 1 and the factor 4.34294418 is used to convert the cosh measure to decibels.
Program P4.3: Computation of distance measures. (Continues.)
Program P4.3: (Continued.) Computation of distance measures.

For dm(5), we consider the following values for σ² and (σ′)²:

σ² = α / r(0) ,   (4.12)
(σ′)² = α′ / r′(0) ,   (4.13)

where r′ indicates the autocorrelation lag corresponding to the quantized LSP frequencies. To obtain the fourth distance measure dm(4), the gain constants are adjusted to minimize ε, and the minimum is obtained as

ε_min = [(δ′/α)(δ/α′)]^{1/2} − 1 .   (4.14)

For the sixth distance measure, dm(6), we take σ² = α and (σ′)² = α′. The cepstral distance measure u(L) is given by

[u(L)]² = (c_0 − c′_0)² + 2 Σ_{k=1}^{L} (c_k − c′_k)² ,   (4.15)
where L is the number of cepstral coefficients; u(L) provides a lower bound for the RMS log spectral distance. The distance measure dm(7) is the cepstral distance calculated with σ = σ′ = 1; dm(8) is the cepstral measure obtained with (4.12) and (4.13) holding true; and, finally, dm(9) is the cepstral measure with the conditions σ² = α and (σ′)² = α′.

Example 4.3 The autocorrelation lags from Example 4.2 are used in Program P4.3 to find the distance measures. The order of the filter (m) is taken as 10, and the number of terms used to find the cepstral distance measure (l) is taken as 40. The distance measure array output from the program is given below:

dm = [1.0477, 1.0452, 1.2890, 1.3189, 1.3189, 1.7117, 1.7117, 1.6750, 1.6750] .

The cepstral distance dm(9) and the likelihood ratios dm(1) and dm(2) for all frames from the speech file 'cleanspeech.wav' [76] are given in Figure 4.3; the frame energy and the spectrogram of the speech are also given in the same figure. The three distance measures follow the same trend over most of the frames, which is reasonable because all three of them measure the distortion of the spectra corresponding to quantized and unquantized LSFs. Note that no logical relationship can be drawn between the distance measures and the frame energy or the spectrogram of the speech signal.

4.4 SUMMARY

Procedures for computing three distinct types of spectral distance measures were presented in this chapter. These distance measures have a meaningful frequency-domain interpretation [70]. The significance of the likelihood distance measures when the data are assumed to be Gaussian, and their efficient computation using autocorrelation sequences, are provided in a paper by Itakura [71]. The cepstral distance is a computationally efficient way to obtain the root-mean-square (RMS) difference between the log LPC spectra corresponding to the unquantized and quantized LSFs. The cosh distance measure closely approximates the RMS log spectral distance and provides an upper bound for it [70]. The properties of cepstral coefficients from minimum phase polynomials [72] are exploited to compute the cepstral distance measure.
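The cepstral pieces of the computation can be sketched directly from (4.8)–(4.15). The Python code below uses the standard LPC-to-cepstrum recursion for the minimum phase model 1/A(z) and evaluates u(L) with unit gains (σ = σ′ = 1, so c0 = c′0 = 0), reported in dB; it is an illustration, not the cfind.m/dist.m code.

```python
import numpy as np

def lpc_cepstrum(a, L):
    """Cepstral coefficients c_1..c_L of 1/A(z), where
    A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p (standard recursion)."""
    p = len(a) - 1
    c = np.zeros(L + 1)
    for n in range(1, L + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = -acc
    return c

def cepstral_distance(a1, a2, L=40):
    """Truncated cepstral distance u(L) with unit gains, in dB."""
    c1 = lpc_cepstrum(a1, L)
    c2 = lpc_cepstrum(a2, L)
    d = np.sqrt(2.0 * np.sum((c1[1:] - c2[1:]) ** 2))
    return 4.342944819 * d   # nats -> dB (10 / ln 10), as in the text
```

For the single-pole case A(z) = 1 + a1 z^-1, the recursion reproduces the analytic cepstrum c_n = (−a1)^n / n, which is a convenient check of the implementation.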
Figure 4.3: (a) Cepstral distance at all the frames. (b) Likelihood ratio distance measure at all the frames.

Figure 4.3: (c) Spectrogram at all the frames. (d) Frame energy at all the frames.
CHAPTER 5

The Codebook Search

The prediction residual from the LP filter of the CELP analysis stage is modeled by an LTP and the Stochastic Codebook (SCB). The LTP is also called the Adaptive Codebook (ACB), because the memory of the LTP is considered a codebook from which the best matching pitch delay for a particular target is computed. The pitch delay of the LP residual is first predicted by the LTP, and the SCB represents the random component in the residual. Determining the best matching codevector for a particular signal is referred to as Vector Quantization (VQ) [73, 74].

5.1 OVERVIEW OF CODEBOOK SEARCH PROCEDURE

The use of the closed-loop LTP was first proposed in a paper by Singhal and Atal [2] and led to a big improvement in the quality of speech. The vector excitation scheme using stochastic codevectors for AbyS LP coding was introduced in a paper by Schroeder and Atal [1]. Figure 5.1 provides an overview of the codebook search procedure in the analysis stage, consisting of both the ACB and the SCB. The codebook searches are done once per subframe for both the ACB and the SCB, using the LP parameters obtained from the interpolated LSPs of each subframe. In this section, we give an overview of the structure of the ACB and the SCB and of the search procedures; the codebook search procedure outlined in Figure 5.1 will then be described in detail, with the necessary theory and MATLAB programs, in the following sections. Demonstrations of some important details will also be provided with numerical examples and plots.

Figure 5.1: Codebook search procedure.

The idea behind both of the codebook searches is to determine the best match in terms of the codevector and the optimal gain
(gain-shape VQ [75]), so that the scaled codevector is closest to the target signal in the MSE sense. The ACB search is performed first, and then the SCB search is performed. The major difference between the search procedures is that the target signal used in the process of error minimization is different. The target for the ACB search is the LP residual weighted by the perceptual weighting filter A(z/γ), where γ is the bandwidth expansion factor. The target for the SCB search is the target for the ACB search minus the filtered ACB VQ excitation [20]. Therefore, it is clear that the combination of the ACB and the SCB is actually attempting to minimize the perceptually weighted error.

For the ACB, the codebook consists of codewords corresponding to 128 integer and 128 noninteger delays, ranging from 20 to 147 samples. The search involves a closed-loop analysis procedure, performed once every subframe, which will be described later in detail. Codebook search here means finding the optimal pitch delay and an associated gain that is coded between −1.0 and +2.0, using a modified Minimum Squared Prediction Error (MSPE) criterion. Odd subframes are encoded with an exhaustive search requiring 8 bits, whereas even subframes are encoded by a delta search procedure requiring 6 bits; the average number of bits needed for encoding the ACB index is therefore 7. Submultiple delays are also checked, using a submultiple delay table, to see whether a submultiple matches within 1 dB of MSPE; this favors a smooth pitch contour and prevents pitch errors due to the choice of multiples of the pitch delay. The adaptive codebook gets updated whenever the pitch memory of the LTP is updated. In Figure 5.2, ia and is are the indices to the ACB and the SCB that correspond to the optimal codevectors, and ga and gs are the gains that correspond to the adaptive and stochastic codewords. Because of this shared structure, the theory and most of the MATLAB implementation of the ACB and SCB searches are similar. The entire search procedure is illustrated in Figure 5.2.

Figure 5.2: Searching the adaptive and the stochastic codebooks.
For the fixed SCB, the codebook is overlapped, with a shift of 2 samples between consecutive codewords; this allows for fast convolution and end correction of subsequent codewords. The codebook is 1082 samples long and consists of 512 codewords that are sparse and ternary valued (−1, 0 and +1) [18, 46]. A sample of two consecutive codewords, shifted by −2, found in the stochastic codebook is given in Figure 5.3, where the overlap, sparsity and ternary values can be seen.

Figure 5.3: Examples of ternary valued SCB vectors.

As mentioned earlier, the search procedures for both the ACB and the SCB are essentially the same, except that the targets are different. Assume that we have the L × 1 target vector e(0) and codevectors x(i), where i is the index of the codevector in the codebook; the goal is to find the index for which the gain score g(i) and the match score m(i) are maximized. The filtered codeword is given by

y(i) = W H x(i) ,   (5.1)

where W and H are the L × L lower triangular matrices that represent the truncated impulse responses of the weighting filter and the LP synthesis filter, respectively; the overall product in (5.1) represents the effect of passing the codeword through the LP and the weighting filters. Figure 5.4 (adapted from [20]) illustrates the procedure to find the gain and the match scores. The scores are given by

g(i) = y(i)T e(0) / (y(i)T y(i)) ,   (5.2)

m(i) = (y(i)T e(0))² / (y(i)T y(i)) .   (5.3)
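Equations (5.1)–(5.3) translate almost line for line into code. The Python sketch below (illustrative names; H here stands for the combined lower triangular impulse-response matrix WH of (5.1)) returns the index, gain and match score of the best codevector:

```python
import numpy as np

def search_codebook(target, codebook, H):
    """Gain-shape VQ search: filter each codevector, then pick the index
    maximizing the match score (y^T e)^2 / (y^T y); the optimal gain
    for that codevector is y^T e / (y^T y)."""
    best_i, best_g, best_m = -1, 0.0, -np.inf
    for i, x in enumerate(codebook):
        y = H @ x                       # filtered codeword, eq. (5.1)
        corr = float(y @ target)        # y^T e
        energy = float(y @ y)           # y^T y
        if energy <= 0.0:
            continue
        m = corr * corr / energy        # match score, eq. (5.3)
        if m > best_m:
            best_i, best_g, best_m = i, corr / energy, m
    return best_i, best_g, best_m
```

Maximizing the match score is equivalent to minimizing the weighted error, since the residual energy after scaling is e^T e − m.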
The best codebook index i is the one that maximizes m(i). From (5.1), we see that the codeword has to be filtered through the LP filter and the weighting filter for comparison with the target. The MATLAB implementation of the codebook search described above takes certain considerations into account in order to reduce the implementation complexity. One possible simplification is to do away with the LP filtering: instead of using the weighting filter A(z/γ), we use the inverse weighting filter 1/A(z/γ), so that we deal with the excitation signal instead of the speech signal. The modified filtered codeword for comparison becomes

ỹ(i) = W⁻¹ x(i) .   (5.4)

Figure 5.4: Finding gain and match scores [from [20]].

5.2 ADAPTIVE CODEBOOK SEARCH

Having looked at the match scores and the overview of the codebook searches, we will now describe the ACB search in detail with MATLAB code: the target signal for the ACB search, the integer delay search, and the submultiple/fractional delay search.

5.2.1 TARGET SIGNAL FOR ADAPTIVE CODEBOOK SEARCH

Denoting s and ŝ(0) as the actual speech subframe vector and the zero input response of the LP filter, we can see from Figure 5.2 that the target signal for the ACB search, e(0), should be

e(0) = W(s − ŝ(0)) .   (5.5)
This requires us to modify the target (converting it to a form of excitation) and the gain and match scores as well. These are given by,

ẽ(0) = W⁻¹ H⁻¹ (s − ŝ(0)) ,    (5.6)

g̃i = ỹ(i)T ẽ(0) / (ỹ(i)T ỹ(i)) ,    (5.7)

m̃i = (ỹ(i)T ẽ(0))^2 / (ỹ(i)T ỹ(i)) .    (5.8)

Program P5.1: Target for the ACB.

The error is initialized to zero, and then it is passed as input to the LP synthesis filter 1/A(z). This gives the Zero Input Response (ZIR), i.e., the initial condition, of the LP synthesis filter. Then the ZIR is subtracted from the actual speech subframe and passed through the LP analysis filter, which converts it back to the residual. Since we work with prediction residuals now, the inverse
weighting filter 1/A(z/γ) is used to obtain the final target vector for the ACB, ẽ(0). Note that this target vector will be used by the ACB to predict the pitch period. Program P5.1 performs this operation.

Example 5.1 A test speech segment (voiced) is taken and tested with Program P5.1. The segment of speech and the target are shown in Figure 5.5.

Figure 5.5: Speech segment and target for the ACB.

5.2.2 INTEGER DELAY SEARCH

Once the target signal is generated, the next step is to search for the integer pitch delay. In the MATLAB implementation, the pitch is coded in two stages – searching the integer delays first and the neighboring non-integer delays next. In this section, we will look into the coding of integer pitch delays. In the integer delay search, only the integer delays are searched, and the match score is set to zero if the pitch delay is fractional. The pitch delay table (pdelay) has both integer and fractional pitch values. If the subframe is odd, then a full search of the pitch delay table is done. If the subframe is even, the pitch delay is searched between −31 to 32 of the pitch index in the pitch delay table. Initially, the excitation vector (v0) is loaded with a copy of the pitch memory (d1b), and the allowable search ranges are fixed in the variables minptr and maxptr. The gain and match scores are computed as given in (5.7) and (5.8) in the MATLAB code pgain.m, and portions of it are provided in Program P5.3. The MATLAB program that implements the integer pitch delay search (integerdelay.m) is available under the directory P5_2 in the website [76], but it needs supporting files and workspace [76].

Program P5.3: Gain and match scores computation. (Continues.)
Program P5.3: (Continued.) Gain and match scores computation.

In Program P5.3, the filtered codeword is obtained by convolving the excitation (ex) with the truncated impulse response (h) of the inverse weighting filter. Ypg is the variable that stores the filtered codeword. If the pitch delay (M) is greater than the subframe length (60), then the codeword is constructed by taking the previous excitation signal and delaying it by M samples. If M is less than 60, then the short vector obtained by delaying the previous excitation signal is replicated by periodic extension. The convolution is shown in the first if condition in Program P5.3, and full convolution is performed only when the first delay is tested. In this program, the excitation signal for the second delay is simply shifted by one from the excitation signal for the first delay. Exploiting this, we can perform just end corrections from the second convolution onwards and reduce the number of computations. The newly arrived first sample in the excitation is multiplied with the time-reversed impulse response and added to the previous convolution result. Because of the integer delay, the effect of periodic extension is provided by overlap-add of the convolution result. The new result is restricted to the length of the excitation signal (60). This is illustrated in the first else condition in Program P5.3. Once the filtered codeword is obtained, the gain and match scores are computed exactly in the same way as in (5.7) and (5.8). For even subframes, the same procedure is repeated except that the search range is limited, in order to ensure that the pitch contour is smooth. The full MATLAB code for computing the gain and match scores is available in the website under the directory P5_3 [76].

Example 5.2 Integer delay search is performed for the speech segment given in Example 5.1. This is obtained by running the MATLAB program integerdelay.m. The subframe is odd; therefore, all the integer delays are searched. The gain and match scores with respect to the indices of the pitch delay table are plotted as shown in Figure 5.6. The maximum index is 117 and the actual pitch period is 56.

5.2.3 SUBMULTIPLE/FRACTIONAL DELAY SEARCH

After performing an integer delay search, 2, 3, and 4 submultiple delays are searched according to the submultiple delay table (submult). Note that only the submultiples specified in the submultiple delay table are searched.
Figure 5.6: Gain and match scores for ACB search in a voiced segment.

A submultiple pitch delay is selected if its match is within 1 dB of the MSPE of the actual pitch delay already chosen. We will now describe fractional pitch estimation, which involves the interpolation of the codevectors (that correspond to integer delays) chosen from the pitch memory (ACB). If D fractional samples are to be computed between two integer samples, it is equivalent to increasing the sampling frequency by D. Therefore, the signal is upsampled and lowpass filtered, which increases the number of samples by a factor of D. The oversampled signal is delayed by an integral amount equivalent to the fractional delay of the original signal. Then, the signal is downsampled to the actual sampling rate. After the interpolation is performed and codevectors corresponding to the fractional delay are found, the gain and match scores are again determined as shown before (pgain.m). The entire process can be replicated using only windowed sinc interpolation in the time domain [17]. The interpolating function is the product of a Hamming window (hw) and a sinc function, and totally 8 points are used for interpolation. The window and the windowed interpolation functions are given as,

hw(k) = 0.54 + 0.46 cos(kπ/(6N)) ,    (5.9)

where N = interpolation length (even) and k = −6N, −6N + 1, . . . , 6N, and

wf(j) = hw(12(j + f)) sin((j + f)π) / ((j + f)π) ,    (5.10)
where j = −N/2, −N/2 + 1, . . . , N/2 − 1, and f = 1/4, 1/3, 1/2, 2/3, 3/4 (fractional part). In the MATLAB implementation, after computing the windowed sinc function, the input signal is multiplied with the windowed sinc and interpolated to find the signal delayed by the fractional amount. The delayed excitation signal is used in the next step to find the gain and the match score. The integer and fractional parts of the delay are computed, and the optimal excitation vector with the highest match score is found. This excitation vector is used to update the pitch memory, so that the codebook search for the next subframe uses the updated codebook. Program P5.4 (fracdelay.m) implements fractional delaying of an input signal, and it is available under the directory P5_4 in the website [76].

Example 5.3 The windowed interpolating functions (8 points) for the fractional delays 1/4 and 3/4 are shown in Figure 5.7. A signal and its delayed version by a delay of 1/3 are shown in Figure 5.8. These figures are generated from the program fracdelay.m.

5.3 STOCHASTIC CODEBOOK SEARCH

The SCB search is similar to the ACB search except that the SCB is stored as overlapped codewords, and the target signal for the SCB search is different from that of the ACB search. The following sections describe the generation of target signals for the SCB search and the actual search procedure.

5.3.1 TARGET SIGNAL FOR STOCHASTIC CODEBOOK SEARCH

The SCB search is done on the residual signal after the ACB search identifies the periodic component of the speech signal. The error target for the SCB is the filtered codeword obtained after the ACB search subtracted from the error target for the ACB. If u is the codeword obtained from the ACB search, the target for the SCB search is,

e(0) = W(s − ŝ(0)) − W H u .    (5.11)

In the MATLAB implementation, instead of operating directly with the error target, the target signal is converted into a form of excitation as done with the ACB search. The modified target is given by,

ẽ(0) = W⁻¹ H⁻¹ W⁻¹ e(0) ,    (5.12)

which is the same as,

ẽ(0) = W⁻¹ H⁻¹ (s − ŝ(0)) − W⁻¹ u .    (5.13)

Here, x(i) is the ith SCB codeword used for comparison, and the modified filtered codeword for comparison is the same as the one given in (5.5). The gain and match scores are also the same as given in Equations (5.7) and (5.8). The pitch synthesis routine that generates u from the pitch delay and gain parameters obtained from the ACB search is given in Program P5.5 (pitchsynth.m), available under the directory P5_5 in the website [76].
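The windowed-sinc fractional delay of (5.9) and (5.10) can be sketched as follows. This is a simplified, self-contained sketch in the spirit of those equations: the taps are a Hamming window times sinc(j + f), but the window scaling used here is illustrative and is not taken from the FS1016 tables:

```python
import math

def frac_delay(x, f, points=8):
    """Delay x by a fractional amount f (0 < f < 1) using an 8-point
    windowed-sinc interpolator: y[n] ~ x[n - f] = sum_j w_f(j) x[n + j]."""
    half = points // 2
    taps = []
    for j in range(-half, half):
        t = j + f
        sinc = math.sin(math.pi * t) / (math.pi * t) if t else 1.0
        ham = 0.54 + 0.46 * math.cos(math.pi * t / half)  # Hamming window
        taps.append(ham * sinc)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for j, c in zip(range(-half, half), taps):
            if 0 <= n + j < len(x):
                acc += c * x[n + j]
        y.append(acc)
    return y
```

Delaying a unit impulse by f = 1/2 spreads its energy symmetrically over the two neighboring samples, which is the expected behavior of a half-sample delay.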
Figure 5.7: Windowed sinc function for interpolation.

Figure 5.8: Input signal and its delayed version by 1/3 sample.
The variable buf is the memory that contains the past excitation. The array b contains the pitch delay and the pitch predictor coefficients. The function pitchvq.m is very similar to Program P5.5 and generates the vector u. The integer and fractional parts of the delay are obtained from the pitch delay. If the integer part of the pitch delay (Mi) is greater than or equal to the frame size (60 samples), the current excitation in the pitch memory is updated with the Mi samples of the past excitation stored in the pitch memory. If Mi is less than 60 samples, the pitch memory is updated by extending the past excitation to 60 samples. For M > 60, this procedure is equivalent to doing pitch synthesis using a long-term predictor with delay M and appropriate gain. Once the integer delaying is performed, the fractional delay is implemented by the program ldelay.m. The current ACB excitation is now present in buf, which is scaled by the pitch gain to obtain the scaled ACB excitation vector u and stored in rar.

After generating u, the target vector ẽ(0) needs to be computed. This is performed in Program P5.6 (scbtarget.m). Filtering u with 1/A(z) is the same as computing Hu + ŝ(0), where ŝ(0) is the ZIR of 1/A(z). This is subtracted from s to obtain the residual s − ŝ(0) − Hu. This residual is passed

Program P5.6: Generating the SCB target vector.
through the LP analysis filter A(z) and the inverse weighting filter 1/A(z/γ) to yield the vector W⁻¹H⁻¹(s − ŝ(0)) − W⁻¹u, which is the same as ẽ(0) in (5.13). Note that the target is given by ẽ(0) and not e(0); the target displayed has the form of excitation. Therefore, if we want to see the actual error target e(0), it needs to be filtered with 1/A(z). This could be easily determined using the function polefilt.m.

Example 5.4 The scaled ACB excitation is generated for the speech segment in Example 5.1. The workspace variable pitsyn contains the necessary pitch delay and gain parameters needed by the program. The excitation is plotted in Figure 5.9.

Figure 5.9: The scaled ACB excitation.

Example 5.5 The SCB error target is generated using Program P5.6 and is shown in Figure 5.10.

5.3.2 SEARCH FOR BEST CODEVECTOR

The SCB search is done with the modified error target signal. This is very similar to the ACB search except that the SCB codebook is fixed and has some structure that could be exploited to perform fast convolutions. Details on the structure of the SCB are given in Section 5.1. Program P5.7 (cgain.m) computes the gain and match scores for a particular SCB excitation. It is available in the directory P5_7 in the website [76]. Filtering is performed by convolving the excitation (ex) with the truncated impulse response (h) of the inverse weighting filter. For the first search alone, a full convolution is performed to compute Ycg, the filtered codeword.
Figure 5.10: Example of a modified SCB target signal.

Full convolution is performed in the first if structure in the program. Since the subsequent codewords are shifted by −2, the else structure corrects the previous convolution result individually for the two shifts. The gain and match scores are then computed using the filtered codeword. This process is repeated for all possible codewords, and the one with the best match score is chosen. The gain and match scores are recomputed with the chosen codeword to correct any errors accumulated during the process. The gain scores are quantized gain values, and they are constrained to the limits imposed by the modified excitation gain scaling procedure [20]. The selected codeword is scaled with the quantized gain in order to get the SCB excitation. After determining the SCB excitation, the filter states are updated using a procedure similar to the one followed in Program P5.5. The difference is that the error ẽ(0) is not initialized to a zero vector but to the SCB excitation itself. The final update of filter states is performed in the synthesis part of the AbyS procedure. This results in the complete update of the states of the filters.

Example 5.6 The gain and the match scores are computed for the SCB target given in Example 5.5. The resulting scores are plotted against the indices of the SCB codewords in Figure 5.11. In this case, the best codebook index is 135 and the corresponding quantized gain score is −1330.
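The fast convolution over the overlapped codebook can be sketched as follows. Each codeword is the previous one shifted by two samples with two new samples entering at the front, so after one full convolution only an end correction for the new samples is needed. This is a simplified plain-Python sketch with an assumed codeword layout (codeword 0 at the end of the codebook array, successive codewords moving toward the front); it is not the book's cgain.m:

```python
def filtered_overlapped_codewords(codebook, num, L, h, shift=2):
    """Filter each length-L codeword of an overlapped codebook with the
    truncated impulse response h. Codeword i starts at (num-1-i)*shift,
    so codeword i+1 equals codeword i delayed by 'shift' samples with
    'shift' new samples at the front."""
    H = len(h)
    outs = []
    for i in range(num):
        start = (num - 1 - i) * shift
        cw = codebook[start:start + L]
        if i == 0:
            # full (truncated) convolution for the very first codeword
            y = [sum(h[j] * cw[n - j] for j in range(min(n + 1, H)))
                 for n in range(L)]
        else:
            # end correction: shift the previous result by 'shift' and
            # add the contribution of the newly arrived front samples
            prev = outs[-1]
            y = [0.0] * L
            for n in range(L):
                base = prev[n - shift] if n >= shift else 0.0
                corr = sum(h[n - k] * cw[k]
                           for k in range(min(shift, n + 1)) if n - k < H)
                y[n] = base + corr
        outs.append(y)
    return outs
```

Only 2·H multiply-adds per subsequent codeword are needed instead of a full L·H convolution, which is the complexity saving the overlapped structure was designed for.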
Figure 5.11: Gain and match scores for SCB search in a voiced segment.

5.4 BITSTREAM GENERATION

The bitstream generation process for the FS1016 encoder consists of packing the quantized LSF values, the pitch delay, the pitch gain, the stochastic codeword and the stochastic gain indices into a bitstream array and adding bit error protection to the bitstream. Though LSFs of subframes are used for codebook searches, LSFs are transmitted on only a per-frame basis. This is because the subframe LSFs could be easily recovered from the LSFs of a frame using linear interpolation. The total number of bits needed to encode the quantized LSF values per frame is 34. Recall that the ACB pitch delay for odd subframes is absolute and the pitch delay for even subframes is relative with respect to the previous odd subframe. Therefore, the ACB pitch delay index is encoded with 8 bits for odd subframes and 6 bits for even subframes. The ACB gains are encoded using 5 bits per subframe, and therefore, 20 bits are needed to encode the ACB gain values per frame. The SCB indices need 9 bits per subframe and the SCB gains need 5 bits per subframe, so 56 bits are used per frame for the SCB. One bit per frame is used for synchronization, 4 bits per frame for Forward Error Correction (FEC) and 1 bit is used for future expansion. The total 144 bits per frame are allocated as shown in Table 5.1.

The FEC consists of encoding selected bits using a Hamming code. The Hamming code used is a (15,11) code, which means that 11 bits are protected using 4 parity bits to give a total of 15 bits. Hamming codes can detect 2 errors and correct 1 error [77]. Only the absolute pitch delays have 3 bits protected by the FEC since coding the absolute pitch delays is crucial for coding the delta delays correctly. The ACB gain has its most perceptually sensitive bit protected.
Table 5.1: Bit allocation for a frame.

Parameter          Subframe 1   2   3   4    Frame
ACB index                   8   6   8   6       28
ACB gain                    5   5   5   5       20
SCB index                   9   9   9   9       36
SCB gain                    5   5   5   5       20
LSF 1      (subframes not applicable here)       3
LSF 2                                            4
LSF 3                                            4
LSF 4                                            4
LSF 5                                            4
LSF 6                                            3
LSF 7                                            3
LSF 8                                            3
LSF 9                                            3
LSF 10                                           3
Future Expansion                                 1
Hamming Parity                                   4
Synchronization                                  1
Total                                          144

The 3 bits of the pitch delay index for the odd subframes and 1 bit of the pitch gain for all the subframes are protected using the Hamming code. The 11th bit protected by the Hamming code is the future expansion bit.

5.5 SUMMARY

In this chapter, we discussed the structure of the ACB and the SCB and search methods to find an optimal codevector. Further improvements in the codebook structure were made with the use of algebraic codes for the SCB, and the modified CELP algorithm came to be known as ACELP. The reduction in complexity was obtained using backward filtering and algebraic codes such as the Reed–Muller code and the Nordstrom–Robinson code [33]. Most of the modern CELP algorithms, such as the G.729 and the SMV, are categorized as Conjugate Structured ACELP (CS-ACELP) [4] algorithms. They use Conjugate Structure VQ (CS-VQ) to perform joint quantization of the adaptive and the stochastic excitation gains [49]. Furthermore, the pitch and algebraic codebook gains are jointly quantized in the AMR-WB [5]. In AMR-WB, multiple bit rates are achieved using algebraic codebooks of different sizes and encoding the LP parameters at different rates.
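The (15,11) Hamming protection described above can be sketched in Python. This uses the generic textbook construction with parity bits at positions 1, 2, 4 and 8 (1-indexed); the exact bit ordering inside the FS1016 bitstream may differ:

```python
def hamming15_11_encode(data):
    """Encode 11 data bits into a (15,11) Hamming codeword with even
    parity. Parity bits sit at positions 1, 2, 4, 8 (1-indexed)."""
    assert len(data) == 11
    code = [0] * 16                      # index 1..15; index 0 unused
    it = iter(data)
    for pos in range(1, 16):
        if pos not in (1, 2, 4, 8):
            code[pos] = next(it)
    for p in (1, 2, 4, 8):
        parity = 0
        for pos in range(1, 16):
            if pos & p:                  # positions covered by check p
                parity ^= code[pos]
        code[p] = parity
    return code[1:]

def hamming15_11_decode(code):
    """Return (data_bits, syndrome). A nonzero syndrome (a 'bad
    syndrome') is the 1-indexed position of a single-bit error, which
    is corrected before extracting the data bits."""
    bits = [0] + list(code)
    syndrome = 0
    for p in (1, 2, 4, 8):
        parity = 0
        for pos in range(1, 16):
            if pos & p:
                parity ^= bits[pos]
        if parity:
            syndrome |= p
    if syndrome:
        bits[syndrome] ^= 1              # correct the single-bit error
    data = [bits[pos] for pos in range(1, 16) if pos not in (1, 2, 4, 8)]
    return data, syndrome
```

Because each check p covers exactly the positions whose binary index contains p, a single error at position e flips exactly the checks in e, so the syndrome spells out the error position, which is how the decoder's "bad syndrome" locates and corrects the error.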
CHAPTER 6

The FS1016 Decoder

The parameters of the encoded bitstream are decoded in the FS1016 CELP receiver. Composite excitation with both the periodic and stochastic components is generated for the LP synthesis filter. The synthesized speech is postfiltered to enhance the formant structure of speech. The raw synthesized speech is scaled and clamped to the 16-bit integer range, and it is also filtered using a highpass filter. Therefore, there are three speech outputs from the CELP receiver: raw synthesized (non-postfiltered), postfiltered and highpass filtered speech. In addition, the distance measures between input and synthesized speech are computed. The block diagram of the CELP receiver is illustrated in Figure 6.1, and the key functions will be described in the following sections. Demonstration of the functions will be provided using MATLAB programs and plots.

Figure 6.1: The CELP receiver. [Blocks: extract and decode CELP synthesis parameters (celpsyn.m), interpolate LSFs for subframes (intsynth.m), generate composite excitation (pitchvq.m), LP synthesis (polefilt.m), postfilter output speech (postfilt.m), highpass filter output speech (celpsyn.m), and compute distance measures (disto.m); outputs are the non-postfiltered, postfiltered and highpass filtered speech and the distortion measures.]

6.1 EXTRACTING PARAMETERS FOR SYNTHESIS

The CELP synthesis parameters of a particular frame are extracted and decoded from the bitstream as the first step in the CELP synthesis process. The decoded parameters include the ACB and the SCB gains and indices for each subframe. Also included in the decoded parameters are the LSFs of a frame, which are then interpolated to obtain subframe LSFs. The SCB gain, the ACB gain and the pitch delays are smoothed in case transmission errors are detected. This step also involves decoding the LSFs using the
quantization table and interpolating them to create subframe LSFs. The codewords are extracted from the received bitstream, and the Hamming error protected bitstream is decoded. In the Hamming code, a syndrome other than zero indicates the position of error and hence is called a bad syndrome, whereas zero indicates no error [77]. The bit error rate is computed as a running average of bad syndromes. The LSFs of the particular frame are extracted, decoded and interpolated to create subframe LSFs. The LSFs are decoded by indexing the quantization table using the transmitted indices to create quantized LSFs. The quantized LSFs are tested for monotonicity again in the synthesis stage, and corrections are done similar to the analysis stage to make the LSFs monotonic if they are non-monotonic. Subframe LSFs are computed by interpolating the LSFs of the current and the previous frames. The quantized values of LSFs are then used to create subframe LSFs using linear interpolation, as shown in Program P6.1. Smoothing of the ACB and SCB gains and the pitch delays is also performed as a part of this process.

Program P6.1: Interpolating LSFs for subframes.
The weights used for the interpolation are given in Table 6.1. Indicating the LSF of the previous frame as fo and the LSF of the current frame as fn, the LSFs of the ith subframe are computed as follows,

fi,s = wi,o fo + wi,n fn ,    (6.1)

where wi,o is the weight corresponding to the ith subframe LSF that multiplies the previous frame LSF, and wi,n is the weight corresponding to the ith subframe LSF that multiplies the current frame LSF.

Table 6.1: Interpolation weights for subframe LSFs.

Subframe Number   Previous Frame   Current Frame
1                 0.8750           0.1250
2                 0.6250           0.3750
3                 0.3750           0.6250
4                 0.1250           0.8750

Example 6.1 An illustration of generating subframe LSFs from the LSFs of a frame is described in this example. The subframe LSFs are obtained from the LSFs of the previous frame and the current frame using Program P6.1. Table 6.2 shows the values of LSFs for the current frame, the previous frame and the four subframes in the subframe analysis buffer. The pole-zero plots of the LP synthesis filters corresponding to the LSFs of the previous frame, the LSFs of the current frame, and the interpolated LSFs of the subframes are given in Figure 6.2. The effect of interpolating the LSFs of the previous and the current frames is clearly visible in the figure. The pole-zero plot corresponding to the LSF of the first subframe is closest to that of the previous frame, whereas the pole-zero plot corresponding to the LSF of the fourth subframe is closest to that of the current frame.

The ACB and the SCB parameters are decoded from the bitstream. Smoothing is performed on the SCB gain, the ACB gain and the pitch delay if bit protection is enabled. Smoothing is essentially the process of reducing the variations in the received parameters across subframes. Here, we will illustrate the case of gain smoothing; the smoothing of the pitch delay is done in a similar manner. Gain smoothing is performed only when errors are detected. If either two errors are detected or the bit error rate is more than permissible, gain smoothing is initiated. This is done by creating a vector of past and future gain values. The mean and variance of the gain values in the vector are estimated. If the variance is within the prescribed limit and the gain value of the current subframe falls outside the permissible range, the gain of the current subframe is set to the average gain value, retaining the actual sign. This is because, when the actual gain is not correct, the smoothed gain from the past and future subframes will be a good estimate. For the fourth subframe in any frame, gain smoothing is disabled.
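The interpolation of (6.1) with the Table 6.1 weights can be sketched as follows; a minimal Python sketch, with the weights taken directly from the table (illustrative names, not the book's intsynth.m):

```python
# (previous-frame weight, current-frame weight) for subframes 1..4,
# from Table 6.1
WEIGHTS = [(0.875, 0.125), (0.625, 0.375), (0.375, 0.625), (0.125, 0.875)]

def interpolate_lsfs(lsf_old, lsf_new):
    """Return the four subframe LSF vectors f_i,s = w_i,o*f_o + w_i,n*f_n."""
    return [[w_old * fo + w_new * fn for fo, fn in zip(lsf_old, lsf_new)]
            for w_old, w_new in WEIGHTS]
```

Since each weight pair sums to one, every interpolated LSF stays between its previous-frame and current-frame values, which preserves the monotonic ordering of well-behaved LSF sets.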
Figure 6.2: Pole-zero plots from LSFs (a) for the previous frame, (b) for the current frame, (c) for the subframe 1, (d) for the subframe 2, (e) for the subframe 3 and (f) for the subframe 4.

6.2 LP SYNTHESIS

The SCB excitation is generated by scaling the selected SCB codeword by the SCB gain value. The composite excitation is generated by passing the SCB excitation through the LTP, and Program P5.5 is used to perform this. This excitation will be used in the LP synthesis filter to synthesize the speech signal. The LP synthesis filter is an all-pole filter, and the denominator polynomial is computed by converting the subframe LSFs to the coefficients of the LP polynomial A(z) using (3.3). Program P4.1 is used to convert the subframe LSFs to the corresponding LP coefficients. The LP coefficients and the composite excitation are used in the all-pole synthesis filter to synthesize the speech signal. The synthesized speech is scaled and clamped to the 16-bit range. Program P6.2 generates the raw synthesized (non-postfiltered) speech from the LSFs of a subframe and the excitation. It internally uses the Programs P5.5 and P4.1 in order to generate the composite excitation and convert the LSFs to direct-form LP coefficients.
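The all-pole synthesis step can be sketched as a direct-form recursion. The sign convention for A(z) used here (A(z) = 1 − Σ a_k z^−k) is an assumption of this sketch and may differ from the book's polefilt.m:

```python
def lp_synthesis(excitation, a, memory=None):
    """All-pole synthesis 1/A(z): y[n] = x[n] + sum_k a[k]*y[n-k].
    'memory' holds past outputs (newest first), i.e. the filter state
    carried across subframes; the updated state is returned."""
    p = len(a)
    mem = list(memory) if memory else [0.0] * p
    out = []
    for x in excitation:
        y = x + sum(a[k] * mem[k] for k in range(p))
        out.append(y)
        mem = [y] + mem[:-1]          # shift the state
    return out, mem
```

Carrying the state between calls is what makes the subframe-by-subframe synthesis equivalent to filtering the whole frame in one pass, and it is also how the zero input response used in the target computation arises: filtering a zero excitation with a nonzero state.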
Table 6.2: Interpolating LSFs of a frame to create subframe LSFs. [Columns: the ten LSFs of the previous (old) and current (new) frames, and the interpolated LSFs for subframes 1 through 4.]

Example 6.2 The composite excitation is generated and the LSFs are converted to LP coefficients for subframe 3 of Example 6.1. The excitation given in Figure 6.3 and the LP coefficients are used to synthesize the speech given in Figure 6.4.

Figure 6.3: Composite excitation with both periodic and stochastic components.
Program P6.2: Synthesizing speech from excitation and LSF.

6.3 POSTFILTERING OUTPUT SPEECH

Postfiltering of output speech is generally performed in order to improve the perceptual quality of the output speech [47]. The major purpose of postfiltering is to attenuate the noise components in the spectral valleys. Noise levels are kept as low as possible in the formant regions during encoding, as it is quite difficult to force noise below the masking threshold at all frequencies. Postfilters can generally be designed by moving the poles of the all-pole LP synthesis filter, thereby resulting in filters of the form 1/A(z/α). But using this filter to achieve sufficient noise reduction will result in severe muffling of speech [47]. This is because the LP synthesis filter has a lowpass spectral tilt for voiced speech. Therefore, the spectral tilt of the postfilter must be compensated to
Figure 6.4: Synthesized (non-postfiltered) speech using the composite excitation and LP coefficients.

Figure 6.5: Frequency response of the pole-zero postfilter. [Magnitude responses (dB) of 1/A(z/0.5), 1/A(z/0.8) and A(z/0.5)/A(z/0.8) versus normalized frequency.]
avoid muffling. Figure 6.5 illustrates the frequency response magnitude of 1/A(z/α) for the cases of α = 0.5 and α = 0.8, and it is instructive to note that α = 0.5 gives the spectral tilt alone, while α = 0.8 gives the filter with both the spectral tilt and the formant structure. Here, A(z) is computed for a voiced frame. Therefore, the spectral tilt can be removed using a filter of the form A(z/0.5)/A(z/0.8), which is the pole-zero postfilter where the spectral tilt is partially removed; its response is also shown in Figure 6.5. Further compensation for the low-frequency tilt and muffling can be provided by estimating the spectral tilt (first-order fit) of the postfilter denominator polynomial. This is done by computing the reflection coefficients of the denominator polynomial. The first reflection coefficient is the LP coefficient a1(1) of the first-order polynomial that fits the spectrum. An all-zero filter of the form 1 + μ a1(1) z⁻¹ with μ = 0.5 is used for the adaptive compensation of the spectral tilt. The response of this filter and the overall response of the pole-zero postfilter with adaptive spectral tilt compensation are given in Figure 6.6.

Figure 6.6: Pole-zero postfilter with and without adaptive spectral tilt compensation.

The power of the synthesized speech before postfiltering (input power) and after postfiltering (output power) are computed by filtering the squared signal using an exponential filter. Exponential filters are generally used in time series analysis for smoothing signals and are described by the transfer function,

Hs(z) = τ / (1 − (1 − τ) z⁻¹) ,    (6.2)
Figure 6.7: Impulse response of the exponential smoothing filter.

Figure 6.8: Magnitude frequency response of the exponential smoothing filter.
where τ is the parameter that controls the degree of smoothing: the smaller the parameter τ, the greater the degree of smoothing. Because of the properties of the Fourier transform, the magnitude response of the exponential filter is also a decaying exponential function. The impulse and frequency responses of the exponential filter with τ = 0.01 are shown in Figures 6.7 and 6.8, respectively. Note that since the filter is IIR, only the truncated impulse response is shown.

The smoothed average estimates of the input and the output power of a subframe are used for Automatic Gain Control (AGC). The purpose of the AGC is to scale the output speech such that its power is roughly the same as the unfiltered noisy output speech. The scaling factor is computed as,

S = √(Punf / Pout) ,    (6.3)

where Punf and Pout are the unfiltered speech power and output speech power, respectively. The output speech is then scaled using the scaling factor. The input and output power estimates are computed using exponential filters, taking into account the memory of the filters as well. The filter memories of the pole and zero filters are also taken into consideration during postfiltering. All-zero filtering is also performed to provide further compensation to the spectral tilt. The two different values of α used in the pole-zero postfilter are stored in the variables alpha and beta in Program P6.3. The negative of the first reflection coefficient is taken as a1(1) because of the sign convention used in the program. The MATLAB code for adaptive postfiltering and AGC is given in Program P6.3.

Example 6.3 Postfiltering is illustrated for subframe 3 of the speech segment given in Example 6.1. Figure 6.9 shows the plot of the postfiltered and scaled output speech subframe. Figure 6.10 shows the plot of scaling factors from the AGC module for the speech subframe. For this particular subframe, the scaling factors are less than 1, and the postfiltered speech has to be scaled down.

6.4 COMPUTING DISTANCE MEASURES

The distance measures are computed between the actual input speech in the subframe buffer and the synthesized speech segment (non-postfiltered). The procedure for computing the distance measures is exactly the same as the one described in Chapter 4, but the autocorrelation lags of the input and synthesized speech segments need to be computed before calculating the distance measures.

6.5 SUMMARY

The FS1016 CELP decoder synthesizes output speech from the transmitted parameters. The LSFs of the frame are decoded and interpolated to form subframe LSFs, and the ACB and SCB parameters are extracted from the bitstream. The extracted parameters are then used to generate a
Program P6.3: Postfiltering output speech. (Continues.)
Program P6.3: (Continued.) Postfiltering output speech.

Figure 6.9: Postfiltered and scaled output speech segment.

composite excitation, which is used along with the LP parameters to synthesize speech on a subframe basis. Since CELP uses an AbyS procedure, the CELP encoder contains a replica of the CELP decoder except for the postfilter. The FS1016 speech coder provides a good compromise between speech quality and bit rate, and the subjective quality of speech is maintained in the presence of moderate background noise [20]. Modern day speech coders have incorporated variable bit rate coding schemes to optimize speech quality under different channel noise conditions [5, 42]. The Selectable Mode Vocoder (SMV) classifies input speech into different categories and selects the best rate in order to ensure speech quality at a given operating mode [78].
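The power tracking of (6.2) and the AGC scaling of (6.3) from Section 6.3 can be sketched together as follows. The square-root form, the per-sample update and the value of τ are assumptions of this sketch, not necessarily the exact FS1016 arithmetic:

```python
import math

def exp_smooth(prev, x, tau=0.01):
    """One step of the exponential smoother Hs(z) = tau/(1-(1-tau)z^-1)
    of (6.2): y[n] = (1-tau)*y[n-1] + tau*x[n]."""
    return (1.0 - tau) * prev + tau * x

def agc(unfiltered, postfiltered, tau=0.01):
    """AGC sketch: track the powers of the unfiltered and postfiltered
    speech by smoothing the squared signals with (6.2), then scale each
    postfiltered sample by sqrt(Punf/Pout) so that the scaled output
    power tracks the unfiltered power."""
    p_unf = p_out = 0.0
    out = []
    for u, p in zip(unfiltered, postfiltered):
        p_unf = exp_smooth(p_unf, u * u, tau)   # input power estimate
        p_out = exp_smooth(p_out, p * p, tau)   # output power estimate
        scale = math.sqrt(p_unf / p_out) if p_out > 0.0 else 1.0
        out.append(scale * p)
    return out
```

If the postfilter doubles the signal amplitude, the estimated power ratio settles at 1/4 and the square root brings the scale back to 1/2, restoring the original level, which matches the behavior of the scaling factors plotted in Figure 6.10.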
Figure 6.10: Scaling factor for postfiltered speech provided by AGC.
Bibliography
[1] M.R. Schroeder and B. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," Proc. ICASSP-85, p. 937, Apr. 1985. 7, 9, 63
[2] S. Singhal and B. Atal, "Improving the Performance of Multi-Pulse Coders at Low Bit Rates," Proc. ICASSP-84, p. 1.3.1, 1984. 7, 63
[3] B.S. Atal and J. Remde, "A New Model for LPC Excitation for Producing Natural Sounding Speech at Low Bit Rates," Proc. IEEE ICASSP-82, pp. 614–617, Apr. 1982. 7
[4] R. Salami et al., "Design and Description of CS-ACELP: A Toll Quality 8 kb/s Speech Coder," IEEE Trans. on Speech and Audio Proc., vol. 6, no. 2, pp. 116–130, Mar. 1998. DOI: 10.1109/89.661471 7, 10, 78
[5] B. Bessette et al., "The Adaptive Multi-Rate Wideband Speech Codec (AMR-WB)," IEEE Trans. on Speech and Audio Proc., vol. 10, no. 8, pp. 620–636, Nov. 2002. DOI: 10.1109/TSA.2002.804299 7, 11, 49, 78, 90
[6] M.R. Schroeder, B.S. Atal, and J.L. Hall, "Optimising Digital Speech Coders by Exploiting Masking Properties of the Human Ear," J. Acoust. Soc. Am., vol. 66, 1979. DOI: 10.1121/1.383662 7
[7] P. Kroon, E. Deprettere, and R.J. Sluyter, "Regular-Pulse Excitation – A Novel Approach to Effective and Efficient Multipulse Coding of Speech," IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. 34, no. 5, pp. 1054–1063, Oct. 1986. DOI: 10.1109/TASSP.1986.1164946 7
[8] I. Boyd and C. Southcott, "A Speech Codec for the Skyphone Service," Br. Telecom Technical J., vol. 6(2), pp. 51–55, Apr. 1988. DOI: 10.1007/s1055000700700 7
[9] GSM 06.10, "GSM Full-Rate Transcoding," Technical Report Version 3.2, ETSI/GSM, Jul. 1989. 7
[10] B.S. Atal, "Predictive Coding of Speech at Low Bit Rates," IEEE Trans. on Comm., vol. 30, no. 4, p. 600, Apr. 1982. 7
[11] B. Atal and M.R. Schroeder, "Stochastic Coding of Speech Signals at Very Low Bit Rates," Proc. Int. Conf. Comm., pp. 1610–1613, May 1984. 7
[12] G. Davidson and A. Gersho, “Complexity Reduction Methods for Vector Excitation Coding,” Proc. IEEE ICASSP-86, p. 3055, 1986.

[13] W.B. Kleijn et al., “Fast Methods for the CELP Speech Coding Algorithm,” IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. 38, no. 8, p. 1330, Aug. 1990. DOI: 10.1109/29.57568

[14] I. Trancoso and B. Atal, “Efficient Search Procedures for Selecting the Optimum Innovation in Stochastic Coders,” IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. 38, no. 3, p. 385, Mar. 1990. DOI: 10.1109/29.106858

[15] I. Gerson and M.A. Jasiuk, “Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kb/s,” Proc. IEEE ICASSP-90, vol. 1, pp. 461–464, Apr. 1990. DOI: 10.1109/ICASSP.1990.115749

[16] I. Gerson and M. Jasiuk, “Techniques for Improving the Performance of CELP-type Speech Coders,” Proc. IEEE ICASSP-91, pp. 205–208, May 1991. DOI: 10.1109/49.138990

[17] P. Kroon and B. Atal, “Pitch Predictors with High Temporal Resolution,” Proc. IEEE ICASSP-90, pp. 661–664, Apr. 1990. DOI: 10.1109/ICASSP.1990.115832

[18] W.B. Kleijn, “Source-Dependent Channel Coding and its Application to CELP,” in Advances in Speech Coding, B. Atal, V. Cuperman, and A. Gersho, Eds., pp. 257–266, Kluwer Academic Publishers, 1990.

[19] A. Spanias, M. Deisher, P. Loizou, and G. Lim, “Fixed-Point Implementation of the VSELP Algorithm,” ASU-TRC Technical Report, TRC-SP-ASP-9201, Jul. 1992.

[20] J.P. Campbell Jr., T.E. Tremain, and V.C. Welch, “The Federal Standard 1016 4800 bps CELP Voice Coder,” Digital Signal Processing, Academic Press, vol. 1, no. 3, pp. 145–155, 1991.

[21] Federal Standard 1016, Telecommunications: Analog to Digital Conversion of Radio Voice by 4800 bit/second Code Excited Linear Prediction (CELP), National Communication System, Office of Technology and Standards, Feb. 1991.

[22] TIA/EIA-PN 2398 (IS-54), “The 8 kbit/s VSELP Algorithm,” 1989.

[23] GSM 06.60, “GSM Digital Cellular Communication Standards: Enhanced Full-Rate Transcoding,” ETSI/GSM, 1996.

[24] GSM 06.20, “GSM Digital Cellular Communication Standards: Half Rate Speech; Half Rate Speech Transcoding,” ETSI/GSM, 1996.

[25] ITU Draft Recommendation G.728, “Coding of Speech at 16 kbit/s using Low-Delay Code Excited Linear Prediction (LD-CELP),” 1992.
[26] D.N. Knisely, S. Kumar, S. Laha, and S. Nanda, “Evolution of Wireless Data Services: IS-95 to CDMA2000,” IEEE Comm. Mag., vol. 36, no. 10, pp. 140–149, Oct. 1998. DOI: 10.1109/35.722150

[27] TIA/EIA/IS-96, “QCELP,” Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, TIA, 1992.

[28] ITU Recommendation G.723.1, “Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 and 6.3 kb/s,” Draft 1995.

[29] TIA/EIA/IS-641, “Cellular/PCS Radio Interface: Enhanced Full-Rate Speech Codec,” TIA, 1996.

[30] TIA/EIA/IS-127, “Enhanced Variable Rate Codec,” Speech Service Option 3 for Wideband Spread Spectrum Digital Systems, TIA, 1997.

[31] W.B. Kleijn et al., “Generalized Analysis-by-Synthesis Coding and its Application to Pitch Prediction,” Proc. IEEE ICASSP-92, vol. 1, pp. 337–340, 1992. DOI: 10.1109/ICASSP.1992.225903

[32] ITU Study Group 15 Draft Recommendation G.729, “Coding of Speech at 8 kb/s using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP),” 1995.

[33] J.-H. Chen, R.V. Cox, Y.-C. Lin, N. Jayant, and M.J. Melchner, “A Low-Delay CELP Coder for the CCITT 16 kb/s Speech Coding Standard,” IEEE J. on Sel. Areas in Comm., vol. 10, no. 5, pp. 830–849, Jun. 1992. DOI: 10.1109/49.138988

[34] J.-P. Adoul et al., “Fast CELP Coding Based on Algebraic Codes,” Proc. IEEE ICASSP-87, pp. 1957–1960, Apr. 1987.

[35] C. Laflamme et al., “On Reducing Computational Complexity of Codebook Search in CELP Coder through the Use of Algebraic Codes,” Proc. IEEE ICASSP-90, pp. 177–180, Apr. 1990. DOI: 10.1109/ICASSP.1990.115567

[36] Y. Gao et al., “The SMV Algorithm Selected by TIA and 3GPP2 for CDMA Applications,” Proc. IEEE ICASSP-01, vol. 2, pp. 709–712, May 2001. DOI: 10.1109/ICASSP.2001.941013

[37] ETSI AMR Qualification Phase Documentation, 1998.

[38] E. Ekudden, R. Hagen, I. Johansson, and J. Svedberg, “The Adaptive Multi-Rate Speech Coder,” Proc. IEEE Workshop on Speech Coding, pp. 117–119, Jun. 1999.

[39] Y. Lee et al., “On Reducing Computational Complexity of Codebook Search in CELP Coding,” IEEE Trans. on Comm., vol. 38, 1990. DOI: 10.1109/26.61473
[40] Y. Gao et al., “eX-CELP: A Speech Coding Paradigm,” Proc. IEEE ICASSP-01, vol. 2, pp. 689–692, May 2001. DOI: 10.1109/ICASSP.2001.941008

[41] TIA/EIA/IS-893, “Selectable Mode Vocoder,” Service Option for Wideband Spread Spectrum Communication Systems, Dec. 2001.

[42] M. Jelinek and R. Salami, “Wideband Speech Coding Advances in VMR-WB Standard,” IEEE Trans. on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1167–1179, May 2007. DOI: 10.1109/TASL.2007.894514

[43] R. Salami, R. Lefebvre, A. Lakaniemi, K. Kontola, S. Bruhn, and A. Taleb, “Extended AMR-WB for High-Quality Audio on Mobile Devices,” IEEE Communications Magazine, vol. 44, no. 5, May 2006. DOI: 10.1109/MCOM.2006.1637952

[44] I. Varga, S. Proust, and H. Taddei, “ITU-T G.729.1 Scalable Codec for New Wideband Services,” IEEE Communications Magazine, vol. 47, no. 10, Oct. 2009.

[45] D. Kemp et al., “An Evaluation of 4800 bits/s Voice Coders,” Proc. IEEE ICASSP-89, p. 200, 1989.

[46] J.-H. Chen and A. Gersho, “Real-Time Vector APC Speech Coding at 4800 bps with Adaptive Postfiltering,” Proc. IEEE ICASSP-87, pp. 2185–2188, Apr. 1987.

[47] J. Makhoul, “Linear Prediction: A Tutorial Review,” Proceedings of the IEEE, vol. 63, no. 4, pp. 561–580, Apr. 1975. DOI: 10.1109/PROC.1975.9792

[48] A. Spanias, “Speech Coding: A Tutorial Review,” Proceedings of the IEEE, vol. 82, no. 10, pp. 1541–1582, Oct. 1994.

[49] A. Spanias, T. Painter, and V. Atti, Audio Signal Processing and Coding, Wiley-Interscience, 2007.

[50] S. Haykin, Adaptive Filter Theory, Prentice Hall, New Jersey, 1996.

[51] S.L. Marple, Digital Spectral Analysis with Applications, Prentice-Hall, New Jersey, 1987.

[52] T. Parsons, Voice and Speech Processing, McGraw-Hill, 1987.

[53] P.P. Vaidyanathan, The Theory of Linear Prediction, Morgan & Claypool, 2008. DOI: 10.2200/S00086ED1V01Y200712SPR003

[54] P. Kabal, “Ill-Conditioning and Bandwidth Expansion in Linear Prediction of Speech,” Proc. IEEE ICASSP-03, vol. 1, pp. I-824–I-827, 2003.

[55] B.S. Atal, “New Approaches to Stochastic Coding of Speech Sources at Very Low Bit Rates,” Proc. EUSIPCO-86, 1986.
[56] J.D. Markel and A.H. Gray, Jr., Linear Prediction of Speech, Springer-Verlag, New York, 1976.

[57] T.E. Tremain, “The Government Standard Linear Predictive Coding Algorithm: LPC-10,” Speech Technology, vol. 1, no. 2, pp. 40–49, Apr. 1982.

[58] G.S. Kang and L.J. Fransen, “Application of Line-Spectrum Pairs to Low-Bit-Rate Speech Encoders,” Proc. IEEE ICASSP-85, pp. 244–247, 1985.

[59] A.H. Gray, Jr. and D.Y. Wong, “The Burg Algorithm for LPC Speech Analysis/Synthesis,” IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. ASSP-28, no. 6, pp. 609–615, Dec. 1980.

[60] V. Viswanathan and J. Makhoul, “Quantization Properties of Transmission Parameters in Linear Predictive Systems,” IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. ASSP-23, no. 3, pp. 309–321, Jun. 1975.

[61] N. Sugamura and F. Itakura, “Speech Data Compression by LSP Analysis/Synthesis Technique,” Trans. IEICE, vol. J64-A, pp. 599–606, 1981.

[62] F. Soong and B.-H. Juang, “Line Spectrum Pair (LSP) and Speech Data Compression,” Proc. IEEE ICASSP-84, pp. 1.10.1–1.10.4, Mar. 1984.

[63] P. Kabal and R.P. Ramachandran, “The Computation of Line Spectral Frequencies using Chebyshev Polynomials,” IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. ASSP-34, no. 6, pp. 1419–1426, Dec. 1986.

[64] G.E.P. Box and G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, 1970.

[65] J.S. Collura and T.E. Tremain, “Vector Quantizer Design for the Coding of LSF Parameters,” Proc. IEEE ICASSP, pp. 29–32, 1993.

[66] N. Farvardin and R. Laroia, “Efficient Encoding of Speech LSP Parameters using the Discrete Cosine Transformation,” Proc. IEEE ICASSP, pp. 168–171, May 1989. DOI: 10.1109/ICASSP.1989.266390

[67] K.K. Paliwal and B.S. Atal, “Efficient Vector Quantization of LPC Parameters at 24 bits/frame,” IEEE Trans. on Speech and Audio Processing, vol. 1, no. 1, pp. 3–14, Jan. 1993. DOI: 10.1109/89.221363

[68] J. Pan and T.R. Fischer, “Vector Quantization of Speech Line Spectrum Pair Parameters and Reflection Coefficients,” IEEE Trans. on Speech and Audio Processing, vol. 6, no. 2, Mar. 1998. DOI: 10.1109/89.661470

[69] Y. Bistritz and S. Peller, “Immittance Spectral Pairs (ISP) for Speech Encoding,” Proc. IEEE ICASSP, vol. 2, pp. 9–12, Apr. 1993. DOI: 10.1109/ICASSP.1993.319215
[70] A. Gersho and V. Cuperman, “Vector Quantization: A Pattern Matching Technique for Speech Coding,” IEEE Comm. Mag., vol. 21, pp. 15–21, Dec. 1983.

[71] F. Itakura, “Minimum Prediction Residual Principle Applied to Speech Recognition,” IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. ASSP-23, pp. 67–72, Feb. 1975. DOI: 10.1109/TASSP.1975.1162641

[72] A.V. Oppenheim, R.W. Schafer, and T.G. Stockham, Jr., “Nonlinear Filtering of Multiplied and Convolved Signals,” Proceedings of the IEEE, vol. 56, no. 8, pp. 1264–1291, Aug. 1968. DOI: 10.1109/PROC.1968.6570

[73] A.H. Gray, Jr. and J.D. Markel, “Distance Measures for Speech Processing,” IEEE Trans. on Acoustics, Speech, and Signal Proc., vol. ASSP-24, no. 5, pp. 380–391, Oct. 1976.

[74] R.M. Gray, “Vector Quantization,” IEEE ASSP Mag., vol. 1, pp. 4–29, Apr. 1984.

[75] M.J. Sabin and R.M. Gray, “Product Code Vector Quantizers for Speech Waveform Coding,” Proc. Globecom-82, 1982.

[76] Website for the book, available online at: http://www.morganclaypool.com/page/fs1016

[77] S. Greer and A. DeJaco, “Standardization of the Selectable Mode Vocoder,” Proc. IEEE ICASSP, vol. 2, pp. 953–956, May 2001. DOI: 10.1109/ICASSP.2001.941074

[78] S. Lin and D.J. Costello, Jr., Error Control Coding, Prentice Hall, New Jersey, 2004.
Authors’ Biographies

KARTHIKEYAN N. RAMAMURTHY

Karthikeyan N. Ramamurthy completed his MS in Electrical Engineering and is currently a PhD student in the School of Electrical, Computer, and Energy Engineering at Arizona State University. He has worked extensively with the CELP algorithm and has created MATLAB modules for the various functions in the algorithm. He also works in signal processing and spectral analysis for Earth and geology systems and has created several functions in JavaDSP software for the same. His research interests are in the areas of adaptive signal processing, speech processing, sparse representations and compressive sensing.

ANDREAS SPANIAS

Andreas Spanias is a Professor in the School of Electrical, Computer, and Energy Engineering at Arizona State University (ASU). His research interests include DSP, speech processing, and audio sensing. He and his student team developed the computer simulation software JavaDSP (JDSP, ISBN 0972498400). He is author of two textbooks: Audio Processing and Coding by Wiley and DSP: An Interactive Approach. He served as Associate Editor of the IEEE Transactions on Signal Processing and as General Co-chair of IEEE ICASSP-99. He served as Distinguished Lecturer for the IEEE Signal Processing Society in 2004, and also served as the IEEE Signal Processing Vice-President for Conferences. Andreas Spanias is co-recipient of the 2002 IEEE Donald G. Fink paper prize award and was elected Fellow of the IEEE in 2003. He is also the founder and director of the SenSIP industry consortium.