You are on page 1of 4

IMMITTANCE SPECTRAL PAIRS (ISP) FOR SPEECH ENCODING

Yuval Bistritz and Shlomo Peller


Department of Electrical Engineering, Tel Aviv University, Tel Aviv 69978, ISRAEL

ABSTRACT or admittance) and obtain alternative (and more effi-


Immittance Spectral Pairs (ISP) form a new set of pa- cient) immittance variable versions to these algorithms
rameters proposed for presenting the LPC filter. ISP is ~ 3 1 .
close to LSP and owns similar ‘niceties’, yet it is differ- If the model for the vocal tract is reformulated into
ent. For a filter of order n it consists of a ‘gain’ and n-1 immittance variables the input immittance of the model
‘frequency’parameters instead of n ‘frequency’param- (the immittance a t the glottis) becomes [4],
eters for LSP. In regarding LPC as a pseudo model for
the vocal tract ISP may represent the immittance a t (3)
the glottis, without posing, like LSP, artificial bound-
ary conditions. In quantization experiments, ISP has This immittance function is supported by the fol-
been found to compare favorably with LSP. A study lowing theorem [5].
of interframe differentiation coding for ISP and LSP
is included and it demonstrates the respectives perfor-
Theorem 1 (ISP).A p ( z ) is stable if and only if
mances of the two sets.
Zp(z) (3) can be written, when p = 2m as

1. Immittance Spectral Pairs


The LPC filter l / A p ( z )is realizable by a cascade of p
lattice sections that implement the following Levinson andwhenp = 2m + 1as (41
recursions for n = 1,.. . , p , with Afj(z) := z-”A,,(z-’> K(1 - 2-1) nEl(l - 22-1zzi + 2-2)
=2m+1(2) = (1+ 2-1) nEl(l- 22-1zzi-1 + 2-2) .
A,,(z) = A , - l ( z ) - k,,z-lA:-l(a) (la)
v i t h p pammeters that am real and satisfy
A i ( z ) = -k,,A,,-l(z) +~ - l A t \ - ~ ( z )
(lb)
- 1 < xp-1 < ... < 5 2 < 5 1 < 1 , K > 0 (5)
This scheme can also be understood as a pseudo model
for the vocal tract that consist of p lossless sections
of different immittances counted from the lips to the The Immittance Spectral Pairs (ISP) parameters
glottis with A,,, A i and k,, representing, respectively, for the LPC polynomial A p ( z )are essentially the p pa-
wave moving forward, backward in section n and the rameters that this theorem defines. The actual param-
amount of A,, that is reflected at the boundary (e.g. eters for coding purposes may be
PI).
It is possible, however, t o replace in the Levinson
and other related algorithms the forward and backward
variables, the so called “scattering” variables by their in which case the necessary and sufficient conditions for
sums and differences stability become

Fn(2) = A,(4+Ai(z); Gn(2) = A,(z)-A11(4 (2) 0 < w1 < w2 < .. . < wp-l < 7r, and - 1 < kp < 1 (7)
called “immittance” variables (as they may represent The parameters wi represent (angular) frequencies (they
the wave field variables sound pressure and volume v e may be replaced by normalized frequencies f, = wi/27r
locity in the current context, whose ratios are impedance used in the reported experiment below or by actual fre
This Work was partly supported by the UNITED STATES quencies). They originate from the conjugate pairs of
-ISRAEL BSF under Grant No. 88-425. unit circle zeros ezp(fjwi) of Fp(z) and Gp(z).

11-9
0-7803-0946-4/93$3.00 1993 IEEE
@
2. ISP versus LSP constrains the 'gain' in (4) to a fixed value (K = 1)
and instead, a p t h 'frequency' parameter completes
Since Line Spectra Pairs (LSP) has been proposed for the characterization of Ap(z).
speech analysis [6, 7, 8, 91 it has been studied by many The way we amved to Theorem 2 offers a new mod-
researchers and has become the most common set of eling interpretation to LSP. Namely, L p ( z )of (9) may
parameters in quantization and encoding algorithms. be viewed as the immittance at the 'glottis' in the VD-
Some of the more recent developments being reported cal tract model when the termination at the 'lips' is
in [lo, 11, 12, 131. constrained by kp+l = 0 [4].
The LSP represents the LPC filter by the unit circle Our experimental study of the properties of ISP pa-
zeros of the two polynomials rameters for encoding LPC has been motivated by the
P ( z ) = Ap(z) - z-lA!(z); Q ( z ) = Ap(z) + z-lA!(z) (8) advantage that ISP has over LSP in being free from
any augmentation to the LPC polynomial from a math-
Usually, the two LSP polynomial are viewd as r e p r s ematical standpoint, or artificial boundary conditions
senting two 'snapshots' of the filter at two separate and in the 'vocal tract model'. We were also encouraged
artificial situations neither of which may represent a by the fact that any quantization gain that these ad-
state along any point on the model. Indeed, the two vantages may yield would be without sacrificing any
LSP polynomials in (8) correspond to extending the of the properties, like control over stability, ordering
recursions (1) with a p+l-th reflection coefficient set and a frequency meaning, that made LSP attractive
once to kp+l = +1 and once to k,+l = -1. for coding.
An alternative interpretation may now emerge by
viewing the ratio of the two LSP polynomials as an im-
mittance function (3) of a polynomial of degree p 1, + 3. Experimental Results
say A,+ 1 ( z ), obtained by augmenting A, ( z ) with a zero
at the origin [4]. First we note that A p + l ( z ) = A p ( z ) In order to learn on the relative merits of LSP and ISP
(the predictor polynomial is by assumption a poly- we ran statistical and quantization experiments under
nomial in z - l with monic free term) and A;+,(z) = identical conditions with each of the two sets of param-
t - ' A $ ( z ) . Next, it is also clear that A p + l ( z ) is stable eters. The data base was derived from the TIMIT CD-
if and only if A,(z) is stable. Consequently, the LSP ROM Speech Corpus and included 24 speakers from 8
theorem [8, 91 follows at once from Theorem 1 applied major dialect regions of the USA (16 male and 8 f e
to this A,+l(z). male). Two different sentences were included for each
speaker, summing up to 48 sentences with duration of
about 4 minutes. The results reported here are based
T h e o r e m 2 (LSP). A p ( z ) is stable and only zf on uniform scalar quantization with inter-frame differ-
A p ( z )- z-'A;(z) entiation under the further conditions summarized in
Lp(2) := Table 1.
Ap(z) + z-'A!(z) (9)
We chose inter-frame differentiation because LSP
may be written when p = 2m - 1 as parameters are more highly correlated inter- frame wise
than intra-frame wise [13]. Fig. 1 and Table 2 describe
the distributions of ISP and LSP parameters. The last
ISP parameters that is of a different type is depicted
separately.
and when p = 2m as (10)
'Pable 1. Experimental conditions.

wzth p parameters that are real a d satisfy


sampling, decimated to 8 kHz,
- 1 < 2, < ... < 2 2 < 2 1 < 1 (11)
Again w, := cos-'(xi) bear a frequency meaning and
the coded parameters may be the normalized frequen-
cies f i = w,/27r. Note that increasing the degree of update rate 50 frames/sec
A p ( z ) with a prescribed zero (at z = 0) adds no in- distortion MSE of Log-spectra
formation so that the new polynomial is, too, deter- quantization inter-frame differentiation
mined by just p parameters. However the modification with error feedback

11-10
loco

MO

0
0 0.5
(c) LTPFREQUENClEs fl0fuaLQD)

Figure 1: Parameters Distribution

Table 2. Parameters Distribution.

Exhaustive iterations were run to find minimal d i s


torsion for bit rates in the interval between 20 bits/
frame and 40 bits/frame. The distortion performance
is affected by both bit location and by quantization
overload [14]. We carried out an elaborate optimiza-
tion scheme to take care of the interaction per each
parameter between the quantizer's overload points and
the number of bits allocated to that parameter. At 20
bits/framc optimal bit allocation was obtained by run-
ning iterations over all the possibilities of transferring
one bit at a time from each parameter to the others, for
a varying quantization intervals, till no further decrease
in distortion was possible. From this initial optimal al-
location, the optimization at each subsequent bit rate
instance involved a bi-dimensional search for the loca-
tion of the new bit and for new quantization intervals
that yield the minimal distortion measure. Fig. 2 com-
pares the distortion per bitrrate of ISP and LSP. ISP
precedes consistently LSP by 1 bit roughly at all dis- LSP 0 35 bits ISP 0 34 bits
torsion lcvcls. To be more specific, wc examine in more
mean 0.972 0.989
detail the 1dB distorsion level, an accepted perceptual
threshold for coding LPC spectral information trans- std. dev. 0.706 0.675
parently. ISP is seen to cross the 1dB threshold at 34 > 3 dB 2.60 % 2.59 %
bits/ frame, and LSP at 35 bitslframc. The distortion > 5 dB 0.44 % 0.42 %

11-11
Y. Bistritz, H. Lev-Ari and T. Kailath, "Immittance -
domain Levinson algorithms", I E E E l h n s . Injorm.
Theory., Vol. IT-35,May 1989. See also Pnx. Int.
ConJ on Acoustics, Speech and Sagnal P m . , pp. 253-
256, Tokyo, Japan, 1986.

Y. Bistritz, H. Lev-Ari and T. Kailath, "Complex-


ity reduced lattice filters for digital speech processing"
P m . Int. Conf. o n Acoustics, Speech and Signal Prvc.,
pp. 21-24, Dallas Texas, 1987.
Y. Bistritz, "Immittance domain modelling for linear
prcdictivc coding" , in preparation.
Y. Bistritz, "A discrete stability equation theorem and
method of stable model rcduction" , Systems d Control
Letters, Vol. 1, pp. 373-381, 1982.
"20 22 24 26 28 30 32 34 36 38 40 F. Itakura, "Line spectrum representation of linear
BlTS I FRAME predictive coefficients", J. Acoust. Soc. A M . , 535(a),
s35(A), 1975.
Figure 2: Rate-Distortion Performance
N. Sugamura and F. Itakura, "Speech data compres-
sion by LSP speech analysis and synthesis technique",
I E C E Pam., Vol. J 64-A, pp. 599-605,1981.

.-U
S. Sagayama, "Stability conditions of LSP spcech syn-
thesis digital filters", P m . OJ ASJ, pp. 153-154, 1982.
F. K. Soong and B-H Juang, "Line Spectrum Pairs
(LSP) and speech data compression", P m . Int. Conf.
on Acoustics, Speech and Signal Prvc., pp. 1.10.1:4,
Oo 5
DISTORTION [dB] 1984.

F. K. Soong and B-H Juang, "Optimal quantization of


LSP parameters" Prvc. Int. Conf. on Acoustics, Speech
Figure 3: Distortion Distributions and Signal P m . , pp. 394-397, 1988.

N. Farvardin and R. Laroia, "Efficient encoding of


4. Conclusions speech LSP parameters using the discrete cosine trans-
formation", Proc. Int. Conf. on Acoustics, Speech and
The ISP paramctcrs are a new sct of paramctcrs pro- Signal Pruc., pp. 168-171, 1989.
posed for spccch encoding. ISP shares with LSP d o
F. K. Soong and B-H Juang, "Optimal quantization
sirable fcaturcs likc stability conditions, ordering and
of LSP parameters using delayed decisions", P m . Int.
frequency meaning. Our initial experimental study in- Conf. on Acoustics, Speech and Signal P m . , pp. 185-
dicates that ISP compares favorably with LSP in all 188, 1990.
thc quantization tests that we performed. This may
suggcst ISP as a substitutc for LSP in many LSP cod- C-C Kuo, F-R Jean and 11-C Wang, "Low bit-rate
ing schemes with thc prospect of offering better perfor- quantization of LSP paramcters using twodimcnsional
mance without an increase in computational complex- diflerential coding", Pnx. Int. Conf. on Acoustics,
ity. In fact, cvcn in computation it has an advantage Speech and Signai P m . , pp. 197-1100, 1992.
margin as it needs one less root computation than LSP. N. S. Jayant and P. Noll, Digital Coding of Wave-
forms, Principles and applications to speech and video,
Prentice-Hall, Inc., Englewood ClifTs, New Jersey,
1984.
References
B. S. Atal, R. V. Cox and P. Kroon, "Spectral quanti-
[l] L. R. Rabiner and R. W. Schafer, Digital Processing of zation and interpolation for CELP coders" Pnx. Int.
Speech Signals, Prcntice- Hall, Englewood Cliffs, New Conf. on Acoustics, Speech and Signal Pm., pp. 69-
Jcrscy, 1978. 72, 1989.

11-12

You might also like