
FAST LEAST-SQUARES (LS) IN THE VOICE ECHO CANCELLATION APPLICATION

Frank K. Soong and Allen M. Peterson

Department of Electrical Engineering


Stanford University
Stanford, CA 94305 USA

ABSTRACT

The existing echo cancellation methods are primarily based on the LMS adaptive algorithm. Although the LMS echo canceller works better than its predecessor, the echo suppressor, its performance can be substantially improved if the Recursive LS (RLS) algorithm is used instead. However, the cp² operations (p: filter order) required per sample prevent the RLS algorithm from being used in this and many other applications where the filter order is relatively high. The computational complexity of the RLS has recently been brought down to ap by exploiting the shifting structure of the signal covariance matrix. Two fast algorithms, namely the LS lattice and the "fast Kalman", are used here. Comparisons between the two fast LS algorithms and the LMS gradient algorithm are made and the performance difference is demonstrated. Two important problems in voice echo cancellation, the flat delay estimation and the near-end speech detection, are approached in a novel way through a minimum-mean-squared-error (MMSE) flat delay estimator and a likelihood near-end speech detector. Simulation results are very satisfactory.

INTRODUCTION

In a long distance telephone connection as depicted in Fig. 1, unidirectional "4-wire" systems are usually required to connect bidirectional "2-wire" systems. A four-port passive device called the hybrid transformer is used as an interface between a 2-wire and a 4-wire system. Associated with each hybrid there is a balancing impedance to balance the incoming line impedance of a subscriber loop (2-wire system). Since the present telephone network is constructed in such a way that many subscriber loops share only one hybrid transformer, it is unlikely that each 2-wire line can be perfectly balanced. Therefore some signal leaks from the "in" port to the "out" port and returns to the original speaker as an echo. A typical leakage is 15 dB on average with a 3 dB standard deviation. A long delayed echo at such a level is intolerable, and some echo suppression or cancellation is required to attenuate the echo level further.

The echo suppressor provides a satisfactory solution to the echo silencing problem for moderately delayed (say less than 100 ms) echoes. But in circuits with a long round-trip delay, e.g., a long-distance call through a synchronous communication satellite where the round-trip delay is 540 ms, an echo suppressor can disrupt an interactive conversation entirely due to its slow response to near-end speech detection, see, e.g., Weinstein [1977]. The long round-trip delay and other problems, such as clipping of the initial part of a speech burst and the inability to cancel echoes in a double-talking condition, have made us seek a better solution: echo cancellation.

The idea of echo cancellation is to use a filter, as shown in Fig. 2, placed in parallel with the echo return path, to generate a high-fidelity echo replica. The echo is then cancelled by the replica. Since every hybrid transformer is shared by many local subscriber loops, the transfer function of the echo return path varies considerably from one call to the next. An adaptive structure for the proposed filter is therefore necessary.

The impulse response of a typical echo return path consists of two major portions, namely a flat-delay portion and a dispersive portion. The time span of an echo canceller should be long enough to cover both the flat-delay and the dispersive portions.
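As a rough illustration of this two-portion structure, the sketch below builds a synthetic echo return path response: a block of zero taps for the flat delay followed by a short decaying dispersive tail. The 40-sample delay and the exponential tail are illustrative assumptions, not the measured response shown in Fig. 7.

```python
import numpy as np

def synthetic_echo_path(flat_delay=40, dispersive_len=16):
    """Illustrative echo return path: zeros for the flat-delay portion,
    then a short decaying (dispersive) tail.  All values are assumptions
    chosen only to mimic the two-portion structure described above."""
    tail = 0.5 * (-0.7) ** np.arange(dispersive_len)   # decaying, oscillating tail
    return np.concatenate([np.zeros(flat_delay), tail])

h = synthetic_echo_path()
echo = np.convolve(np.random.randn(1000), h)[:1000]    # echo = far-end signal convolved with path
```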
LMS AND LS ECHO CANCELLERS

Most of the existing adaptive echo cancellers are based on the Least-Mean-Square (LMS) algorithm developed by Widrow [1970] or its derivatives. The LMS based adaptive echo canceller is depicted in Fig. 3, where x(t) is the first speaker's signal, y(t) the echo, w(t) the channel noise, and the second speaker's signal.

In the LMS algorithm, a linear combination of x(t) is used to form an echo replica \hat{y}(t). The echo residual is obtained by subtracting the echo replica from the measured echo signal as

    e(t) = y(t) - \hat{y}(t)                                        (1)

    \hat{y}(t) = \hat{\theta}^T(t) x(t)                             (2)

where

    \hat{\theta}(t) = [\hat{\theta}_0(t) \dots \hat{\theta}_{p-1}(t)]^T,  \quad  x(t) = [x(t) \dots x(t-p+1)]^T

are the canceller coefficient vector and the signal vector, respectively.



The canceller coefficients are updated by the instantaneous gradient of the error square e²(t) as

    \nabla_{\theta} e^2(t) = -2 e(t) x(t)                                   (3)

    \hat{\theta}(t) = \hat{\theta}(t-1) - \mu \nabla_{\theta} e^2(t)        (4)

                   = \hat{\theta}(t-1) + 2\mu e(t) x(t)                     (5)

where \mu is a step size chosen appropriately to ensure the stability of the algorithm.
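As a concrete illustration of the update (3)-(5), here is a minimal transversal LMS echo canceller sketch in NumPy. The function name, the power-normalized step size, and the default filter order are assumptions for illustration; the paper's simulations use a power-normalized step size following Duttweiler [1978].

```python
import numpy as np

def lms_echo_canceller(x, d, p=30, mu=0.1, eps=1e-6):
    """Transversal LMS canceller: x is the far-end signal, d the measured
    echo (plus noise), p the filter order.  The step size is normalized by
    the short-term input power (an assumption made here)."""
    theta = np.zeros(p)                      # canceller coefficients, eq. (2)
    e = np.zeros(len(x))
    for t in range(p, len(x)):
        xv = x[t:t-p:-1]                     # signal vector [x(t) ... x(t-p+1)]
        y_hat = theta @ xv                   # echo replica, eq. (2)
        e[t] = d[t] - y_hat                  # echo residual, eq. (1)
        theta += (mu / (eps + xv @ xv)) * e[t] * xv   # gradient update, eq. (5)
    return e, theta
```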

The LMS gradient algorithm is known as the stochastic approximation realization of the recursive LS (RLS) algorithm, in which the norm

    J = \sum_{i} \lambda^{t-i} e^2(i)

is minimized, where an exponential weighting, or forgetting factor \lambda, is used to weight out the past data so that a time-varying echo return path can be tracked. The RLS algorithm can be easily derived through the well-known matrix inversion lemma, see, e.g., Kailath [1980]. It is listed in the following:

    \hat{\theta}(t) = \hat{\theta}(t-1) + K(t) [y(t) - x^T(t) \hat{\theta}(t-1)]                      (6)

    K(t) = P(t) x(t)                                                                                  (7)

    P(t) = \frac{1}{\lambda} \Big[ P(t-1) - \frac{P(t-1) x(t) x^T(t) P(t-1)}{\lambda + x^T(t) P(t-1) x(t)} \Big]      (8)

where

    P(t) = \Big[ \sum_{i} \lambda^{t-i} x(i) x^T(i) \Big]^{-1}                                         (9)

is the covariance matrix inverse and K(t) is the optimal gain vector.
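The step from (9) to (8) is an application of the matrix inversion lemma to the rank-one update of the exponentially weighted covariance; the short derivation below is a sketch added for completeness, not taken from the paper.

```latex
% Let \Phi(t) = \sum_i \lambda^{t-i} x(i) x^T(i), so that
%   \Phi(t) = \lambda \Phi(t-1) + x(t) x^T(t)  and  P(t) = \Phi^{-1}(t).
% The matrix inversion lemma
%   (A + b c^T)^{-1} = A^{-1} - \frac{A^{-1} b \, c^T A^{-1}}{1 + c^T A^{-1} b}
% with A = \lambda \Phi(t-1) and b = c = x(t) gives
P(t) = \bigl(\lambda \Phi(t-1) + x(t) x^T(t)\bigr)^{-1}
     = \frac{1}{\lambda}\Bigl[P(t-1)
       - \frac{P(t-1)\,x(t)\,x^T(t)\,P(t-1)}{\lambda + x^T(t)\,P(t-1)\,x(t)}\Bigr]
```

which is exactly (8).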
Instead of being an approximation, as the LMS gradient algorithm is, the RLS is an exact sequential LS solution that minimizes J.
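For comparison with the LMS sketch above, here is a direct RLS canceller following (6)-(9); its per-sample cost is of order p², as discussed just below. The initialization P(0) = (1/delta) I and the value of delta are assumptions for illustration.

```python
import numpy as np

def rls_echo_canceller(x, d, p=30, lam=0.99, delta=1e-2):
    """Exponentially weighted RLS canceller implementing (6)-(9).
    x: far-end signal, d: measured echo, lam: forgetting factor."""
    theta = np.zeros(p)
    P = np.eye(p) / delta                    # covariance inverse, eq. (9), initialized as (1/delta) I
    e = np.zeros(len(x))
    for t in range(p, len(x)):
        xv = x[t:t-p:-1]                     # signal vector [x(t) ... x(t-p+1)]
        Px = P @ xv
        K = Px / (lam + xv @ Px)             # optimal gain, eqs. (7)-(8) combined
        e[t] = d[t] - theta @ xv             # a priori residual
        theta += K * e[t]                    # coefficient update, eq. (6)
        P = (P - np.outer(K, Px)) / lam      # covariance inverse update, eq. (8)
    return e, theta
```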
In terms of its computational complexity, however, the RLS needs cp² operations per data sample, which is higher than that of the LMS (cp) and is somewhat forbidding for the echo cancellation application, where the filter order p is usually on the order of several tens to a hundred. But if the shift structure of the signal vector x(t), which is not assumed in the matrix inversion lemma, is exploited, fast recursive LS algorithms can be derived and the computational complexity is reduced from cp² to ap. We give the two fast recursive LS algorithms developed by Ljung, Morf and Falconer [1978] and Lee and Morf [1978].

Fast Gain Computation ("Fast Kalman") Algorithm

Instead of updating the covariance matrix inverse as the RLS does, this algorithm updates the gain vector K(t) directly through a generalized Levinson recursion. At time t we have \hat{\theta}(t-1), A(t-1), B(t-1), K(t), and R^{\epsilon}(t-1) available to update the following:

    e(t) = y(t) - \hat{\theta}^T(t-1) x(t)                                    (10)

    \hat{\theta}(t) = \hat{\theta}(t-1) + K(t) e(t)

    \epsilon(t) = x(t) + A^T(t-1) \bar{x}(t)

    A(t) = A(t-1) - K(t) \epsilon(t)

    \eta(t) = K^T(t) \bar{x}(t)

    \epsilon'(t) = (1 - \eta(t)) \epsilon(t)

    R^{\epsilon}(t) = \lambda R^{\epsilon}(t-1) + \epsilon(t) \epsilon'(t)

    \hat{g}(t) = \begin{bmatrix} [R^{\epsilon}(t)]^{-1} \epsilon'(t) \\ K(t) + A(t) [R^{\epsilon}(t)]^{-1} \epsilon'(t) \end{bmatrix}
               = \begin{bmatrix} L(t) \\ q(t) \end{bmatrix}  \quad (L(t): p rows, q(t): 1 row)

    r(t) = x(t-p) + B^T(t-1) \bar{x}(t+1)

    K(t+1) = [L(t) - B(t-1) q(t)] [1 - q(t) r(t)]^{-1}

    B(t) = B(t-1) - K(t+1) r(t)

with the initialization K(1) = 0, A(0) = 0, B(0) = 0, \hat{\theta}(0) = 0, R^{\epsilon}(0) = \delta, and where \bar{x}(t) = [x(t-1) \dots x(t-p)]^T.

The vectors A(t) and B(t) are the forward and backward prediction polynomials of the process x(t), and \epsilon(t) and r(t) are the forward and backward prediction errors, respectively. The resulting adaptive echo canceller is a fixed-order, direct-form FIR filter, identical to the LMS gradient canceller in its filter configuration as shown in Fig. 3, but with a rather different filter gain updating mechanism.

The LS lattice algorithm is another fast LS algorithm. It computes the reflection coefficients K^{\epsilon}_n and K^{r}_n, the prediction errors \epsilon_n and r_n, and their covariances R^{\epsilon}_n and R^{r}_n, respectively, of the process x(t) from stage 0 to stage p. The orthogonal basis set (the backward prediction errors r_n) is used to span the echo process z(t), and the regression taps K_n and the echo residuals e_n are successively updated. The whole procedure is recursive both in time and in order. The LS lattice echo canceller is depicted in Fig. 4.

LS Lattice algorithm:

1. Initialization                                                             (11)

    R^{\epsilon}_n(0) = \delta, \quad R^{r}_n(-1) = \delta, \quad n = 0, \dots, p-1, where \delta is a small positive constant

    K_n(0) = 0, \quad r_n(0) = 0

2. At time t-1, store \Delta_n(t-1), R^{\epsilon}_n(t-1), R^{r}_n(t-1), r_n(t-1), n = 0, \dots, p-1.

3. At time t-1, compute for n = 0, \dots, p-1

    K^{\epsilon}_n(t-1) = \Delta_{n+1}(t-1) [R^{r}_n(t-2)]^{-1}

    K^{r}_n(t-1) = \Delta_{n+1}(t-1) [R^{\epsilon}_n(t-1)]^{-1}
4. The prediction \hat{x}_n(t) can be computed for n = 1, \dots, p from

    \hat{x}_n(t) = \hat{x}_{n-1}(t) + K^{\epsilon}_{n-1}(t-1) r_{n-1}(t-1), \qquad \hat{x}_0(t) = 0

5. Compute for n = 0, \dots, p-1

    R^{r}_n(t-1) = \lambda R^{r}_n(t-2) + (1 - \gamma_n(t-1)) r_n^2(t-1)

    \gamma_{n+1}(t-1) = \gamma_n(t-1) + (1 - \gamma_n(t-1))^2 r_n^2(t-1) [R^{r}_n(t-1)]^{-1}, \qquad \gamma_0(t-1) = 0

6. At time t, x(t) is received.

7. For n = 0, \dots, p-1 update

    r_n(t) = r_{n-1}(t-1) - K^{r}_{n-1}(t-1) \epsilon_{n-1}(t)

    \epsilon_n(t) = x(t) - \hat{x}_n(t)

    r_0(t) = \epsilon_0(t) = x(t)

8. For n = 0, \dots, p-1 update

    R^{\epsilon}_n(t) = \lambda R^{\epsilon}_n(t-1) + (1 - \gamma_n(t)) \epsilon_n^2(t)

    \Delta_{n+1}(t) = \lambda \Delta_{n+1}(t-1) + (1 - \gamma_n(t)) r_n(t-1) \epsilon_n(t)

9. For n = 1, \dots, p update

    K_{n-1}(t-1) = \Delta^{z}_{n-1}(t-1) [R^{r}_{n-1}(t-2)]^{-1}

    \hat{y}_n(t) = \hat{y}_{n-1}(t) + K_{n-1}(t-1) r_{n-1}(t-1), \qquad \hat{y}_0(t) = 0

    e_n(t) = y(t) - \hat{y}_n(t)

    \Delta^{z}_n(t) = \lambda \Delta^{z}_n(t-1) + (1 - \gamma_n(t)) r_n(t-1) e_n(t)

10. Go to 2.

MMSE FLAT DELAY ESTIMATION

Depending on the length of the "tail" of the 4-wire system, the length of the echo return path response varies. The dispersive portion of the response is usually around 2 ms, while the flat-delay portion can be rather long. For a 15 ms response the adaptive echo canceller should have 120 taps (at an 8 kHz sampling frequency) to cover the whole range. However, only 16-20 taps are really necessary to simulate the dispersive portion of the response; the remaining taps, corresponding to the flat delay, should all be set to zero, not only to reduce the circuit complexity but also to lower the so-called "misadjustment" noise, which is due to the fluctuation of the filter coefficients around their true values. However, if we want to turn off the "flat-delay" taps to match the corresponding flat delay and keep the remaining "dispersive" taps adaptive, the flat delay has to be estimated first. We propose here a minimum-mean-squared-error (MMSE) based flat-delay estimator. It has a parallel structure, as shown in Figs. 5 and 6 for the corresponding LS lattice and fast Kalman algorithms, respectively. The LS lattice type estimator delays the orthogonal basis set, namely the backward prediction errors, vertically as shown in Fig. 5. When the delay in the vertical direction matches the flat delay, the echo residual is optimal in the MMSE sense. The fast Kalman type estimator delays the gain vectors K(t) instead of the backward errors. These two estimators, though different in their delayed variables, are essentially equivalent. The proof of the MMSE property can be found in Soong [1981].
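A minimal way to exercise this idea, without reproducing the lattice or fast Kalman internals, is to run one short adaptive canceller per candidate delay and pick the delay whose steady-state residual energy is smallest. The sketch below does this with the LMS canceller defined earlier; the candidate range and the averaging window (samples 200-250, as in the experiment reported later) are assumptions.

```python
import numpy as np

def estimate_flat_delay(x, d, max_delay=50, p=20, t0=200, t1=250):
    """MMSE-style flat-delay search: for each candidate delay D, cancel with
    a short (p-tap) filter fed by the delayed input and measure the residual
    energy over [t0, t1].  Returns the minimizing delay and all energies."""
    energies = np.empty(max_delay + 1)
    for D in range(max_delay + 1):
        x_delayed = np.concatenate([np.zeros(D), x[:len(x) - D]])
        e, _ = lms_echo_canceller(x_delayed, d, p=p)   # canceller sketched above
        energies[D] = np.sum(e[t0:t1 + 1] ** 2)
    return int(np.argmin(energies)), energies
```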
OUTLIER NEAR-END SPEECH DETECTION

It is shown in Soong [1981] that the echo residual e(t) is a zero-mean random variable with variance

    \sigma^2 \big( 1 + x^T(t) [X^T(t-1) X(t-1)]^{-1} x(t) \big)

where the measurement noise w(t) is assumed to be a zero-mean, uncorrelated random variable with variance \sigma^2, and the signal matrix X(t) is defined as

    X(t) = \begin{bmatrix} x(t) & \cdots & x(t-p+1) \\ \vdots & & \vdots \\ x(p-1) & \cdots & x(0) \end{bmatrix}

The term x^T(t) [X^T(t-1) X(t-1)]^{-1} x(t) converges to zero at a rate proportional to 1/(t-p). Therefore the echo residual variance converges to the measurement noise level \sigma^2 at a rather fast rate. As long as there is no near-end speech, the echo residual remains at the measurement noise level after convergence. When the near-end speaker starts to interrupt, the residual level increases substantially. This abrupt increase of the residual amplitude is a clear indication of the onset of near-end speech, and a likelihood detector can thus be formed. Once near-end speech is detected, a procedure should be taken to "freeze" or slow down the adaptation, to prevent the canceller coefficients from being diverged by large error signals. The likelihood detector is based on the ratio between the echo residual amplitude and the estimated measurement noise level \hat{\sigma}. The measurement noise variance can be estimated by a one-pole window as

    \hat{\sigma}^2(t) = \lambda \hat{\sigma}^2(t-1) + (1 - \lambda) e^2(t)                       (13)

The threshold used to separate the outliers from the good data points is adjustable.
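A compact sketch of such a detector is given below: the residual magnitude is compared with a running one-pole estimate of the noise level as in (13), and adaptation is frozen while the ratio exceeds a threshold. The threshold value, the smoothing constant, and the hangover counter are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def near_end_detector(e, lam=0.99, threshold=3.0, hangover=200):
    """Outlier-based near-end (double-talk) detector on the residual e.
    Returns a boolean array: True where adaptation should be frozen."""
    sigma2 = 1e-6                          # running noise-variance estimate, eq. (13)
    freeze = np.zeros(len(e), dtype=bool)
    count = 0
    for t in range(len(e)):
        if abs(e[t]) > threshold * np.sqrt(sigma2):
            count = hangover               # outlier: declare near-end speech, start hangover
        else:
            sigma2 = lam * sigma2 + (1.0 - lam) * e[t] ** 2   # update noise level on "good" samples only
            count = max(0, count - 1)
        freeze[t] = count > 0
    return freeze
```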
COMPUTER SIMULATION RESULTS

Due to the limited space we present only some of our simulation results. A mixed-phase ARMA(2,2) model is used to simulate the echo return path transfer function, where y(t) is the generated echo and x(t) is the input signal. An adjustable delay is used to introduce the flat delay. The impulse response of the ARMA(2,2) model is depicted in Fig. 7 with 40 samples of flat delay. The input signal x(t) to the echo return path is generated by driving a zero-mean, unit-variance Gaussian random sequence through an 8th order elliptic bandpass filter.
At an 8 kHz sampling rate the simulated input signal is bandpassed to 300 Hz-3400 Hz, which is typical of a telephone channel. An adjustable amount of random Gaussian noise can be added to the simulated echo y(t) to account for the channel noise, A-D quantization noise, and the background noise. A separate 8th order elliptic bandpass filter, driven by a separately generated random Gaussian sequence, is used to generate the near-end (second) speaker's signal.
Though different in configuration, computational complexity, and numerical characteristics, all LS type echo cancellers such as the RLS, the fast Kalman, or the LS lattice are, in principle, equivalent. Therefore the simulation results are sometimes presented for only one type of LS filter. However, the simulations were done in 32-bit floating-point arithmetic; at a finite-precision, digital-hardware level it is expected that the different LS algorithms will behave differently.

Comparisons Between the LS and the LMS Algorithms

The first experiment compares the convergence rate of the LS type algorithms and the most widely used LMS gradient algorithm in their echo cancellation performance.

The echo to measurement noise ratio (ENR) is 40 dB. No flat delay is introduced. 25 statistically independent runs with 1000 samples in each run were conducted, and a 30th order filter whose coefficients were initialized at zero was used. A power-normalized step size was used for the LMS gradient filter; see Duttweiler [1978] for details of this choice of \mu. A comparable forgetting factor \lambda = 1 - \mu was chosen for the LS lattice. Scatter plots and the ensemble average of the 25 runs are depicted in Figs. 8 and 9, respectively. The LS lattice converges to the measurement noise level (-40 dB) around the 200th sample, while the echo residual of the LMS gradient filter is still at the -30 dB level at the end of each run.
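The convergence curves of Figs. 8 and 9 can be reproduced in outline by averaging the squared residual over independent runs and plotting it in dB; the sketch below assumes the LMS and RLS cancellers defined earlier stand in for the algorithms compared (the LS lattice and fast Kalman produce, in principle, the same LS residual).

```python
import numpy as np

def ensemble_residual_db(canceller, n_runs=25, n_samples=1000, p=30, enr_db=40.0):
    """Ensemble-averaged squared echo residual (in dB) over independent runs,
    for a given canceller function (e.g. lms_echo_canceller or rls_echo_canceller)."""
    rng = np.random.default_rng(1)
    h = synthetic_echo_path(flat_delay=0, dispersive_len=16)   # no flat delay in this experiment
    acc = np.zeros(n_samples)
    for _ in range(n_runs):
        x = rng.standard_normal(n_samples)
        echo = np.convolve(x, h)[:n_samples]
        noise = rng.standard_normal(n_samples) * np.sqrt(np.var(echo) * 10 ** (-enr_db / 10))
        e, _ = canceller(x, echo + noise, p=p)
        acc += e ** 2
    return 10.0 * np.log10(acc / n_runs + 1e-12)
```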

MMSE Flat-Delay Estimator

We used the filter bank structures shown in Figs. 5 and 6 to estimate the flat delays. The flat delay was 40 samples long, the filter length was 30, and 50 vertical delay stages were used. The error square was estimated as

    \sum_{t=200}^{250} e_d^2(t)

where e_d(t) is the dth level prediction error. The results are given in Fig. 10 for the case of no measurement noise and for the case of -40 dB measurement noise. The flat delay was correctly estimated in both cases. The two plots are very similar, except that the -90 dB deep notch in Fig. 10(a) is masked by the -40 dB measurement noise in Fig. 10(b).

Digitized Speech as Input

One segment of digitized speech was used as the input, and the LMS, the fast Kalman, and the LS lattice algorithms were all tested. The results are depicted in Fig. 11.

Outlier Near-End Speech Detector

Two segments of digitized speech were used as the far-end and near-end speech signals to test the near-end speech detector and the "freezing" of the adaptation. The results are depicted in Fig. 12. Note that the echo was cancelled while the second speaker's signal, even the low-amplitude unvoiced portion, was very well preserved.

REFERENCES

Duttweiler, D.L. [1978] "A Twelve-Channel Digital Echo Canceller", IEEE Trans. Comm., Vol. COM-26, May, pp. 647-653.

Kailath, T. [1980] Linear Systems, Prentice-Hall.

Ljung, L., Morf, M., and Falconer, D. [1978] "Fast Calculation of Gain Matrices for Recursive Estimation Schemes", Int. J. Control, Vol. 27, No. 1, Jan., pp. 1-19.

Morf, M. and Lee, D.T.L. [1978] "Fast Algorithms for Speech Modelling", Tech. Rept. M303-1, Information Systems Lab., Stanford Univ., Stanford, CA, December.

Soong, F.K. [1981] "Fast Least-Squares Estimation and Its Applications", Ph.D. dissertation, Dept. of E.E., Stanford University.

Weinstein, S.B. [1977] "Echo Cancellation in the Telephone Network", IEEE Comm. Magazine, Jan., pp. 9-15.

Widrow, B. [1970] "Adaptive Filters", from Aspects of Network and System Theory, Holt, Rinehart and Winston.

Fig. 1 A long distance telephone connection

Fig. 2 An adaptive echo canceller
Fig. 3 LMS or fast Kalman echo canceller

Fig. 4 LS lattice echo canceller

Fig. 5 MMSE flat-delay estimator (LS lattice)

Fig. 6 MMSE flat-delay estimator (fast Kalman)

Fig. 7 A typical echo return path response

Fig. 8 Scatter plots of 25 runs
Fig. 9 Ensemble average of 25 runs

Fig. 10 Flat-delay estimator performance

Fig. 11 Digitized speech as input

Fig. 12 Near-end speech detector performance
