
IEEE SIGNAL PROCESSING LETTERS, VOL. 26, NO. 12, DECEMBER 2019

A Neural Network-Based Nonlinear Acoustic Echo Canceller

Mhd Modar Halimeh, Student Member, IEEE, Christian Huemmer, and Walter Kellermann, Fellow, IEEE

Abstract—In this letter, we introduce a novel approach for nonlinear acoustic echo cancellation. The proposed approach uses the principle of transfer learning to train a neural network that approximates the nonlinear function responsible for the nonlinear distortions and generalizes this network to different acoustic conditions. The topology of the proposed network is inspired by the conventional adaptive filtering approaches for nonlinear acoustic echo cancellation. The network is trained to model the nonlinear distortions using the conventional error backpropagation algorithm. In deployment, and in order to account for any variation or discrepancy between training and deployment conditions, only a subset of the network's parameters is adapted using the significance-aware elitist resampling particle filter. The proposed approach is evaluated and verified using synthesized nonlinear distortions and real nonlinear distortions recorded by a commercial mobile phone.

Index Terms—Nonlinear Acoustic Echo Cancellation, Artificial Neural Networks, Transfer Learning, Particle Filters.

Fig. 1. NLAEC scenario.

I. INTRODUCTION

THE task of eliminating the acoustic coupling between a loudspeaker and a microphone has received significant attention for decades [1], [2]. Nowadays, this attention is still driven by the increasing need for full-duplex human-machine interfaces in our everyday life. At the same time, such interfaces are increasingly challenged by nonlinear distortions in miniaturized devices such as mobile phones or smart loudspeakers [3], [4].

A conventional approach to nonlinear acoustic echo cancellation (NLAEC) is to account for loudspeaker signal distortions by using a set of nonlinear basis functions [5]–[7], e.g., monomials [8], [9], Legendre functions [10], Volterra kernels [11], [12], functional link expansions [13], or by using kernel methods, e.g., as in [14], [15]. An interesting alternative to model the nonlinear distortions is the use of an artificial neural network (ANN), where several concepts are distinguished [16]–[19]: For one, one can model the entire nonlinear echo path using a neural network, resulting in a network that needs to be adapted continuously due to the time-varying nature of the acoustic propagation path from the loudspeaker to the microphone [16]–[18]. Alternatively, one can focus on modeling only the nonlinear loudspeaker signal distortions using a neural network, which is challenged by the difficulty of applying inverse filtering techniques for parameter estimation, see, e.g., [19].

In this letter, a neural network-based acoustic echo canceller is proposed which uses a network topology that is inspired by conventional NLAEC approaches. The network models the entire nonlinear echo path and is first trained using the error backpropagation algorithm. Since the network models the entire echo path, which is expected to be time-variant, the principle of transfer learning [20] is employed by using the part of the trained network corresponding to the time-invariant components of the echo path and generalizing it to the diverse acoustic conditions expected in testing. During deployment, a subset of the network parameters which corresponds to the time-varying characteristics of the echo path is adapted online via the Significance-Aware Elitist Resampling Particle Filter (SA-ERPF) [21]–[23] in order to account for potential smooth variations in the echo path.
Manuscript received July 18, 2019; revised October 18, 2019; accepted October 27, 2019. Date of publication November 4, 2019; date of current version November 20, 2019. This work was supported by the Deutsche Forschungsgemeinschaft (DFG) under contract KE 890/9-1. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Woon-Seng Gan. (Corresponding author: Mhd Modar Halimeh.)
M. M. Halimeh and W. Kellermann are with the Chair of Multimedia Communications and Signal Processing (LMS), University of Erlangen-Nuremberg, 91058 Erlangen, Germany (e-mail: mhd.m.halimeh@fau.de; walter.kellermann@fau.de).
C. Huemmer is with Siemens Healthineers, 91052 Erlangen, Germany (e-mail: huemmer@lnt.de).
Digital Object Identifier 10.1109/LSP.2019.2951311

II. PROBLEM FORMULATION

As depicted in Fig. 1, at time instant n, the far-end signal x_n = [x[n], x[n − 1], . . ., x[n − L_c + 1]]^T is first nonlinearly distorted by the loudspeaker and possibly other nonlinear components, resulting in the distorted signal d[n]. The nonlinearly distorted signal d[n] propagates through the acoustic enclosure, characterized by the Room Impulse Response (RIR) h_n. The resulting signal y[n] is captured by the microphone. A Wiener-Hammerstein (WH)-type model [24] is used to model this nonlinear system. The WH model is a cascade of two linear Finite Impulse Response (FIR) filters, ĉ and ĥ, with a memoryless nonlinear preprocessor f(·) in between. The filter ĉ = [ĉ_0, . . ., ĉ_{L_c−1}]^T will be called the prefilter in the following; it is of length L_c and introduces memory into the nonlinear model.



The nonlinear signal d[n] is then estimated by

d̂[n] = f(â_n, ĉ^T x_n),    (1)

where â_n is a parameter vector that characterizes the function f(·). By using ĉ = 1, the WH model becomes equivalent to a Hammerstein model for memoryless nonlinearities.

The nonlinear preprocessor f(·) consists of a group of M basis functions weighted by the vector â_n = [â_1, . . ., â_M]^T:

f(â_n, ĉ^T x_n) = Σ_{i=1}^{M} â_i f_i(ĉ^T x_n).    (2)

The echo signal y[n] is estimated by ŷ[n], which is obtained by convolving the estimated distorted signal d̂[n] with the FIR filter ĥ_n = [ĥ_{n,1}, . . ., ĥ_{n,L_h}]^T:

ŷ[n] = ĥ_n^T d̂_n,    (3)

where d̂_n = [d̂[n], d̂[n − 1], . . ., d̂[n − L_h + 1]]^T.
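For illustration, the following minimal NumPy sketch evaluates the WH forward model (1)–(3); the chosen basis functions, filter lengths, and coefficient values are placeholders of ours and not those identified later in the letter.

```python
import numpy as np

def wh_forward(x, c, a, h, basis):
    """Wiener-Hammerstein forward model: prefilter c, memoryless
    nonlinearity parameterized by a, and FIR filter h (cf. (1)-(3))."""
    s = np.convolve(x, c)[:len(x)]                     # prefilter output c^T x_n
    d = sum(a_i * f(s) for a_i, f in zip(a, basis))    # eq. (2): weighted basis functions
    return np.convolve(d, h)[:len(x)]                  # eq. (3): linear echo path

# Illustrative example with M = 3 monomial basis functions (placeholders)
basis = [lambda u: u, lambda u: u**2, lambda u: u**3]
c = np.array([1.0, 0.6, 0.2, 0.1])                           # prefilter, L_c = 4
a = np.array([1.0, 0.2, -0.1])                               # basis weights
h = np.random.randn(512) * np.exp(-np.arange(512) / 100.0)   # toy RIR
x = np.random.randn(16000)                                   # far-end signal
y_hat = wh_forward(x, c, a, h, basis)
```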
III. A NEURAL NETWORK-BASED NLAEC

In this section, we describe a neural network for mimicking the WH echo path model with the expectation that the performance of the model will benefit from the superior modelling capability of neural networks while keeping the number of parameters minimal.

A. Topology and Training of the Network

The WH model discussed earlier can be represented equivalently using an ANN. In this model, the nonlinearly distorted signal is approximated by a neural network as illustrated in Fig. 2. In this network, the input vector x_n is first weighted via the weight vector ĉ, representing the prefilter in the WH model. The first layer L1 consists of a single neuron with a linear activation function. This neuron's output equals the output of the prefilter, ĉ^T x_n, and is weighted by w = [w_1, . . ., w_M]^T and passed to the nonlinear layer L2, which consists of M neurons. The output of the i-th neuron in this layer is

f_i(ĉ^T x_n) = tanh(w_i ĉ^T x_n).    (4)

Fig. 2. The proposed ANN architecture.

The outputs of the neurons are then weighted by the weight vector â_n = [â_1, . . ., â_M]^T, followed by the layer L3, consisting of a single linear neuron whose output is given by (2). Furthermore, the signal d̂[n] is passed to the linear layer L4, consisting of L_h neurons and L_h − 1 delays, and is then weighted by the weight vector ĥ_n representing the FIR filter; the output of the network is again given by (3). Finally, the ANN uses unbiased neurons that are centered at zero.

By highlighting the parts of the network that correspond to WH model components in Fig. 2, a one-to-one relation between the parameters of the network, except for the vector w, and those of a WH model is established. The weight vector w is viewed as an optimization parameter that controls the shape of the basis functions of the nonlinear preprocessor in Fig. 1. This is beneficial, since it allows the network to optimally adjust its basis functions to model the nonlinear distortions, as opposed to conventional approaches where a set of basis functions is pre-selected which is not necessarily optimum.

The network's parameters are optimized by training the network to model the entire echo path, including the linear part, eliminating the need for inverse filtering techniques, which may be expected to be an additional source of errors.

After being initialized with random weights sampled from a standard normal density, the network is trained using the conventional error backpropagation algorithm. Intuitively, the network needs to be trained on conditions similar to the conditions expected during testing, i.e., deployment. Therefore, a representative selection of volume levels is needed for training to cover a wide range of levels during deployment.
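As a concrete illustration of this topology and of training it via error backpropagation, the sketch below mirrors Fig. 2 in PyTorch. The framework, the use of plain SGD, the number of iterations, and the placeholder signals are our assumptions; only the layer structure, the standard-normal initialization, and the training step size of 10^{-4} follow the letter.

```python
import torch
import torch.nn as nn

class AFINet(nn.Module):
    """ANN mirroring the WH model of Fig. 2: L1 prefilter, L2 tanh neurons,
    L3 basis-function weights, L4 FIR filter."""
    def __init__(self, Lc=4, M=9, Lh=512):
        super().__init__()
        # Random initialization from a standard normal density, as in the letter
        self.c = nn.Parameter(torch.randn(Lc))   # L1: prefilter c-hat
        self.w = nn.Parameter(torch.randn(M))    # shapes of the basis functions
        self.a = nn.Parameter(torch.randn(M))    # L3: basis weights a-hat
        self.h = nn.Parameter(torch.randn(Lh))   # L4: FIR filter h-hat
        self.Lc, self.Lh = Lc, Lh

    def forward(self, x):
        # x: (N,) far-end signal; frames x_n = [x[n], ..., x[n-Lc+1]]
        frames = x.unfold(0, self.Lc, 1)                      # (N-Lc+1, Lc)
        s = frames @ self.c.flip(0)                           # L1 output c^T x_n
        d = torch.tanh(s.unsqueeze(1) * self.w) @ self.a      # L2/L3, cf. (2), (4)
        d_frames = nn.functional.pad(d, (self.Lh - 1, 0)).unfold(0, self.Lh, 1)
        return d_frames @ self.h.flip(0)                      # L4 output, cf. (3)

# Error backpropagation training on placeholder signals (plain SGD assumed)
net = AFINet()
x = torch.randn(16000)                             # far-end training signal (placeholder)
y = torch.randn(x.numel() - net.Lc + 1)            # microphone signal (placeholder target)
opt = torch.optim.SGD(net.parameters(), lr=1e-4)   # step size 1e-4 as in Sec. IV
for _ in range(10):                                # a few illustrative iterations
    opt.zero_grad()
    loss = torch.mean((net(x) - y) ** 2)           # mean-squared echo estimation error
    loss.backward()
    opt.step()
```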
B. The Deployment Phase

By training the network to model the entire echo path, generalization to time-varying acoustic conditions is not guaranteed. Hence, after stopping the training, we discard the last trained layer (L4) and replace it by an adaptive FIR filter ĥ_n that is not constrained to the length of the linear layer used in training. The rest of the network is then used to model the nonlinearities in the testing signal. This procedure is similar to transfer learning in the machine learning literature [20].

To account for unseen variations and discrepancies between the training and testing conditions, we allow the weight vector â_n in L3, in addition to the FIR filter ĥ_n, to be adapted continuously via the SA-ERPF algorithm. By doing so, we implicitly assume that the memory characteristics and the functional shape of the basis functions needed for modelling the nonlinear distortions are time-invariant, and that only the individual contributions of the basis functions are time-varying. Consequently, adapting the network during the deployment phase is computationally not more costly than adapting a WH model with an equivalent number of basis functions.

Finally, while the online adaptation of the prefilter ĉ is problematic for conventional adaptive filtering approaches [25], this challenge is resolved in the ANN-based approach by training the prefilter instead of adapting it continuously.
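A minimal sketch of this deployment step is given below: the prefilter and the basis-function shapes learned in training are frozen and reused as the WH nonlinear preprocessor, while layer L4 is replaced by a new adaptive FIR filter. The placeholder values, the new filter length, and the simple NLMS update shown for ĥ_n are our assumptions for illustration; in the letter, â_n and the direct-path part of ĥ_n are adapted with the SA-ERPF instead.

```python
import numpy as np

# Time-invariant parameters taken from the trained network (placeholder values
# here; in practice they are copied from the training phase and kept fixed).
c_hat = np.array([1.0, 0.6, 0.2, 0.1])   # prefilter, frozen
w_hat = np.random.randn(9)               # basis-function shapes, frozen
a_hat = np.random.randn(9)               # basis weights, adapted online (SA-ERPF in the letter)

Lh_deploy = 1024                         # new FIR length, not tied to the training layer L4
h_hat = np.zeros(Lh_deploy)              # adaptive FIR filter replacing L4

def nonlinear_preprocessor(x_frame):
    """Trained part of the network reused as the WH nonlinear preprocessor."""
    s = c_hat @ x_frame                  # prefilter output c^T x_n
    return a_hat @ np.tanh(w_hat * s)    # d-hat[n], cf. (2) and (4)

def nlms_step(h, d_buf, y_n, mu=0.2, eps=1e-6):
    """Illustrative NLMS update for h_hat; the letter instead adapts a_hat and
    the direct-path part of h_hat jointly with the SA-ERPF."""
    e = y_n - h @ d_buf
    return h + mu * e * d_buf / (d_buf @ d_buf + eps), e
```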
C. The SA Elitist Resampling Particle Filter

Due to its superior performance shown for Hammerstein model-based NLAEC, we choose to adapt the network's parameters using the SA-ERPF [21]–[23]. In the following, the SA-ERPF is reviewed briefly. For a comprehensive description of the algorithm, the reader is referred to [21]–[23].


Since the FIR filter ĥ_n is typically a long filter with many coefficients of small values, the SA-ERPF performs a temporal decomposition of ĥ_n into the direct-path component ĥ_direct,n, consisting of only the L_SA significant coefficients centered around the main peak, and a complementary component ĥ_comp,n, which captures all other coefficients [26]. Similarly, the distorted signal vector d̂_n is decomposed into d̂_direct,n and d̂_comp,n. Finally, the direct-path signal is obtained by

y_direct[n] = y[n] − ĥ_comp,n^T d̂_comp,n.    (5)

The decomposition allows the SA-ERPF to concentrate the computational power on the estimation of the short but significant parameter vector ẑ_n = [ĥ_direct,n^T, â_n^T]^T rather than the long and sparse vector [ĥ_n^T, â_n^T]^T. The parameter vector's posterior density p(ẑ_n | y_1:n) is approximated by a set of N_p particles and their associated weights {ẑ_n^(m), q_n^(m); m = 1, . . ., N_p}, where the weights are normalized such that Σ_{m=1}^{N_p} q_n^(m) = 1.

The SA-ERPF starts with an initial population of particles drawn from an initial distribution p(ẑ_0). Unlike other particle filters, e.g., [27], the SA-ERPF operates on two populations of particles, an elitist set Φ_E,n and a non-elitist set Φ_NE,n. At each time instant n, the particles' weights are updated according to [23]. Afterwards, the particles are divided into Φ_E,n and Φ_NE,n depending on their weights as

ẑ_n^(m) ∈ Φ_E,n  if q_n^(m) ≥ 1/N_p,    ẑ_n^(m) ∈ Φ_NE,n  if q_n^(m) < 1/N_p.    (6)

Partial resampling is applied by discarding the non-elitist particles while keeping the elitist particles. To maintain the number of particles N_p fixed, the elitist particles are used to construct a Gaussian density N(μ_n, Σ_n), where the mean vector μ_n and the covariance matrix Σ_n are estimated as in [27]. This density is then used to draw new particles that substitute the discarded ones.

As an estimate of the parameter vector, the SA-ERPF extracts the Minimum Mean Square Error (MMSE) estimate via

ẑ_n^MMSE = Σ_{m=1}^{N_p} q_n^(m) ẑ_n^(m).    (7)

In order to account for variations in the complementary component of the RIR, ĥ_comp,n, we adapt this vector through a conventional Normalized Least Mean Square (NLMS) algorithm as in [26]. Finally, we refer to the proposed algorithm as the Adaptive Filtering-Inspired (AFI) network.
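The elitist split in (6), the Gaussian redrawing of the discarded particles, and the MMSE estimate (7) might be sketched as follows for a generic particle set; the weight update and the state-transition model of [21]–[23] are not reproduced here, and all variable names and the example dimensions are ours.

```python
import numpy as np

def erpf_resample_and_estimate(particles, weights, rng):
    """One elitist resampling step and MMSE extraction, cf. (6) and (7).
    particles: (Np, dim) array of particles z_n^(m); weights: (Np,) normalized q_n^(m)."""
    Np, dim = particles.shape
    elite = weights >= 1.0 / Np                  # eq. (6): elitist / non-elitist split
    z_mmse = weights @ particles                 # eq. (7): MMSE estimate

    # Fit a Gaussian to the elitist particles and redraw the discarded ones (cf. [27])
    mu = np.average(particles[elite], axis=0, weights=weights[elite])
    cov = np.cov(particles[elite].T, aweights=weights[elite]) + 1e-10 * np.eye(dim)
    particles[~elite] = rng.multivariate_normal(mu, cov, size=int((~elite).sum()))
    return particles, z_mmse

# Illustrative example: Np = 80 particles for z_n = [h_direct,n ; a_n]
rng = np.random.default_rng(0)
particles = rng.standard_normal((80, 11 + 9))    # L_SA = 11 direct-path taps, M = 9 weights
weights = rng.random(80)
weights /= weights.sum()                         # placeholder normalized weights
particles, z_mmse = erpf_resample_and_estimate(particles, weights, rng)
```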
IV. EXPERIMENTAL RESULTS

In this section, the AFI ANN is evaluated in two experiments: the first uses synthesized nonlinearities, while the second uses real distortions recorded by a mobile phone. The speech utterances used for generating training and testing data in both experiments are taken from the TIMIT corpus [28].

In all experiments, the network consists of a prefilter ĉ of length L_c = 4, followed by a nonlinear layer with M = 9 neurons. The FIR filter ĥ_n, or equivalently the linear layer, has L_h = 512 taps. All speech signals are recorded at f_s = 16 kHz. The network is trained using a training step size of 10^{-4}. The various hyperparameters, i.e., the training step size, the number of neurons, and the length of the training signals, were selected empirically.

For adapting the network, the SA-ERPF is configured with N_p = 80 particles and a Gaussian likelihood density N(y[n], σ_v^2) with σ_v^2 = 10^{-4}. For the decomposition of the RIR, L_SA = 11 is used. Moreover, to provide a reference, a linear FIR filter of 512 taps adapted via the NLMS algorithm with a step size of 0.2 is also evaluated.
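For reference, the purely linear baseline can be sketched as a sample-wise NLMS loop like the one below; the signals are placeholders, while the filter length and the step size follow the setup above.

```python
import numpy as np

def linear_nlms_aec(x, y, L=512, mu=0.2, eps=1e-6):
    """Linear AEC baseline: a length-L FIR filter adapted sample-wise via NLMS."""
    h = np.zeros(L)
    e = np.zeros(len(y))
    x_buf = np.zeros(L)                      # most recent far-end samples x_n
    for n in range(len(y)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        e[n] = y[n] - h @ x_buf              # a-priori error signal
        h += mu * e[n] * x_buf / (x_buf @ x_buf + eps)
    return e, h

# Placeholder far-end and microphone signals
x = np.random.randn(16000)
y = np.convolve(x, np.random.randn(512) * 0.01)[:16000] + 0.001 * np.random.randn(16000)
e, h = linear_nlms_aec(x, y)
```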
As a measure of performance, the Echo Return Loss Enhancement (ERLE) is used:

ERLE_n = 10 log_10 ( E{y[n]^2} / E{e[n]^2} ),  with  e[n] = y[n] − ĥ_n^T d̂_n,    (8)

where E{·} denotes the expectation operator.
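In practice, the expectations in (8) are replaced by short-time averages; a simple sketch, with an arbitrarily chosen, non-overlapping averaging window:

```python
import numpy as np

def erle_db(y, e, win=4096):
    """Segmental ERLE according to (8), with expectations replaced by
    short-time averages over non-overlapping windows of length `win`."""
    n_seg = min(len(y), len(e)) // win
    y_pow = np.array([np.mean(y[k * win:(k + 1) * win] ** 2) for k in range(n_seg)])
    e_pow = np.array([np.mean(e[k * win:(k + 1) * win] ** 2) for k in range(n_seg)])
    return 10.0 * np.log10(y_pow / (e_pow + 1e-12))

# Example: average ERLE of the NLMS baseline from the previous sketch
# print(erle_db(y, e).mean())
```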
A. Synthesized Nonlinear Distortions

In this experiment, the AFI ANN's capability to learn underlying nonlinear functions is examined. Specifically, we consider four types of nonlinearities, i.e., the Fourier series [29]

d_1[n] = Σ_{i=1}^{3} a_{1,i} sin(iπ c^T x_n / 1.5),    (9)

the odd-order Legendre polynomials of the first kind [10]

d_2[n] = Σ_{i=0}^{2} a_{2,i} L_{2i+1}(c^T x_n),    (10)

a mixture of the two nonlinear functions

d_3[n] = a_{3,1} c^T x_n + a_{3,2} (c^T x_n)^3 + a_{3,3} sin(3π c^T x_n / 1.5),    (11)

and a completely different nonlinearity modelling the soft-clipping behavior observed in loudspeakers,

d_4[n] = (2/π) arctan(5 c^T x_n).    (12)

As training sequences, a one-minute-long speech signal concatenating multiple speakers of both genders is used. The signal is distorted using (9)–(12) to generate the different signals {d_i[n] | i = 1, . . ., 4} using the prefilter c = [1, 0.6, 0.2, 0.1]^T and the parameter vectors a_1 = [0.8, 0.4, −0.2]^T, a_2 = [0.1, 0.1, 0.1]^T, and a_3 = [0.5, 0.2, 0.2]^T. Afterwards, each signal is convolved with a recorded RIR h_1,n and corrupted by additive, normally-distributed white noise v[n], resulting in four different training sequences given by

y_i[n] = h_{1,n}^T d_{i,n} + v[n];  i ∈ {1, 2, 3, 4},    (13)

where the variance σ_v^2 is chosen such that the logarithmic Signal-to-Noise Ratio (SNR) is 30 dB.
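The generation of these training sequences following (9)–(13) could look as follows; the white-noise stand-ins for the speech signal and for the recorded RIR are placeholders, and the Legendre polynomials are evaluated with numpy.polynomial.legendre.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_poly(deg, u):
    """Evaluate the Legendre polynomial L_deg(u)."""
    return legendre.legval(u, np.eye(deg + 1)[deg])

# Prefilter and parameter vectors from Sec. IV-A
c = np.array([1.0, 0.6, 0.2, 0.1])
a1, a2, a3 = [0.8, 0.4, -0.2], [0.1, 0.1, 0.1], [0.5, 0.2, 0.2]

x = np.random.randn(60 * 16000)                # placeholder for the 1-min speech signal
s = np.convolve(x, c)[:len(x)]                 # c^T x_n

d = [
    sum(a1[i - 1] * np.sin(i * np.pi * s / 1.5) for i in (1, 2, 3)),   # eq. (9)
    sum(a2[i] * legendre_poly(2 * i + 1, s) for i in (0, 1, 2)),       # eq. (10)
    a3[0] * s + a3[1] * s**3 + a3[2] * np.sin(3 * np.pi * s / 1.5),    # eq. (11)
    (2.0 / np.pi) * np.arctan(5.0 * s),                                # eq. (12)
]

h1 = np.random.randn(512) * np.exp(-np.arange(512) / 80.0)   # placeholder recorded RIR h_1
y = []
for di in d:                                                  # eq. (13) at 30 dB SNR
    yi = np.convolve(di, h1)[:len(x)]
    sigma_v = np.sqrt(np.mean(yi**2) * 10 ** (-30 / 10))
    y.append(yi + sigma_v * np.random.randn(len(yi)))
```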
The testing sequences are generated by using a different speech signal of one minute duration, consisting of different speakers and recorded at f_s = 16 kHz. Similarly to the training sequences, the signal is distorted using (9)–(12), and the resulting signals are then convolved with a recorded impulse response h_2,n, which represents a different acoustic enclosure than that of the training sequences but is of equal length. Finally, the four testing sequences are corrupted by normally distributed or Laplace-distributed additive white noise at various SNR levels.

The AFI ANN is compared to two different WH models which use a prefilter ĉ with 4 taps and either the Legendre polynomials (10) or the Fourier series (9) as basis functions for the memoryless nonlinear preprocessor, followed by an FIR filter with 512 taps. Both models are adapted via the NLMS algorithm.


TABLE I
AVERAGE ERLE (dB) FOR EXPERIMENTS IN SECTION IV-A

To compare the different approaches, the average ERLE values achieved by each approach after convergence are shown in Table I for the different nonlinearities.

It can be seen in Table I that using the basis functions matching the underlying nonlinearity yields the best performance compared to the other approaches. This is seen, e.g., for a WH model using Legendre polynomials to identify a speech signal distorted by a Legendre polynomial-based nonlinearity. However, the AFI ANN achieves a performance close to that of the optimum basis functions for the nonlinearities (9) and (10). This indicates its ability to learn the underlying nonlinear function to a high degree of accuracy. For distortions generated by a mixture of the two basis functions in (11), the two WH models based on the individual basis functions perform similarly. At the same time, the AFI ANN is capable of outperforming both by nearly 2 dB on average. For distortions generated by a function that does not match any of the model basis functions, i.e., (12), the AFI ANN outperforms the other approaches by more than 2 dB. In order to examine the generalization capabilities of the AFI ANN, Table I also shows the performance of the AFI ANN when trained with normally-distributed white noise (denoted by N in the table) at 30 dB SNR and tested at different SNR levels and with Laplace-distributed white noise, denoted by L in the table. As seen from the results, the AFI ANN maintains its advantageous performance across the different SNRs and noise types, confirming that the network is learning the underlying nonlinear function rather than overfitting to the seen data. Finally, small variations in the NLMS step size used in the AFI ANN adaptation were observed to lead to only small variations in the ERLE, suggesting that the proposed model is robust w.r.t. the adaptation parameters.

B. Real Nonlinear Distortions

In this experiment, the training signal is a one-minute-long speech signal representing multiple speakers of both genders. It is emitted by the loudspeaker of a hand-held mobile phone, which introduces unknown nonlinear distortions, and recorded at an estimated SNR level of 20 dB, for which the noise power was measured during speech pauses and the noisy signal's power was used instead of the clean signal's power. To generate the test signal, the same mobile phone was placed in a different acoustic enclosure, and speech signals representing different speakers of both genders were emitted by the phone's loudspeaker using volume settings identical to those of the training signal, and were recorded by its microphone at an estimated SNR level of 20 dB as well. In this experiment, the AFI ANN is compared to a Legendre polynomial-based and to a Fourier series-based WH model with a prefilter ĉ of four taps each. The Legendre-based WH model uses the first four odd-order Legendre polynomials as basis functions for the nonlinear preprocessor, while the Fourier series-based WH model uses the first three odd-order basis functions. Both are followed by an FIR filter of length L_h = 512.

Fig. 3. ERLE in dB for the real nonlinear distortions.

As can be seen in Fig. 3, the AFI network outperforms the conventional NLAEC approaches by a considerable margin, except for a few instances where the conventional approaches outperform the AFI network slightly. Overall, the AFI network yields an average ERLE of 11.1 dB, while the Legendre polynomial-based approach yields an average of 8.9 dB, the Fourier series-based model results in an average ERLE of 9.5 dB, and the purely linear AEC achieves an average ERLE of only 7.2 dB. If we assume that the modeling capacity of the AFI ANN covers that of the conventional models, the time intervals where the conventional approaches outperform the AFI ANN can be seen as an effect of lacking training data for these situations. Moreover, a tendency towards overfitting was observed when increasing the number of neurons above the number used in this experiment, in either the nonlinear layer or the prefiltering layer, motivating a careful network design phase. As shown by the two experiments, a one-minute-long speech signal is enough to capture the nonlinear distortions at a single volume level. However, in order to model the distortions emitted at different levels, one would need a longer training sequence that covers the targeted volume levels. Alternatively, convex combination approaches can be employed, similar to [12], to combine several networks trained separately to model the nonlinear distortions at different levels. Such techniques are especially effective when abrupt and drastic changes in the volume levels are present, where the AFI ANN's performance is expected to degrade.

V. CONCLUSION

In this letter, we introduced a novel, neural network-based approach to NLAEC. The approach consists of training a neural network whose topology closely follows a WH model to model the entire echo path using the conventional error backpropagation algorithm. Then, the last linear layer of the network is replaced by an adaptive FIR filter, and the rest of the network is used as a nonlinear preprocessor in the WH model. The proposed approach is capable of modelling memoryless nonlinearities as well as nonlinearities with memory using a linear FIR prefilter. To account for variations in the nonlinear distortions, a single weight vector is adapted during application of the network through the SA-ERPF. This weight vector controls the individual contributions of the learned basis functions. The AFI ANN outperforms conventional approaches for real and synthesized nonlinearities, as demonstrated through experimental results.


REFERENCES

[1] M. Sondhi, "An adaptive echo canceller," Bell Syst. Tech. J., vol. 46, no. 3, pp. 497–511, 1967.
[2] C. Breining et al., "Acoustic echo control. An application of very-high-order adaptive filters," IEEE Signal Process. Mag., vol. 16, no. 4, pp. 42–69, Jul. 1999.
[3] S. Theodoridis and R. Chellappa, Academic Press Library in Signal Processing, Volume 4: Image, Video Processing and Analysis, Hardware, Audio, Acoustic and Speech Processing, 1st ed. New York, NY, USA: Academic, 2014.
[4] Y. Huang et al., Acoustic MIMO Signal Processing, Signals and Communication Technology. Boston, MA, USA: Springer, 2006.
[5] E. Haensler and G. Schmidt, Topics in Acoustic Echo and Noise Control. Berlin, Germany: Springer-Verlag, 2006.
[6] A. Stenger and W. Kellermann, "Adaptation of a memoryless preprocessor for nonlinear acoustic echo cancelling," Signal Process., vol. 80, no. 9, pp. 1747–1760, Sep. 2000.
[7] D. Comminiello and J. Principe, Eds., Adaptive Learning Methods for Nonlinear System Modeling. Oxford, U.K.: Butterworth-Heinemann, Jun. 2018.
[8] F. Kuech et al., "Nonlinear acoustic echo cancellation using adaptive orthogonalized power filters," in Proc. Int. Conf. Acoust., Speech, Signal Process., Philadelphia, PA, USA, Mar. 2005, pp. 105–108.
[9] S. Malik and G. Enzner, "State-space frequency-domain adaptive filtering for nonlinear acoustic echo cancellation," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 7, pp. 2065–2079, Sep. 2012.
[10] A. Carini et al., "Introducing Legendre nonlinear filters," in Proc. Int. Conf. Acoust., Speech, Signal Process., Florence, Italy, May 2014, pp. 7939–7943.
[11] M. Zeller et al., "Adaptive Volterra filters with evolutionary quadratic kernels using a combination scheme for memory control," IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1449–1464, Apr. 2011.
[12] M. Zeller and W. Kellermann, "Evolutionary adaptive filtering based on competing filter structures," in Proc. Eur. Signal Process. Conf., Barcelona, Spain, Aug. 2011, pp. 1264–1268.
[13] D. Comminiello, M. Scarpiniti, L. A. Azpicueta-Ruiz, J. Arenas-García, and A. Uncini, "Functional link adaptive filters for nonlinear acoustic echo cancellation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1502–1512, Jul. 2013.
[14] F. Albu and K. Nishikawa, "New iterative kernel algorithms for nonlinear acoustic echo cancellation," in Proc. Asia-Pacific Signal Inf. Process. Assoc. Annu. Summit Conf., Hong Kong, China, Dec. 2015, pp. 734–739.
[15] S. Van Vaerenbergh and L. A. Azpicueta-Ruiz, "Kernel-based identification of Hammerstein systems for nonlinear acoustic echo cancellation," in Proc. Int. Conf. Acoust., Speech, Signal Process., Florence, Italy, May 2014, pp. 3739–3743.
[16] A. Birkett and R. Goubran, "Acoustic echo cancellation using NLMS-neural network structures," in Proc. Int. Conf. Acoust., Speech, Signal Process., Detroit, MI, USA, May 1995, pp. 3035–3038.
[17] A. Ben Rabaa and R. Tourki, "Acoustic echo cancellation based on a recurrent neural network and a fast affine projection algorithm," in Proc. 24th Annu. Conf. IEEE Ind. Electron. Soc., Aachen, Germany, Aug. 1998, vol. 3, pp. 1754–1757.
[18] S. Zhang and W. X. Zheng, "Recursive adaptive sparse exponential functional link neural network for nonlinear AEC in impulsive noise environment," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 9, pp. 4314–4323, Sep. 2018.
[19] J. Malek and Z. Koldovsky, "Hammerstein model-based nonlinear echo cancelation using a cascade of neural network and adaptive linear filter," in Proc. Int. Workshop Acoust. Echo Noise Control, Xi'an, China, Sep. 2016, pp. 1–5.
[20] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[21] C. Huemmer et al., "Estimating parameters of nonlinear systems using the elitist particle filter based on evolutionary strategies," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 26, no. 3, pp. 595–608, Mar. 2018.
[22] M. M. Halimeh et al., "Nonlinear acoustic echo cancellation using elitist resampling particle filter," in Proc. Int. Conf. Acoust., Speech, Signal Process., Calgary, AB, Canada, Apr. 2018, pp. 236–240.
[23] M. Halimeh et al., "Hybrid particle filtering based on an elitist resampling scheme," in Proc. Sensor Array Multichannel Signal Process. Workshop, Jul. 2018, pp. 257–261.
[24] D. Westwick and J. Schoukens, "Initial estimates of the linear subsystems of Wiener-Hammerstein models," Automatica, vol. 48, no. 11, pp. 2931–2936, 2012.
[25] A. Stenger, Kompensation akustischer Echos unter Einfluss von nichtlinearen Audiokomponenten, Ph.D. thesis, Faculty of Engineering, Friedrich-Alexander-University Erlangen-Nuremberg, Germany, 2001.
[26] C. Hofmann et al., "Significance-aware Hammerstein group models for nonlinear acoustic echo cancellation," in Proc. Int. Conf. Acoust., Speech, Signal Process., Florence, Italy, May 2014, pp. 5934–5938.
[27] J. Kotecha and P. Djuric, "Gaussian particle filtering," IEEE Trans. Signal Process., vol. 51, no. 10, pp. 2592–2601, Oct. 2003.
[28] J. Garofolo et al., "TIMIT acoustic-phonetic continuous speech corpus LDC93S1," Web Download. Philadelphia, PA, USA: Linguistic Data Consortium.
[29] S. Malik and G. Enzner, "Fourier expansion of Hammerstein models for nonlinear acoustic system identification," in Proc. Int. Conf. Acoust., Speech, Signal Process., Prague, Czech Republic, May 2011, pp. 85–88.
