This action might not be possible to undo. Are you sure you want to continue?

2003, Kyoto, Japan

**ON THE APPLICATION OF THE UNSCENTED KALMAN FILTER TO SPEECH PROCESSING
**

Sharon Gannot Faculty of Electrical Engineering, Technion, Technion City, 32000 Haifa, Israel e-mail: gannot@siglab.technion.ac.il Marc Moonen Dept. of Elect. Eng. (ESAT-SISTA), K.U.Leuven, B-3001 Leuven, Belgium e-mail: Marc.Moonen@esat.kuleuven.ac.be

ABSTRACT In a series of recent studies a new approach for applying the Kalman ﬁlter to nonlinear system, referred to as Unscented Kalman ﬁlter (UKF), was proposed. In this contribution1 we apply the UKF to several speech processing problems, in which a model with unknown parameters is given to the measured signals. We show that the nonlinearity arises naturally in these problems. Preliminary simulation results for artiﬁcial signals manifests the potential of the method. 1 INTRODUCTION

y

should be calculated. We brieﬂy summarize the method. The mean and covariance of x are represented by 2L + 1 points and weights

'

¯ X0 = x

$

p

¯ Xl = x +

Xl+L W0 Wl

(m) (c)

(L + λ)Pxx ; l = 1, . . . , L l p ¯ (L + λ)Pxx ; l = 1, . . . , L = x−

l

= λ/(L + λ) = λ/(L + λ) + (1 − α2 + β) = Wl

(c)

The recently proposed unscented transform (UT) is a method for calculating the statistics of a random variable undergoing a nonlinear transformation that was ﬁrst suggested by Julier et al. [1]. This method was used to generalize the Kalman ﬁlter to nonlinear systems by Julier et al. [1] and was further extended by Wan et al. [2] to problems where both signals and parameters are jointly estimated. In [2] (and other contributions) the nonlinearity arises from the parameter production model. In this contribution we further apply the UKF to several speech processing problems, namely single microphone speech enhancement and multi-microphone speech dereverberation. We show that in these applications the nonlinearity arises naturally, due to the signals and parameters multiplication, if both are given a dynamic model. The technique is demonstrated by several simple examples. In Section 2 the unscented transform and its application to nonlinear Kalman ﬁlter are reviewed. Sections 3.1 and 3.2 discuss the application of the method to the problems of single microphone speech enhancement and two microphone speech dereverberation, respectively. We draw some conclusions and discuss some further directions in Section 4. 2 PRELIMINARIES

W0

(m)

&

where,

p

= 1/2(L + λ); l = 1, 2, . . . , 2L

l

%

(L + λ)Pxx

is the l-th row or column of the

corresponding matrix square root, and λ = α2 (L + κ) − L. α determines the spread of the sigma points. α = 1 was used throughout our simulations . κ is a secondary scaling parameter. The choice κ = 3 − L maintains the kurtosis of a Gaussian vector. Throughout our simulations κ is set to 0. β is used to incorporate prior knowledge of the distribution (β = 2 for Gaussian distributions). A proper choice of these parameters and its inﬂuence on the obtainable performance is still an open topic. The mean and covariance of the vector y are calculated using the following procedure,

#

"

2.2

1. Construct the sigma points Xl , l = 0, . . . , 2L. 2. Transform each point: Yl = f (Xl ) , l = 0, . . . , 2L. P (m) ¯ 3. Mean: Use weighted averaging, y ≈ 2L Wl Yl . l=0 4. Covariance: Use weighted outer product, P (c) ¯ ¯ Pyy ≈ 2L Wl (Yl − y ) (Yl − y )T . l=0

!

2.1 The Unscented Transform (UT) ¯ Let x be an L-dimensional random vector with mean x and covariance matrix Pxx . Let, y = f (x) be a nonlinear transformation from the random vector x to another random vector y . The ﬁrst and second order statistics of the vector

1 This research work was carried out at the ESAT laboratory of the Katholieke Universiteit Leuven, in the frame of the Interuniversity Attraction Pole IUAP P4-02, Modeling, Identiﬁcation, Simulation and Control of Complex Systems, the Concerted Research Action Mathematical Engineering Techniques for Information and Communication Systems (GOA-MEFISTO-666) of the Flemish Government and the IT-poject Multi-microphone Signal Enhancement Techniques for handsfree telephony and voice controlled systems (MUSETTE-2) of the I.W.T., and was partially sponsored by Philips-ITCL.

The beneﬁts of using the UT are presented in [1] and [2]. The Application of the Unscented Transform to the Nonlinear Kalman Filtering Problem The Kalman ﬁlter is a recursive and causal solution for minimum mean square error (MMSE) state estimation in the Gaussian and linear case. The Kalman equations are formulated with the state-space notation and consist of two stages. A propagation stage in which the mean and a priori covariance of the respective state are predicted based on the system dynamics and on the previous time instant estimate, and an update stage in which this prediction is optimally weighted with the new measurement. The error covariance, interpreted as the amount of conﬁdence we have in the estimate, is propagated in a similar fashion.

27

Let s(t) and (t) be a signal state space vector and a parameter vector. a two step approach is taken. (d) Update equations. The situation is more complex when the involved equations are nonlinear. into a joint estimation problem. u(t) and v (t) are innovation and measurement noise sequences. Speciﬁcally. The same problem can be reformulated s(t − 1|t − 1) ˆ D s(t|t) ˆ Speech Kalman Filter z(t) Parameters Kalman Filter UT (a) ˆ θ(t − 1|t − 1) D D ˆ θ(t|t) s(t − 1|t − 1) ˆ Predicted Sigma Points Signal & Measurement s(t|t) ˆ Current Sigma Points Speech + Parameters z(t) X (t − 1|t − 1) Non-Linear System Dynamics & Measurement {Φ. but yet not accurate enough. Pθ (t|t) (d) Figure 1: Unscented Kalman ﬁlter: (a) Unscented transform. Thus. multi-microphone speech enhancement and dereverberation) a problem of estimating both the speech signal and various parameters arises. In the past the extended Kalman ﬁlter (EKF). while the joint scheme consists of a single nonlinear step. single microphone speech enhancement. A better method. (c) Inverse unscented transform. respectively. 3.When the system dynamics and the measurement equation are linear. proposed in [1]. 3 APPLICATION TO SPEECH PROCESSING x(t) z(t) = Φ (x(t − 1). Deﬁne also the augmented state vector xT (t) = sT (t) T (t) . This subject is still under investigation. as it involves the calculation of derivatives. under the Bayesian framework. We remark that as this nonlinearity is separable this formulation might lead to the same performance as in the dual scheme. Alternatively. referred to by Wan et al. In each time instant a Kalman ﬁltering step for the signal is applied based on the current estimate of the parameters. 3. where s(t) represents the sampled speech UT−1 Z(t|t − 1) z (t). Nonlinear transition and measurement equations are given by. the problem of joint estimation of the speech and the parameters becomes nonlinear if both are modelled as stochastic processes. v (t)) . all the calculations involved are straightforward. h} X (t|t − 1) Z(t|t − 1) Kalman Filter ˆ θ(t − 1|t − 1) D ˆ θ(t|t) (b) X (t|t − 1) ˆ x(t|t − 1). The parameter estimation might be conducted using recursive methods such as RLS or LMS.1 Single Microphone Speech Enhancement The problem of single-microphone speech enhancement was extensively studied.ˆ(t) z K(t) = Pxz (t)Pzz (t) −1 New Signal Estimate & Error Covariance ˆ x(t|t) → ˆ ˆ s(t|t). (b) Propagation equations. Each of the two steps comprising the dual scheme is linear. The method consists of calculating the mean and coˆ s(t − 1|t − 1) Ps (t − 1|t − 1) ˆ θ(t − 1|t − 1) Pθ (t − 1|t − 1) Current Sigma Points X (t − 1|t − 1) In many model-based problems in speech processing (e. is to use the previously mentioned unscented transform in order to propagate the mean and covariance through the nonlinearities. u(t)) = h (x(t − 1). a method for propagating mean and covariance through nonlinearities is needed. 1 summarizes the steps involved in Unscented Kalman ﬁlter (UKF). θ(t|t) Px (t|t) → Ps (t|t). but no claims of optimality are valid. based on the linearization of the equations. Note that most operations involve parameter and state vector multiplications. respectively . [2] as dual estimation. Pzz (t) ˆ (c) Predicted Signal & Error Covariance & Measurment ˆ x(t|t − 1). [3].1. The complexity of the suggested method is quite low as only an increase of dimensions by a factor of 2L + 1 is required. In this case. Px (t|t − 1) Figure 2: Dual (top) and joint (bottom) estimation procedures. The dual estimation method can be seen as a sequential variant of the estimate-maximize (EM) procedure. Discussion on the subject can be found in [3]. Px (t|t − 1) Optimal Weighting z(t). This approach will be used throughout this work. By assuming AR model to the speech signal and giving dynamic model to the AR parameters both dual and joint schemes can be formulated. as only ﬁrst-order approximation is applied. the parameters can be given a dynamic model and the Kalman ﬁlter can be applied. In parallel a parameter estimate step is applied based on the current signal state estimate. The method is summarized in Fig. 2 (bottom). The approach of jointly estimating speech signal and its parameters is summarized in Fig. the use of Kalman ﬁlter for estimating both the signal and the parameters is presented by Gannot et al. Fig.1 Signals Model Let the signal measured by the microphone be given by z(t) = s(t) + v(t). 28 . This method might be quite complex. was used. In the ﬁrst.g. 2 (top). Pxz (t). variance of the augmented state vectors undergoing a known nonlinear transform by virtue of the unscented transform. This problem can be addressed in two ways.

. We shall assume a time varying AR model for the speech signal.7 Results Time varying Gaussian AR process (4 coeﬃcients) embedded in white Gaussian noise with input SNR level of about 20dB is processed by the joint Kalman scheme2 . which implies that Φs (t).2 0 −0. . . assuming that the signal and all the noise parameters are known. ··· 3 · · · −αp (t) 0 ··· ··· 07 .7 7 . . .e.1.7 . 2 may be applied. .5 0 5000 Samples 10000 15000 2000 4000 6000 8000 10000 12000 14000 16000 Samples P (t|t − 1) = Φs P (t − 1|t − Kalman gain: g Figure 3: Tracking ability of the parameters of an AR process embedded in white noise.signal and v(t) represents an additive background noise.2 −0.6 −0. i. 6 6 .. . . . ··· 1 0 (10) (3) P (t|t) = P (t|t − 1) − i h 2 2 k (t) hT P (t|t − 1)h + gs (t) + gv kT (t). uαp (t) with the respective covariance matrix Q (t) = E{u (t)uT (t)}. 4 . Since both signal and parameters are not known. the dual scheme presented in Fig. and Φ = (t − 1) + u (t) (4) z(t) = 1 0 0 . In each time instant the AR parameters are estimated using the estimated speech signal and the speech signal is estimated using the current parameter estimate.4 Speech Kalman Filter Propagation equations: ˆ sp (t|t − 1) ˆ = Φs sp (t − 1|t − 1) 1)ΦT s (5) +g T s s −0. . s(t − p) Ip×p or very close to it.8 0 0.5 . . This set of equation is nonlinear since it involves a multiplication of the speech state space and the transition matrix comprised of the parameters process. (t)us 0 (11) x(t) = Φs Φ x(t − 1) + gsu (t)(t) 0 {z } | nonlinearity 3. . .. Deﬁne. 0 ] and hT = [ 1 0 . which includes the desired speech signal s(t). a Kalman ﬁlter for the parameter estimate might be applied. written in Matlab c language.1.. gT (t) = [ gs (t) 0 . . .7 . p X s(t) = − αk (t)s(t − k) + gs (t)us (t) (1) k=1 Update equations: ˆ sp (t|t) h i ˆ = sp (t|t − 1) + k(t) z(t) − hT sp (t|t − 1) (7) s ˆ h i 2 P (t|t) = P (t|t − 1) − k(t) hT P (t|t − 1)hs + gv kT (t) s where the excitation us (t) is a normalized (zero mean unit variance) white noise. 0 ··· 2 hT P P (t|t − 1)H 2 2 (t|t − 1)h + gs (t) + gv (9) Update equations: h i ˆ (t|t) = ˆ (t|t − 1) + k (t) z(t) − hT ˆ (t|t − 1) ··· 0 . 6 . where. So. h (t) = s(t − 1) s(t − 2) . .. Then. 2 (top) is then used. The AR parameters 0.7 . 6 . α2 (t). . The dual scheme suggested in Fig. k(t) = T P (t|t − 1)hs 2 hs P (t|t − 1)hs + gv (6) 2 All Simulations in this paper are implemented by modifying R.1.6 0. The parameter state-space equations are.3 Dual Scheme On the one hand.1.7 . (t) = Φ z(t) = T h T (t) (t) + g s (t)us (t) + v(t). . and α1 (t).1. αp (t) ] and the innovation vector uT (t) = uα1 (t) uα2 (t) . αp (t) are the AR coeﬃcients. . 3. van der Merwe et al. i.2 Parameter model Deﬁne the parameter vector T (t) = [ α1 (t) α2 (t) . . . . 2 (bottom) can be used.5 Parameters Kalman Filter Propagation equations: ˆ (t|t − 1) = Φ ˆ (t − 1|t − 1) P (t|t − 1) = Φ P (t − 1|t − 1)ΦT + Q Kalman gain: (8) sp (t) = Φs (t)sp (t − 1) + g s (t)us (t) (2) z(t) = hT sp (t) + v(t) s k (t) = where sT (t) = s(t) s(t − 1) .1.8 0. 3. The noise level is estimated during non-signal portions of the noisy signal. Then a states s space form is given by. 29 . xT (t) = sp (t) (t) . 3. .e.5 2 2. 3. On the other hand. the joint scheme suggested in Fig. is obtained using the Kalman ﬁltering equations.. the optimal causal MMSE linear state estimate.4 1 1. hs and gs (t) are known. 0 x(t) + v(t). .5 Gain 3. . .. s(t − p) .4 0. 0 ]. . 3. . The tracking ability of the procedure is presented in Fig. .6 Joint Scheme An augmented state vector of the speech and the parameters is constructed. 7. 6 Φs (t) = 6 .. The signal state p transition matrix Φs (t) is given by: −α1 (t) −α2 (t) 1 0 6 6 . .. 6 . . gs (t) represents the innovation gain. . . assuming the speech signal is known.7 . performance with real speech signals is still to be determined. hT (t) is known. [4] code. The additive noise v(t) is assumed to be a realization from a zero 2 mean white Gaussian stochastic with variance gv .

s(t) = − X k=1 p 2 1 0 −1 −4 −2 −6 −8 0 0 −2 αk s(t − k) + gus (t)us (t) (12) −3 −4 0 500 1000 Samples 1500 2000 500 1000 Samples 1500 2000 z1 (t) = z2 (t) = na −1 X k=0 na −1 X k=0 a1 (k)s(t − k) + g1 e1 (t) a2 (k)s(t − k) + g2 e2 (t). Julier. 5 * References [1] S.1 Signals Model The reverberated and noisy signals presented in Fig. Results show that the method is applicable to the problems in hand. ua1 (t) ua2 (t) · · · uana (t) 2 2 and Φs (t) an na × na signal transition matrix having equivalent structure to the one presented in (3). [2] E. ﬁtting the UKF framework. Alberta. Weinstein. 477–482. Figure 5: Tracking ability of the parameters of the dereverberation problem.2 z2 (t) 0 −0. pp. the augmented transition-measurement equations can be written as. “The Unscented Kalman Filter for Nonlinear Estimation. 1500 0 0 500 1000 Samples A2 coeff. The SNR value is very high.” IEEE trans.6 0. van der Merwe.8 15 Gain A1 (ejω ) s(t) A2 (e jω P z1 (t) 0.8 5 Figure 4: Two channel dereverberation problem.2 Two Microphone Speech Dereverberation In the two channel dereverberation problem a speech signal. modelled as an FIR ﬁlter. 4 DISCUSSION T (t) = sTa (t) n gT (t) s T u (t) uT1 (t) a T ua2 (t) (t) gus (t) a1 (t) a2 (t) g1 g2 . for a comprehensive test. [3] S. Canada. Noise is then added to the output constructing the noisy and reverberated speech signals.” IEEE Trans. 5. Communication and Control (AS-SPCC). AR parameters 1 0. 3. which variance is estimated from signal free segments. It is worth mentioning that the presented problem is a very simple one. Gannot.2. Performance limitations and optimality issues of the suggested method are under current research. the tracking ability of the algorithm is presented in Fig. we have again a problem of estimating both the speech signal and the following model parameters..” in Symposium 2000 on Adaptive Systems for Signal Processing. a2 are 3 taps long.6 −0. = = = = = s(t) s(t − 1) · · · s(t − na + 1) gs (t) 0 · · · 0 uα1 (t) uα2 (t) · · · uαp (t) ua1 (t) ua2 (t) · · · uana (t) 1 1 1 2 In this paper we applied the newly proposed UKF to two speech processing problems. 2 sna (t) 3 2 Φs (t) 0 0 0 3 2 sna (t − 1) 3 2 gs us (t) 3 0 7 6 (t − 1) 7 6 u (t) 7 6 0 Ip 0 6 (t) 7 + 4 a (t) 5 = 4 0 0 Ina 0 5 4 a1 (t − 1) 5 4 ua1 (t) 5 1 a2 (t − 1) 0 0 0 Ina a2 (t) ua2 (t) | {z } nonlinearity 3 2 sna (t) z1 (t) a1 (t) 0 0 0 6 (t) 7 + g1 e1 (t) = z2 (t) a2 (t) 0 0 0 4 a1 (t) 5 g2 e2 (t) a2 (t) | {z } nonlinearity which is a nonlinear set of equations. vol. Oct. Then. Jul. 1500 2000 5 4 3 6 4 2 3. J. it should be further applied to real speech signals embedded in higher noise levels. modelled as an AR process. is ﬁltered by an acoustical transfer function (ATF). 45. on Automatic Control. g1 e1 (t) 3. 3. Wan and R.4 −0. “A New Method for the Nonlinear Transformation of Means and Covariances in Filters and Estimators. 6.2 Joint Speech and Parameters Estimation Deﬁne.” /users/sista/sgannot/matlab/Ukf_W/.3. IEEE. 373–385. 1998. Thus. [4] R. pp. Burshtein. Mar. Even in this simple case convergence is not guaranteed. May 2001.2 −0. van der Merwe.2. −1 0 500 1000 Samples A1 coeff. no. the order of the AR process is 1 and the ﬁlters a1 . and E. vol. 4.4 10 ) P g2 e2 (t) 0. Lake Louise. Uhlmann and H. D. “Matlab c code. 2000. no.3 Results For a low level white noise signal. Durrant-Whyte. 4 are given by the following model. as depicted in Fig. 30 . on Speech and Audio Proc. 2000. “Iterative and Sequential Kalman Filter-Based Speech Enhancement Algorithms.2. Nevertheless.F. A. 4.

- Specific Guidelines for Writing a Business Letter
- Syntel Written Exam Pattern
- Testing Tools.pdf
- CGI, Virtusa, Cypress, Alcatel Lucent, LGSoft Hiring - Doradlarajesh@Gmail
- Image Encryption and Decryption
- Image Encryption and Decryption
- Notes Earthquakes
- Welcome to VVIT Guntur
- JNTU Kakinada IV B.tech[RR,R05,R07] Results_JNTU Kakinada IV B
- Welcome to VVIT Guntur
- How to Prepare 1
- Embedded Sample Paper
- -Notes.pdf
- Pg Dad Mission Booklet 1
- C-CAT Sample Questions
- Training & Placement Details-Technocrat Automation
- C-CAT Sample Questions
- final doc
- Learning Material DataCommunication
- e Cad Lab Manual
- GATE ECE Question Paper With Answers 2011
- ofdm
- Ofdm Signalling
- cdac
- autoreg

Sign up to vote on this title

UsefulNot usefulit is the one

it is the one

- UKF and PF
- Overview
- DSP basics
- 10.5923.j.ep.20120202.02
- 16 Novak
- Ode Revision
- linearization
- Linearinversionofseismicdata Arthurwegleinresearchpaperm Osrp 150111231808 Conversion Gate01
- Pla 2006
- process control ch 4
- no2
- Basic Control System With Matlab Examples
- Analitička evaluacija vodljivosti prijenosa topline s različitim svojstvima
- index (13)
- Linear Inversion of Seismic Data - Arthur Weglein Research Paper, M-OSRP
- Pendulum
- Schon 2003
- AMM.376.207
- Modeling and Identification of Hysteresis in Piezoelectric Actuators
- 1-s2.0-S0377042709005044-main
- 9073
- 111.EOLSS04
- Lecture Notes Nonlinear Equations and Roots
- Equivalent linearization
- DSP and Applications slide3
- Impactpaper.pdf
- Lecture 5Software engineering
- ~hhCE67
- Pages From [Monson Hayes] Schaum s Outline of Digital Signal (BookSee.org)
- vol7no4_1
- kalman