Speech Enhancement by Kalman Filtering With A Particle Filter-Based Preprocessor
Abstract: To reduce nonstationary noise in real environments, we propose using a particle filter as a preprocessor for Kalman filtering. The autoregressive (AR) model parameters are estimated from the noisy input speech signal by a particle filter. The clean speech signal is then estimated by a Kalman filter configured with the estimated parameters. Experimental results show that, when the speech signal is corrupted by babble noise, the proposed algorithm improves the average output SNR by 1.5 dB over a Kalman filter with an NLMS-based preprocessor.
$w_i^{(m)} \propto p(y_i \mid x_i^{(m)}), \quad m = 1, \ldots, M$
I. INTRODUCTION
A Kalman filter is an effective algorithm for enhancing speech signals from a series of measurements observed over time that contain random noise and other inaccuracies. The Kalman filter converges faster than a normalized least-mean-square (NLMS)-based adaptive filter. There have been numerous studies on Kalman filtering for speech enhancement [1]. Even though noise observed in real situations is nonstationary and dynamic, previous studies mostly applied the Kalman filter under a stationary white Gaussian noise assumption for simplicity.
We present a sequential nonstationary speech enhancement method that combines Kalman filtering with a particle filter [2] to estimate the parameters of the speech signal and the variance of the nonstationary additive noise. Sequential importance sampling (SIS) is used to estimate the parameters of the particle filter; the clean speech signal is estimated by the particle filter in a frame-wise manner and is then applied to a Kalman filter. In this work, the speech signal is modeled as an autoregressive (AR) process. The noise variance and the parameters of the AR process are estimated for the Kalman filter. Our experimental results show that the proposed Kalman filtering with a particle filter raises the average output signal-to-noise ratio (SNR) by 0.5 dB to 2.7 dB over a baseline with an NLMS-based preprocessor, depending on the noise type.
II. SYSTEM DESCRIPTION
A. Particle filter-based parameter estimation
In the general formulation of the state estimation problem, the objective is to track the time evolution of the filtering density. If the parameters of the speech and noise signals are known, the optimal estimate of the original speech signal can be obtained from a Kalman filter. In realistic scenarios, however, the background noise signal is unknown and the noise sources are mostly non-Gaussian. We use a particle filter in order to estimate the parameters (AR
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012-0001730).
$\tilde{w}_i^{(m)} = w_i^{(m)} \Big/ \sum_{k=1}^{M} w_i^{(k)}, \quad m = 1, \ldots, M$
$p(x_i \mid y_i) \approx \sum_{m=1}^{M} \tilde{w}_i^{(m)} \, \delta\big(x_i - x_i^{(m)}\big)$
5. Resample to obtain a new set of M equally weighted particles. Resampling removes particles with low weights and replicates particles with high weights. Accordingly, the posterior probability distribution of the resampled particles is sharper.
The particle filtering process above is visualized in Fig. 1. From the speech signal estimated by the particle filter, the speech and noise parameters are computed through linear predictive coding (LPC). The estimated speech signal from the particle filter can be described by a p-th order AR model, whose coefficients define the state transition matrix $\tilde{F}$.
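The weighting, normalization, and resampling steps of the particle filter can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not taken from the paper: the state is a scalar AR(1) process, the likelihood $p(y_i \mid x_i^{(m)})$ is Gaussian, and the function name `particle_filter_step` and its parameters `a`, `sigma_u`, and `sigma_v` are hypothetical.

```python
import numpy as np


def particle_filter_step(particles, y, a, sigma_u, sigma_v, rng):
    """One bootstrap particle filter step (illustrative scalar AR(1) state).

    particles : array of M samples approximating p(x_{i-1} | y_{i-1})
    y         : current noisy observation y_i
    a         : assumed AR(1) coefficient of the state model
    sigma_u   : assumed standard deviation of the process noise
    sigma_v   : assumed standard deviation of the observation noise
    """
    m = len(particles)

    # Prediction: draw x_i^(m) from the transition density p(x_i | x_{i-1}^(m)).
    predicted = a * particles + sigma_u * rng.standard_normal(m)

    # Evaluation: w_i^(m) proportional to p(y_i | x_i^(m)).
    # A Gaussian likelihood is assumed here purely for illustration.
    log_w = -0.5 * ((y - predicted) / sigma_v) ** 2
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability

    # Normalization: the weights sum to one.
    w_tilde = w / w.sum()

    # Point estimate: weighted mean of the predicted particles.
    estimate = float(np.sum(w_tilde * predicted))

    # Resampling: draw M indices with probability equal to the normalized
    # weights, which yields a new, equally weighted particle set.
    idx = rng.choice(m, size=m, p=w_tilde)
    return predicted[idx], estimate, w_tilde


rng = np.random.default_rng(0)
particles = rng.standard_normal(500)
particles, s_hat, w_tilde = particle_filter_step(
    particles, y=0.3, a=0.9, sigma_u=1.0, sigma_v=1.0, rng=rng
)
```

Multinomial resampling is used here because it is the shortest to write; other schemes such as systematic resampling would fit the same interface.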
[Fig. 1. Particle filtering process: prediction, evaluation, and resampling.]
B. Estimation of clean speech by using the Kalman filter
After the speech signal is estimated in the first stage by the particle filter, we compute the AR parameters and the noise variance. Given these parameters, the final clean speech signal is extracted with a Kalman filtering process. Let $\tilde{v}_i$ denote the estimated observation (measurement) noise: $\tilde{v}_i = y_i - s_i$, where $y_i$ is the observed speech signal and $s_i$ is the original speech signal. Let $\tilde{V}$ denote the covariance matrix obtained from the estimated measurement noise $\tilde{v}(n)$ of the particle filtering process. Then the Kalman filter is applied as follows.
(Prediction)
$\hat{x}_{i|i-1} = \tilde{F}_{i|i-1} \, \hat{x}_{i-1|i-1}$
$K_{i|i-1} = \tilde{F}_{i|i-1} \, K_{i-1} \, \tilde{F}_{i|i-1}^{T} + U_i$
$G_i = K_{i|i-1} H_i^{T} \left( H_i K_{i|i-1} H_i^{T} + \tilde{V}_i \right)^{-1}$
(Correction)
$s_i = y_i - H_i \hat{x}_{i|i-1}$
$\hat{x}_{i|i} = \hat{x}_{i|i-1} + G_i s_i$
$K_i = (I - G_i H_i) \, K_{i|i-1}$
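The prediction and correction steps of the Kalman recursion can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the AR coefficients are obtained with the autocorrelation (Yule-Walker) form of LPC, the state is $[s_i, s_{i-1}, \ldots, s_{i-p+1}]^T$ with a companion-form transition matrix, the observation matrix picks the first state element, and the parameters stay fixed over the whole signal rather than being updated frame by frame. The names `ar_coefficients`, `companion_matrix`, and `kalman_enhance` are hypothetical.

```python
import numpy as np


def ar_coefficients(s, p):
    """Autocorrelation-method LPC: solve the Yule-Walker equations R a = r."""
    s = np.asarray(s, dtype=float)
    r = np.array([np.dot(s[: len(s) - k], s[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(j - k)] for k in range(p)] for j in range(p)])
    return np.linalg.solve(R, r[1:])


def companion_matrix(a):
    """Transition matrix for s_i = sum_k a[k-1] * s_{i-k} + u_i (assumed form)."""
    p = len(a)
    F = np.zeros((p, p))
    F[0, :] = a                  # first row holds the AR coefficients
    F[1:, :-1] = np.eye(p - 1)   # remaining rows shift the past samples down
    return F


def kalman_enhance(y, a, var_u, var_v):
    """Run the prediction/correction recursion over a noisy signal y."""
    p = len(a)
    F = companion_matrix(a)
    H = np.zeros((1, p))
    H[0, 0] = 1.0                # observe only the current sample
    U = np.zeros((p, p))
    U[0, 0] = var_u              # process noise drives the newest sample only
    x = np.zeros((p, 1))         # assumed initial state estimate
    K = np.eye(p)                # assumed initial error covariance
    out = np.empty(len(y))
    for i, y_i in enumerate(y):
        # Prediction
        x_pred = F @ x
        K_pred = F @ K @ F.T + U
        # Kalman gain (the bracketed term is 1x1 for a scalar observation)
        G = K_pred @ H.T / (H @ K_pred @ H.T + var_v)
        # Correction
        innovation = y_i - (H @ x_pred)[0, 0]
        x = x_pred + G * innovation
        K = (np.eye(p) - G @ H) @ K_pred
        out[i] = x[0, 0]
    return out
```

In the scheme described in this section, the input to `ar_coefficients` would be the speech estimate produced by the particle filter, and `var_v` would come from the estimated measurement noise.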
In the Kalman recursion, $\hat{x}_{i|i-1}$ is the predicted state estimate, $K_{i|i-1}$ is the predicted state-error covariance matrix, $K_i$ is the filtering-error covariance matrix, $U_i$ is the covariance matrix of the process noise, $H_i$ is the observation matrix, and $G_i$ is the Kalman gain.
III. EXPERIMENTAL RESULTS
We performed computer experiments to evaluate the proposed algorithm in various noise environments, using the database of the Speech Separation Challenge [3]. Three types of noise were added: N1 (car noise), N2 (babble noise), and N3 (white Gaussian noise). The sampling rate of the speech database was lowered from 25 kHz to 16 kHz. The noisy speech signal was generated by mixing clean speech with the noise sources at input SNRs of -10, -5, 0, 5, and 10 dB. Note that N1 and N3 are stationary noise, whereas N2 is nonstationary.
Fig. 2 shows the waveforms of the clean, the noisy, and the enhanced speech signals, from top to bottom. In the figure, the noisy signal was corrupted with the N2 (babble) noise at an input SNR of 0 dB. The noise is visibly suppressed in the enhanced speech signal. We also computed the output SNR (dB) of the enhanced speech signal. Table I compares the output SNR under the three noise conditions for the Kalman filter with an NLMS adaptive filter-based preprocessor (Baseline) and the Kalman filter with a particle filter-based preprocessor (Proposed). The proposed algorithm provides an average SNR increase of 2.7 dB, 1.5 dB, and 0.5 dB under the N1, N2, and N3 noise conditions, respectively. These results indicate that our algorithm improves the objective quality measure in nonstationary environments as well as in stationary environments.
Fig. 2. Sample waveforms from our computer experiments. Panel (d): enhanced signal with the proposed algorithm.
TABLE I
OUTPUT SNR (DB) UNDER DIFFERENT NOISE CONDITIONS
Noise type   Algorithm   Input SNR (dB)                      Average
                         -10    -5     0      5      10
N1           Baseline    1.7    2.9    4.1    5.5    6.3     4.1
N1           Proposed    3.8    5.2    7.5    8.4    9.1     6.8
N2           Baseline    1.5    3.8    5.2    6.2    6.9     4.7
N2           Proposed    2.9    4.7    7.1    7.9    8.6     6.2
N3           Baseline    3.0    2.1    2.6    4.1    5.8     6.0
N3           Proposed    3.4    4.6    7.6    7.8    8.8     6.4
IV. CONCLUSIONS
We proposed a Kalman filter-based speech enhancement algorithm in which a particle filter is used as a preprocessor to estimate the Kalman filter parameters in nonstationary noise conditions. In computer experiments with artificially mixed noisy speech signals, the proposed algorithm improved the output SNR by 2.7 dB, 1.5 dB, and 0.5 dB for the car noise, babble noise, and white Gaussian noise conditions, respectively.
REFERENCES
[1] D.C. Popescu and I. Zeljkovic, "Kalman filtering of colored noise for speech enhancement," Proc. ICASSP, pp. 997-1000, 1998.
[2] M.S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Trans. Signal Processing, vol. 50, no. 2, Feb. 2002.
[3] M.P. Cooke, J. Barker, S.P. Cunningham, and X. Shao, "An audio-visual corpus for speech perception and automatic speech recognition," J. Acoust. Soc. Am., vol. 120, no. 5, pp. 2421-2424, Nov. 2006.