
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 8, November 2010

**A New Noise Estimation Technique of Speech Signal by Degree of Noise Refinement**

**Md. Ekramul Hamid**

College of Computer Science, King Khalid University, Abha, Kingdom of Saudi Arabia. e-mail: ekram_hamid@yahoo.com

**Md. Zasim Uddin**

Dept. of Computer Science University of Rajshahi Rajshahi, Bangladesh. e-mail: cse.zasim@gmail.com

**Md. Humayun Kabir Biswas**

College of Computer Science King Khalid University Abha, Kingdom of Saudi Arabia e-mail: mhkbiswas@yahoo.com

**Somlal Das**

Dept. of Computer Science, University of Rajshahi, Rajshahi, Bangladesh. e-mail: somlal_ru@yahoo.com

Abstract— An improved method for noise estimation of speech utterances disturbed by additive noise is presented in this paper. We introduce a degree-of-noise refinement of the minima value sequence (MVS) and some additional techniques for noise estimation. Initially, noise is estimated from the valleys of the spectrum, based on the harmonic properties of noisy speech; this estimate is called the MVS. However, the valleys of the spectrum are not pronounced enough to warrant reliable noise estimates. We therefore use the estimated Degree of Noise (DON) to adjust the MVS level. For every English phoneme, the DON is calculated and averaged over the processed frames for each input SNR. We consider these calculated average DONs as standard values corresponding to the input SNR; aligning them with the true DON using the least-squares (LS) method yields a function for estimating the degree of noise. Using this technique, the state of the added noise can be estimated more accurately. We use a two-stage refinement of the estimated DON to update the MVS as well as to estimate a nonlinear weight for noise subtraction. The proposed noise estimation performs well when integrated with a speech enhancement technique.

Keywords: Noise Estimation; Degree of Noise; Speech Enhancement; Nonlinear Weighted Noise Subtraction

I. INTRODUCTION

Noise estimation is one of the most important aspects of single-channel speech enhancement. Most single-channel algorithms require a voice activity detector (VAD), and the speech/pause detection plays the major role in the performance of the whole system. Such systems perform well for voiced speech at high signal-to-noise ratio (SNR), but their performance degrades for unvoiced speech at low SNR. Traditional VAD-based noise estimators are difficult to tune, and applying them to low-SNR speech often results in clipped speech. The original MMSE-STSA estimator computes the noise power spectrum from the noisy speech only in the first non-speech period, where pure noise is available [1]. Martin (2001) proposed estimating the noise spectrum by tracking the minimum of the noisy speech; the main drawback of this method is that it fails to update the noise spectrum when the noise floor increases abruptly [2]. Cohen (2002) [3] proposed a minima-controlled recursive averaging (MCRA) algorithm which updates the noise estimate by tracking the noise-only regions of the noisy speech spectrum. In the improved MCRA approach (Cohen 2003) [4], the noise-only regions of the spectrum are tracked via the estimated speech-presence probability. Doblinger [5] updated the noise estimate by continuously tracking the minimum of the noisy speech in each frequency bin; this is computationally more efficient than Martin's method, but it fails to differentiate between an increase in the noise floor and an increase in speech power. Hirsch and Ehrlicher [6] updated the noise estimate by comparing the noisy speech power spectrum to the past noise estimate; this method fails when the noise floor increases abruptly and stays at that level. In our previous study, Hamid (2007) [7] proposed noise estimation using the MVS, with the noise floor updated by an estimated DON; there, the DON was estimated on the basis of pitch, which cannot be estimated accurately in unvoiced sections.

In this paper, we propose a method with good noise tracking and control capability. To estimate noise, we first search frame by frame for the valleys of the amplitude spectrum and estimate the minima values of the spectrum, called the minima value sequence (MVS). To improve the estimation accuracy of the MVS, we use the DON. As this is a single-channel method, direct estimation of the degree of noise is not possible; instead, a frame-wise averaged DON is estimated from the estimated noise of the observed signal. We consider these DONs as standard values corresponding to the input SNR. Each 1st averaged DON for the corresponding input SNR is aligned with the true DON using the least-squares (LS) method, yielding the 1st estimated degree of noise (DON1) of that frame, which is applied to update the MVS. Next, the noise level is re-estimated, a 2nd averaged DON is computed from the re-estimated noise, and similarly the 2nd estimated DON2 is obtained and used to estimate the weight for the noise subtraction process. Because the noise is estimated from DONs aligned with the true DON, the noise amplitudes can be estimated more accurately, with lower speech distortion and suppressed musical noise in the enhanced speech.

II. PROPOSED NOISE ESTIMATION METHOD

We assume that speech and noise are uncorrelated with each other. Let y(n) = s(n) + d(n), where y(n) is the observed noisy speech signal, s(n) is the clean speech signal and d(n)

is the additive noise. We further assume that signal and noise are statistically independent. Under these assumptions the powers satisfy Py = Ps + Pd.

A. Estimation of the Minima Value Sequence (MVS)

Each frame consists of l = 320 consecutive samples, and consecutive frames are spaced l' = 100 samples apart, giving almost 68.75% overlap between them. The short-term representation of the signal y(n) is obtained by applying a Hamming window and an N = 512-point discrete Fourier transform (DFT) at a 16 kHz sampling frequency. Initially, the noise spectrum is estimated from the valleys of the amplitude spectrum, under the assumption that the peaks correspond to voiced parts and the valleys to noise-only parts. The algorithm for noise estimation is as follows:

1. Compute the RMS value Yrms of the amplitude spectrum Y(k). Detect the minima Ymin(kmin) ← min(Y(k)) of Y(k) wherever Y(k) < Y(k−1), Y(k) < Y(k+1) and Y(k) < Yrms; kmin denotes the frequency-bin indices of the minima values.
2. Interpolate between adjoining minima positions (kmin ← k) to obtain the minima value sequence (MVS) Ymin(k).
3. Smooth the sequence by partial averaging, giving the smoothed minima value sequence (SMVS). This process continuously updates the noise estimate in every analysis frame.
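As a concrete illustration, the three steps above can be sketched in Python. This is a hypothetical helper, not the authors' implementation; the function name and the smoothing length are our choices:

```python
import numpy as np

def minima_value_sequence(Y):
    """Noise-floor estimate of an amplitude spectrum Y(k) from its valleys
    (steps 1-3 above). Returns the smoothed minima value sequence (SMVS)."""
    yrms = np.sqrt(np.mean(Y ** 2))           # step 1: RMS of the spectrum
    k = np.arange(1, len(Y) - 1)
    is_min = (Y[k] < Y[k - 1]) & (Y[k] < Y[k + 1]) & (Y[k] < yrms)
    kmin = k[is_min]                          # bin indices of the minima
    if len(kmin) < 2:                         # degenerate frame: flat floor
        return np.full_like(Y, Y.min())
    # step 2: interpolate between adjoining minima to cover every bin
    mvs = np.interp(np.arange(len(Y)), kmin, Y[kmin])
    # step 3: smooth by a short moving (partial) average -> SMVS
    return np.convolve(mvs, np.ones(5) / 5.0, mode="same")
```

Applied to each Hamming-windowed, 512-point DFT frame, this yields one SMVS per analysis frame, continuously updating the noise estimate.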

Figure 1. Block diagram of the estimation of DON1.

An estimate of the noise obtained from the SMVS suffers from over- and underestimation of the SNR. To achieve good tracking capability with the overestimation under control, the proposed noise estimation algorithm adopts the concept of DON. The block diagram of the noise estimation process is given in Figure 1.

B. Estimation of the Degree of Noise (DON)

In a single-channel method, only the power of the observed signal is known, so direct estimation of the degree of noise (Pd/Pobs) is not possible. Instead, a frame-wise DON is estimated from the estimated noise of the observed signal in each frame m. For an optimal estimation of DON, we carried out an experiment on 20 vowel phonemes of 3 male and 3 female speakers taken from the TIMIT database. First, white noise at various SNRs is added to these voiced vowel phonemes. Then, for each SNR, the noisy phonemes are processed frame-wise and the DON is estimated in each frame for each phoneme individually. For every phoneme, the DON is averaged over the processed frames for the corresponding input SNR. Each of these 1st averaged DONs of frame m for the corresponding input SNR is denoted Z1m, and is aligned with the true DON (Ztr) using the least-squares (LS) method to give the 1st estimated degree of noise (DON1) of that frame. The true DON is

    Ztr = Pd / (Ps + Pd) = 1 / (1 + 10^(SNRdB/10))    (1)

and the 1st averaged DON is

    Z1m = (1/M) Σ_{m=1}^{M} Pη(m) / Pobs(m)    (2)

where M is the number of noise-added frames and Pη(m) and Pobs(m) are the powers of the noise and the observed signal, respectively. Note that only voiced phonemes are considered in the experiment, so strictly the averaged DON should be limited to the voiced portions of a sentence. In practice, however, the unvoiced portions are contaminated with a higher degree of noise, so the estimated noise is higher for unvoiced frames than for voiced frames; consequently a higher DON value is obtained for unvoiced frames, which is consistent with this behaviour. The degree of noise is then estimated from a previously prepared function obtained by the least-squares method [7]:

    DON1 = a × Z1m + b    (3)

where DON1 is the 1st estimated degree of noise of frame m. The error between the true and the estimated values is minimized by tuning a and b. In the experiment, 20 phoneme sounds of 3 male and 3 female speakers degraded by white noise at different SNRs (−10, −5, 0, …, 30 dB) are considered. The value DON1 is applied to update the MVS, and the noise level is then re-estimated with its help. Finally, from the re-estimated noise we compute the 2nd averaged DON (Z2m) and, similarly, the 2nd estimated DON2, which is used to estimate the weight for nonlinear weighted noise subtraction.

We conducted an experiment on the noisy (white-noise) utterance /water/ of a female speaker at various input SNRs and obtained the 1st estimated DON1 and the 2nd estimated DON2. Figure 2 illustrates the frame-wise true degree of noise and the estimated degree of noise in every analysis frame for different input SNRs. By smoothing the MVS, the overestimation problem is minimized and the effect of musical noise is reduced; smoothing suppresses the high-frequency fluctuations, and since most of the speech energy is concentrated at low frequencies, this gives an increased
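The mapping of Eqs. (1)-(3) is easy to prototype. In the sketch below the averaged-DON values are synthetic (the 0.9/0.05 distortion is purely an assumption for illustration, not the authors' measured data):

```python
import numpy as np

def true_don(snr_db):
    """True degree of noise Ztr = Pd/(Ps+Pd) as a function of input SNR, Eq. (1)."""
    return 1.0 / (1.0 + 10.0 ** (np.asarray(snr_db, dtype=float) / 10.0))

snrs = np.arange(-10, 35, 5)                 # -10, -5, ..., 30 dB, as in the paper
z_true = true_don(snrs)
z_avg = 0.9 * z_true + 0.05                  # stand-in for the 1st averaged DON

# Eq. (3): align the averaged DON with the true DON by a linear LS fit
a, b = np.polyfit(z_avg, z_true, 1)
don1 = a * z_avg + b                         # 1st estimated DON1
```

In the paper, a and b are tuned so that DON1 tracks Ztr; here the fit recovers the synthetic distortion exactly, so don1 coincides with z_true.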


signal-to-noise ratio. Figure 3 shows that the true and the estimated degrees of noise are almost equal at all SNRs.

Figure 2a. True vs. 1st averaged DON (top) and true vs. 1st estimated DON1 (bottom). Figure 2b. True vs. 2nd averaged DON (top) and true vs. 2nd estimated DON2 (bottom).

Figure 3. Frame-wise plots of the true DON (solid with points), the 1st estimated DON1 (dotted with circles) and the 2nd estimated DON2 (solid with double linewidth) for −5 dB (left) and 5 dB (right) SNR noisy speech.

C. Estimation of the Noise Spectrum

The noise spectrum is estimated from the SMVS and the 1st estimated DON1 according to

    Dm(k) = Ymin(k) + (DON1 × Yrms)    (4)

Dm(k) is then updated, the updated spectrum is smoothed again by a three-point moving average, and finally the main maxima of the spectrum are identified and suppressed [7].

III. NONLINEAR WEIGHTED NOISE SUBTRACTION (NWNS)

Noise reduction based on the traditional spectral subtraction (SS) requires an estimate of the embedded noise; here the subtraction is performed in the time domain, which we call noise subtraction (NS). In NS, degradation occurs because the noise is overestimated within the unvoiced regions of noisy speech at higher input SNRs (>10 dB): the unvoiced regions have flat spectral characteristics and a low local SNR, which yields a larger degree-of-noise value and raises the noise level, so the noise extracted in the unvoiced regions is too high and degrades the speech. From Figure 4 it is seen that, for input speech at high SNR (>10 dB), the unvoiced frames give a larger DON2 (Z2m) value, which increases the weighting factor; more noise is then subtracted in every unvoiced frame than in the voiced frames (for example at 25 dB input SNR), and speech distortion results. We therefore introduce a nonlinear weighting factor to control the overestimation and minimize the residual noise. The NWNS is given by

    s1(n) = y(n) − α × Ztr × dss(n)    (5)

where s1(n) is the enhanced signal, dss(n) is the estimated noise, and

    α = 0.3019 + 6.4021 × Z2m − 14.109 × Z2m^2 + 9.8273 × Z2m^3

is the nonlinear weighting factor. Eq. (5) requires the input SNR (through Ztr), which can be estimated from the variances:

    SNRinput = 10 log10(σs^2 / ση^2)    (6)

where σs^2 and ση^2 are the variances of speech and noise, respectively. Due to the independence of noise and speech, the variance of the noisy speech equals the sum of the speech and noise variances. It is found that adopting the nonlinear weight in NS gives good noise reduction, and informal listening tests confirm good performance of the NWNS with less musical noise.
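A minimal time-domain sketch of the NWNS of Eq. (5), with the weight of Eq. (22) and the variance-based input SNR of Eq. (6). The function names are ours, and d_est stands for a noise estimate reconstructed from the spectrum of Eq. (4):

```python
import numpy as np

def alpha_weight(z2m):
    """Nonlinear weighting factor of Eq. (22), evaluated at DON2."""
    return 0.3019 + 6.4021 * z2m - 14.109 * z2m ** 2 + 9.8273 * z2m ** 3

def input_snr_db(var_speech, var_noise):
    """Variance-based input SNR, Eq. (6)."""
    return 10.0 * np.log10(var_speech / var_noise)

def nwns(y, d_est, z_tr, z2m):
    """Nonlinear weighted noise subtraction, Eq. (5)."""
    return y - alpha_weight(z2m) * z_tr * d_est
```

As a sanity check, the first tabulated data point of Section III.A (DON2 ≈ 0.807) gives alpha_weight(0.80707) ≈ 1.445, close to the recorded weight of 1.4.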

Figure 4. Spectra of voiced and unvoiced frames degraded by white noise at 5 dB SNR (a, b), 10 dB SNR (c, d), 20 dB SNR (e, f) and 30 dB SNR (g, h).

A. Derivation of the nonlinear weight

It is observed that subtraction-type algorithms produce musical noise that cannot be entirely avoided. Since algorithms with fixed subtraction parameters are unable to adapt to varying noise levels and characteristics, a suitable factor must be estimated to update the noise level; we therefore derive a nonlinear weighting factor α for this purpose. First, a simulation is performed over different sentences of 7 male and 7 female speakers, randomly selected from the TIMIT database, at different SNR levels and for different values of α, and the output SNR is recorded.
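The grid search over α can be mimicked as follows. The output-SNR definition and the synthetic noise estimate (the true noise plus a small error) are our assumptions, since the paper does not spell them out:

```python
import numpy as np

def output_snr_db(clean, enhanced):
    """Output SNR of the enhanced signal against the clean reference."""
    err = clean - enhanced
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))

rng = np.random.default_rng(1)
n = 1600
s = np.sin(2 * np.pi * 440.0 * np.arange(n) / 16000.0)   # clean tone
d = 0.3 * rng.standard_normal(n)                          # additive noise
y = s + d                                                 # observed signal
d_est = d + 0.1 * rng.standard_normal(n)                  # imperfect noise estimate

# record the output SNR for each candidate weight and keep the best
alphas = np.arange(0.0, 1.6, 0.1)
scores = [output_snr_db(s, y - a * d_est) for a in alphas]
best_a = alphas[int(np.argmax(scores))]
```

Repeating this per speaker and per input SNR, and tabulating the best weight against the measured DON2, yields pairs analogous to the (xi, yi) data fitted below.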


Table 1 shows the performance of a computer simulation of the algorithm on a given noisy sentence of a female speaker for different values of α.

TABLE 1: The output SNR for a noisy speech of a female speaker for different values of α over a wide range of input SNRs (−10 dB to 30 dB). The speech is degraded by white noise.

Let the set of data points be (xi, yi), i = 1, 2, …, 9, and let the curve Y = f(x) be fitted to these data. At x = xi the experimental value of the ordinate is yi and the corresponding value on the fitting curve is f(xi). If ei = yi − f(xi) is the error of approximation at x = xi, the sum of the squared errors is

    SE = Σ_{i=1}^{9} ei^2

We take α to be a polynomial of degree 3,

    α = f(x) = a0 + a1 x + a2 x^2 + a3 x^3    (7)

fitted to the data points (xi, yi), i = 1, 2, …, 9, where x represents the values of DON2. The sum of the squared errors at the points x = xi is then

    SE ≡ Σ_{i=1}^{9} [yi − (a0 + a1 xi + a2 xi^2 + a3 xi^3)]^2    (8)

For SE to be a minimum, we require

    ∂(SE)/∂a0 = −2 Σ_{i=1}^{9} [yi − (a0 + a1 xi + a2 xi^2 + a3 xi^3)] = 0    (9)
    ∂(SE)/∂a1 = −2 Σ_{i=1}^{9} [yi − (a0 + a1 xi + a2 xi^2 + a3 xi^3)] xi = 0    (10)
    ∂(SE)/∂a2 = −2 Σ_{i=1}^{9} [yi − (a0 + a1 xi + a2 xi^2 + a3 xi^3)] xi^2 = 0    (11)
    ∂(SE)/∂a3 = −2 Σ_{i=1}^{9} [yi − (a0 + a1 xi + a2 xi^2 + a3 xi^3)] xi^3 = 0    (12)

From Eqs. (9)-(12) we obtain the normal equations (all sums over i = 1, …, 9):

    Σ yi = 9 a0 + a1 Σ xi + a2 Σ xi^2 + a3 Σ xi^3    (13)
    Σ xi yi = a0 Σ xi + a1 Σ xi^2 + a2 Σ xi^3 + a3 Σ xi^4    (14)
    Σ xi^2 yi = a0 Σ xi^2 + a1 Σ xi^3 + a2 Σ xi^4 + a3 Σ xi^5    (15)
    Σ xi^3 yi = a0 Σ xi^3 + a1 Σ xi^4 + a2 Σ xi^5 + a3 Σ xi^6    (16)

In matrix form these equations read

    [Σ yi, Σ xi yi, Σ xi^2 yi, Σ xi^3 yi]^T = S [a0, a1, a2, a3]^T    (17)

where S is the 4×4 matrix whose (p, q) entry is Σ xi^(p+q−2), p, q = 1, …, 4; the coefficient matrix of Eq. (17) is of Vandermonde type. Equivalently, the least-squares fit can be set up directly from

    yi = a0 + a1 xi + a2 xi^2 + a3 xi^3, i = 1, …, 9    (18)

which in matrix notation is

    Y = X A    (19)

with Y = [y1, …, y9]^T, X the 9×4 matrix with rows [1, xi, xi^2, xi^3], and A = [a0, a1, a2, a3]^T. Multiplying both sides of Eq. (19) by X^T (the transpose of X) gives

    X^T Y = X^T X A    (20)

This matrix equation can be solved numerically, or inverted directly when it is well conditioned, to yield the solution vector

    A = (X^T X)^{−1} X^T Y    (21)

In our experiment,

    xi = DON2i = 0.80707, 0.75235, 0.59967, 0.32374, 0.095033, 0.01379, −0.009902, −0.01741, −0.019616
    yi = αi = 1.4, 1.35, 1.25, 1.1071, 0.86429, 0.56429, 0.26429, 0.11571, 0.035

TABLE 2: The average weight α for 7 male and 7 female utterances over a wide range of input SNRs (−10 dB to 30 dB).

Putting the values DON21, …, DON29 into X and α1, …, α9 into Y, Eq. (21) yields

    A = [a0, a1, a2, a3]^T = [0.3019, 6.4021, −14.109, 9.8273]^T

Substituting a0, a1, a2 and a3 into Eq. (7) gives

    α = 0.3019 + 6.4021 × DON2 − 14.109 × DON2^2 + 9.8273 × DON2^3    (22)

Equation (22) is the nonlinear weighting factor α used in Eq. (5).

IV. EXPERIMENTAL RESULTS AND DISCUSSION

The proposed noise estimation method is compared with the conventional noise estimation algorithm using the MVS, in terms of noise estimation accuracy and quality. Figure 5 illustrates the results of noise estimation as a frequency-domain (FD) measure. In this experiment we consider the vowel phoneme /oy/ degraded by white noise at 0 dB SNR. It shows that, by adopting the proposed DON1, the state of the added noise can be estimated more precisely: the MVS+DON1 estimator gives a clear improvement in the estimated noise amplitudes. An objective measure, the PESQ MOS, is also used to verify the quality of the estimated noise. Figure 6 shows the PESQ MOS value between the added and the estimated noise at different noise levels; the value gradually decreases at higher SNRs. To study the speech enhancement performance, an experiment is carried out on 56320 samples of the clean speech /she had your dark suit in greasy wash water all year/ from the TIMIT database. The speech signal is corrupted by white, pink and HF channel noise taken from the NOISEX database at various SNR levels. The average output SNRs obtained for white, pink and HF channel noise at the various input SNR levels are given in Table 3 for the NS and NWNS methods, respectively.
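The normal-equation solution of Eqs. (17)-(21) can be reproduced numerically from the nine data pairs listed above; if the published data are exact, the result should land near the quoted coefficients (0.3019, 6.4021, −14.109, 9.8273), up to rounding of the tabulated values:

```python
import numpy as np

# The nine (DON2_i, alpha_i) pairs given in the text
x = np.array([0.80707, 0.75235, 0.59967, 0.32374, 0.095033,
              0.01379, -0.009902, -0.01741, -0.019616])
y = np.array([1.4, 1.35, 1.25, 1.1071, 0.86429,
              0.56429, 0.26429, 0.11571, 0.035])

X = np.vander(x, 4, increasing=True)      # rows [1, x_i, x_i^2, x_i^3], Eq. (18)
A = np.linalg.solve(X.T @ X, X.T @ y)     # A = (X^T X)^{-1} X^T Y, Eq. (21)

# Cross-check with the library least-squares fit (highest power first)
A_check = np.polyfit(x, y, 3)[::-1]
```

Solving the normal equations directly and calling np.polyfit are mathematically the same fit, so the two coefficient vectors should agree to numerical precision.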


Figure 5. Noise spectra: true noise compared with the noise estimated by the MVS (top) and by MVS+DON1 (bottom); amplitude (dB) versus frequency (0-8 kHz).

Figure 6. Estimated noise quality based on the PESQ MOS between the added and the estimated noise, versus input SNR (−10 dB to 30 dB).

We observe from Table 3 that the overall output SNR with NS improves up to 10 dB input SNR and degrades from 15 dB upward. The degradation is caused by overestimation of the noise within the unvoiced regions of the noisy speech at higher input SNRs (>10 dB): the unvoiced regions have flat spectral characteristics and a low local SNR, which yields a larger DON2 value and raises the noise level, so the noise extracted in the unvoiced regions is too high and degrades the speech. A weighting factor to control this overestimation is therefore essential, and with it NWNS performs better throughout the SNR range. The enhanced speech of the NS method is distorted in low-energy voiced parts by the noise removal, whereas that of NWNS is not, although NWNS removes somewhat less noise from the corrupted speech; thus NS loses some speech intelligibility while NWNS maintains it. We obtain better results than in our previous study [7] over a wide range of SNRs.

TABLE 3: Average output SNR (dB) for various types of noise at different input SNRs with the NS and NWNS methods.

Input SNR | White NS | White NWNS | HF channel NS | HF channel NWNS | Pink NS | Pink NWNS
−10 dB    |  −2.8    |  −1.57     |  −7.4         |  −7.5           |  −7.1   |  −7.1
 −5 dB    |   2.0    |   2.4      |  −2.3         |  −2.7           |  −2.2   |  −2.3
  0 dB    |   6.5    |   5.3      |   2.6         |   1.9           |   2.6   |   2.2
  5 dB    |  10.3    |   8.7      |   7.3         |   6.4           |   7.3   |   6.4
 10 dB    |  13.3    |  11.7      |  11.5         |  10.8           |  11.3   |  10.8
 15 dB    |  15.4    |  15.8      |  14.5         |  15.4           |  14.4   |  15.4
 20 dB    |  16.7    |  20.4      |  16.4         |  20.2           |  16.3   |  20.3
 25 dB    |  17.5    |  25.2      |  17.3         |  25.1           |  17.3   |  25.2
 30 dB    |  17.7    |  30.1      |  17.7         |  30.1           |  17.6   |  30.1

CONCLUSIONS

In this paper, an improved noise estimation technique is discussed. Initially, noise is estimated from the valleys of the amplitude spectrum; the estimated noise amplitudes are then adjusted by the estimated DON1. The method eliminates the need for a VAD by exploiting the short-time characteristics of speech signals. The results show that the state of the added noise is estimated more accurately with MVS+DON1, and the enhanced speech obtained with time-domain nonlinear weighted noise subtraction shows sufficient noise reduction. The main advantage of the algorithm is the effective removal of noise components over a wide range of SNRs: we obtain not only a better SNR but also better speech quality with significantly reduced residual noise. However, a slight noisy effect still remains; this issue will be addressed in our future study.

REFERENCES

[1] Benesty, J., Makino, S., and Chen, J., Speech Enhancement, Springer-Verlag Berlin Heidelberg, 2005.
[2] Martin, R., and Lotter, T., "Optimal Recursive Smoothing of Non-Stationary Periodograms", Proc. IWAENC, pp. 167-170, Sept. 2001.
[3] Cohen, I., and Berdugo, B., "Noise Estimation by Minima Controlled Recursive Averaging for Robust Speech Enhancement", IEEE Signal Processing Letters, vol. 9, no. 1, pp. 12-15, Jan. 2002.
[4] Cohen, I., "Noise Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging", IEEE Trans. on Speech and Audio Process., vol. 11, pp. 466-475, Sept. 2003.
[5] Doblinger, G., "Computationally Efficient Speech Enhancement by Spectral Minima Tracking in Subbands", Proc. EUROSPEECH, pp. 1513-1516, 1995.
[6] Hirsch, H. G., and Ehrlicher, C., "Noise Estimation Methods for Robust Speech Recognition", Proc. ICASSP, pp. 153-156, 1995.
[7] Hamid, M. E., Ogawa, K., and Fukabayashi, T., "Noise Estimation for Speech Enhancement by the Estimated Degree of Noise without Voice Activity Detection", Proc. SIP 2006, pp. 420-424, Hawaii, August 2006.
[8] Martin, R., "Speech Enhancement using MMSE Short Time Spectral Estimation with Gamma Distributed Speech Priors", Proc. ICASSP, vol. I, pp. 253-256, 2002.
[9] Martin, R., "Spectral Subtraction Based on Minimum Statistics", Proc. EUSIPCO, pp. 1182-1185, 1994.
[10] Martin, R., "Statistical Methods for the Enhancement of Noisy Speech", Proc. IWAENC 2003, pp. 1-6, 2003.

AUTHORS PROFILE

Md. Ekramul Hamid received his B.Sc and M.Sc degrees from the Department of Applied Physics and Electronics, Rajshahi University, Bangladesh. He then obtained a Master of Computer Science degree from Pune University, India, and received his PhD degree from Shizuoka University, Japan. During 1997-2000 he was a lecturer in the Department of Computer Science and Technology, Rajshahi University, where he has served as an associate professor since 2007. He is currently working as an assistant professor in the College of Computer Science at King Khalid University, Abha, KSA. His research interests include speech enhancement and speech signal processing.

Md. Zasim Uddin received his B.Sc and M.Sc in Computer Science & Engineering from Rajshahi University, Rajshahi, Bangladesh. He was awarded the National Science and Information & Communication Technology Fellowship (Government of the People's Republic of Bangladesh) in 2009. He is currently a lecturer in the Department of Computer Science & Engineering, Dhaka International University, Dhaka, Bangladesh. His research interests include medical image and signal processing. He is a member of the Bangladesh Computer Society.

Md. Humayun Kabir Biswas is working as an international lecturer in the Department of Computer Science at King Khalid University, Kingdom of Saudi Arabia. Before joining KKU, he was a lecturer in the Department of Computer Science and Engineering at IUBAT - International University of Business Agriculture and Technology, Bangladesh. He completed his Master of Science in Information Technology degree at Shinawatra University, Bangkok, Thailand. His research interests include the semantic web, intelligent information retrieval, machine learning, and audio and image signal processing.

Somlal Das received B.Sc (Hons) and M.Sc degrees from the Department of Applied Physics and Electronics, University of Rajshahi, Bangladesh. He joined the Department of Computer Science and Engineering, University of Rajshahi, as a lecturer in 1998, where he is currently an Assistant Professor and a Ph.D. student. His research interests are speech signal processing, speech enhancement, speech analysis, and digital signal processing.

