$$g_{1,k}(n) = \min\left\{ \left( \frac{A_k(n)}{B_k(n)} \right)^{p_k},\; L_k \right\}$$
International Journal of Engineering Trends and Technology (IJETT) Volume 4 Issue 8- August 2013
ISSN: 2231-5381 http://www.ijettjournal.org Page 3636
where $A_k(n)$ is an estimate of the noisy speech signal level, $B_k(n)$ is an estimate of the noise level, $L_k$ is a threshold level determining the maximum allowed gain in subband $k$, and $p_k \ge 0$ is a constant denoted the gain rise exponent.
The noisy speech signal level is estimated by taking a
short time average of the input signal according to
$$A_k(n) = \alpha_k A_k(n-1) + (1 - \alpha_k)\,|x_k(n)|$$
where $0 \le \alpha_k \le 1$ is a forgetting factor constant.
Estimation of the noise level is based on the short time average $A_k(n)$ as
$$B_k(n) = \begin{cases} A_k(n) & \text{if } A_k(n) \le B_k(n-1) \\ (1 + \beta_k)\,B_k(n-1) & \text{otherwise} \end{cases}$$
where $\beta_k$ is a positive constant defining the increase rate of the noise level.
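The level and noise estimators above can be sketched in a few lines of Python for a single subband. This is only an illustrative sketch: the function name, the default parameter values, the initialisation from the first sample and the small `floor` guard are assumptions, and the gain form min((A_k/B_k)^p_k, L_k) is inferred from the parameter descriptions rather than quoted.

```python
def age_subband_gain(x_k, alpha=0.9, beta=1e-4, p=1.0, max_gain=4.0, floor=1e-12):
    """Per-sample gain for one subband signal x_k (a sequence of floats).

    alpha    : forgetting factor of the short-time average A_k(n)
    beta     : increase rate of the noise estimate B_k(n)
    p        : gain rise exponent p_k
    max_gain : maximum allowed gain L_k
    floor    : assumed guard that keeps B_k(n) strictly positive
    """
    if not x_k:
        return []
    # Assumed initialisation: start both estimates at the first sample magnitude.
    a_prev = abs(x_k[0])
    b_prev = max(a_prev, floor)
    gains = []
    for sample in x_k:
        # Short-time average of the input magnitude.
        a = alpha * a_prev + (1.0 - alpha) * abs(sample)
        # Noise estimate: follow A_k(n) downwards, rise slowly otherwise.
        if a <= b_prev:
            b = a
        else:
            b = (1.0 + beta) * b_prev
        b = max(b, floor)
        # Assumed gain form: ratio raised to p, limited by max_gain.
        gains.append(min((a / b) ** p, max_gain))
        a_prev, b_prev = a, b
    return gains
```

With a low-level stationary input the gain stays close to 1, while a sudden loud segment drives the ratio up until it is limited by `max_gain`.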
III IMPROVED ADAPTIVE GAIN EQUALIZER ALGORITHM
During continuous speech, the noise-level estimate $B_k(n)$ will increase and cause a reduction of the speech boosting gain. To overcome this problem, an alternative noise estimation method is proposed. The proposed noise estimator utilizes a modified update scheme according to the following equation.
$$B_k(n) = \begin{cases} A_k(n) & \text{if } A_k(n) \le B_k(n-1) \\ B_k(n-1) & \text{if } A_k(n) > B_k(n-1) \text{ and } \psi(n) = 1 \\ (1 + \beta_k)\,B_k(n-1) & \text{otherwise} \end{cases}$$
where $\psi(n)$ is an update controller, which can take the values 1 (no update) or 0 (update). Use of the noise estimation update controller $\psi(n)$ prevents noise estimation during speech and thus eliminates the problem of speech boosting gain reduction during intense continuous speech.
The noise estimation update controller is defined as
$$\psi(n) = \begin{cases} 1 & \text{if } S_k(n) \ge T_{j,k} \text{ for any } k \\ 0 & \text{otherwise} \end{cases}$$
where $T_{j,k}$ is a threshold and $S_k(n)$ is the ratio between the maximum and minimum signal magnitudes in accumulated blocks, defined as
$$S_k(n) = \frac{\max_{q \in \{0,\dots,N_b-1\}} P_k(l-q)}{\epsilon + \min_{q \in \{0,\dots,N_b-1\}} P_k(l-q)}$$
where $N_b$ is the number of blocks used for the estimation of $S_k(n)$, $0 < \epsilon \ll 1$ is a constant included for avoiding division by zero, and $P_k(l)$ is the accumulated signal block
$$P_k(l) = \sum_{i=0}^{N_s-1} |x_k(l N_s - i)|$$
where $N_s$ is the number of samples accumulated in every block. The integer block index $l$ fulfills $l N_s \le n$.
The essence of the ratio $S_k(n)$ is to compare the largest accumulated signal block (numerator) with the smallest block (denominator), out of the $N_b$ most recent (in time) blocks. A high ratio $S_k(n)$ indicates that the signal $x_k(n)$ can currently be regarded as non-stationary within the considered time frame, which in this context means that the current signal content is likely to be dominated by speech. A low ratio $S_k(n)$, on the other hand, means that the signal $x_k(n)$ is likely to be dominated by noise that is stationary within the considered time frame. The noise estimation update controller then allows noise estimation once $S_k(n)$ is below the threshold level $T_{j,k}$ for all $k$.
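The block ratio and the controlled noise update can be illustrated with the following Python sketch. All function names and default values are hypothetical, and for simplicity the blocks are taken as consecutive non-overlapping segments starting at the first sample, which differs from the block alignment in the text only by a shift.

```python
def accumulate_blocks(x_k, n_s):
    """Sum |x_k| over consecutive non-overlapping blocks of n_s samples."""
    return [sum(abs(v) for v in x_k[i:i + n_s])
            for i in range(0, len(x_k) - n_s + 1, n_s)]


def block_ratio(recent_blocks, eps=1e-6):
    """Ratio of the largest to the smallest of the most recent blocks.

    eps is the small constant that avoids division by zero.
    """
    return max(recent_blocks) / (eps + min(recent_blocks))


def update_controller(ratios, thresholds):
    """Return 1 (no update) if any subband ratio reaches its threshold, else 0."""
    return 1 if any(s >= t for s, t in zip(ratios, thresholds)) else 0


def improved_noise_update(a, b_prev, controller, beta=1e-4):
    """One step of the modified noise estimate for a single subband."""
    if a <= b_prev:
        return a                    # follow the level downwards
    if controller == 1:
        return b_prev               # speech likely present: hold the estimate
    return (1.0 + beta) * b_prev    # stationary noise: slow increase
```

In use, `block_ratio` would be evaluated per subband on the last N_b entries of `accumulate_blocks`, and the single controller value would then be shared by all subband noise updates.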
A second problem with the original SBA is that if $L_k$ is set too high, there is a risk of fast pumping of the noise and distortion of the speech. To avoid this, while still providing significant reduction of the noise in speech pauses, a second gain factor is proposed. This gain factor, denoted the fullband gain $g_2(n)$, only provides damping, i.e. reduces the noise, in longer speech pauses and is applied to the input signal as
$$y(n) = g_2(n) \sum_{k=0}^{K-1} g_{1,k}(n)\, x_k(n)$$
The proposed fullband gain is based on a gain controller $\phi_0(n)$, which is defined as
$$\phi_0(n) = \begin{cases} 1 & \text{if } \dfrac{1}{K} \sum_{k=0}^{K-1} g_{1,k}(n) > T \\ 0 & \text{otherwise} \end{cases}$$
where $T$ is a threshold. Further, to avoid changes in $g_2(n)$ during short speech pauses, a hold function of $n_h$ samples is introduced for the gain controller $\phi(n)$, which then becomes
$$\phi(n) = \max_{q \in \{0,\dots,n_h-1\}} \phi_0(n - q)$$
The fullband gain is expressed as
$$g_2(n) = \lambda(n)\, g_2(n-1) + (1 - \lambda(n))\, L(n)$$
where $\lambda(n)$ is the forgetting factor and $L(n)$ is the target damping value. The speech pause-driven gain $g_2(n)$ is designed to quickly adapt to a certain value $L_f$ with smoothing parameter $\lambda_f$ and adapt slowly to the level $L_s < L_f$ with a smoothing parameter $\lambda_s > \lambda_f$. The shift between these regions is decided with
$$L(n) = \begin{cases} 1 & \text{if } \phi(n) = 1 \\ L_f & \text{if } \phi(n) = 0 \text{ and } g_2(n-1) > L_f (1 + \delta) \\ L_s & \text{otherwise} \end{cases}$$
and
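$$\lambda(n) = \begin{cases} \lambda_f & \text{if } \phi(n) = 1 \\ \lambda_f & \text{if } \phi(n) = 0 \text{ and } g_2(n-1) > L_f (1 + \delta) \\ \lambda_s & \text{otherwise} \end{cases}$$
where $\delta$ is a small positive constant defining the limit of transition between the regions of fast and slow damping.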
The fullband gain directly depends on the subband gains $g_{1,k}$; if sufficient gain is applied in the subbands (during speech), the gain controller $\phi(n)$ will be 1, indicating that the fullband gain should rise. On the other hand, if little subband gain is applied (when only stationary noise is present), the gain controller $\phi(n)$ will be 0, indicating that the fullband gain should fall.
The fullband gain $g_2(n)$ can be said to consist of three regions. The first region, $L(n) = 1$, is used when speech is present. The second region, $L(n) = L_f$, is used directly after a speech segment in the audio signal. In this region the gain is quickly reduced, which reduces the noise that is no longer masked by the speech. Since the adaptation to the lowest gain in this region is relatively fast, the amount of noise suppression cannot be too large, since that would give an uncomfortable-sounding alteration of the noise level. Instead, the third region, $L(n) = L_s$, is used to adapt to the lowest desired gain. This adaptation is fairly slow in order to make the transition between the noise levels less apparent.
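The three regions can be made concrete with a short Python sketch of the fullband gain recursion, including the hold function. The function and parameter names are hypothetical, the initial gain of 1 is an assumption, and the use of the fast smoothing parameter while speech is present is also an assumption, since the text only names the smoothing parameters for the two damping regions.

```python
from collections import deque


def fullband_gain(subband_gains, threshold, n_hold,
                  l_fast, l_slow, lam_fast, lam_slow, delta=0.01):
    """Compute the fullband gain for a sequence of time steps.

    subband_gains : list where each entry holds the K subband gains at time n
    threshold     : threshold T on the mean subband gain
    n_hold        : hold length n_h in samples
    l_fast/l_slow : target damping values L_f and L_s (L_s < L_f)
    lam_fast/slow : smoothing parameters for the fast and slow regions
    """
    g2 = 1.0                        # assumed initial fullband gain
    recent = deque(maxlen=n_hold)   # last n_hold raw controller values
    out = []
    for gains in subband_gains:
        # Raw controller: 1 if the mean subband gain exceeds the threshold.
        phi0 = 1 if sum(gains) / len(gains) > threshold else 0
        recent.append(phi0)
        phi = max(recent)           # hold function over n_hold samples
        if phi == 1:
            target, lam = 1.0, lam_fast      # speech: gain rises towards 1
        elif g2 > l_fast * (1.0 + delta):
            target, lam = l_fast, lam_fast   # fast damping towards L_f
        else:
            target, lam = l_slow, lam_slow   # slow damping towards L_s
        g2 = lam * g2 + (1.0 - lam) * target
        out.append(g2)
    return out
```

For example, with five speech frames followed by a long pause, the gain stays at 1 through the hold period, drops quickly towards `l_fast`, and then drifts slowly towards `l_slow`.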
IV PROPOSED IMPROVEMENTS
The output of the IAGE algorithm is an enhanced speech signal that still contains noise components. The improvement proposed in this paper is to pass this signal through a wavelet transformation.
The general wavelet denoising procedure is as follows:
1. Apply the wavelet transform to the noisy signal to produce the noisy wavelet coefficients, down to the level at which the PD occurrence can be properly distinguished.
2. Select an appropriate threshold limit at each level and a threshold method (hard or soft thresholding) to best remove the noise.
3. Apply the inverse wavelet transform to the thresholded wavelet coefficients to obtain a denoised signal.
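The three steps above can be sketched in plain Python as follows. The paper does not state which wavelet, decomposition depth or threshold values were used, so the Haar wavelet, the single fixed threshold for all levels and the function names here are assumptions chosen only to keep the example self-contained.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_forward(signal):
    """One Haar decomposition level: returns (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar decomposition level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) * _S, (a - d) * _S))
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient towards zero by t (soft thresholding)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def wavelet_denoise(signal, levels, threshold):
    """Decompose, soft-threshold the detail coefficients, and reconstruct."""
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):                  # step 1: wavelet transform
        approx, detail = haar_forward(approx)
        details.append(soft_threshold(detail, threshold))  # step 2: threshold
    for detail in reversed(details):         # step 3: inverse transform
        approx = haar_inverse(approx, detail)
    return approx
```

A hard-thresholding variant would simply zero the coefficients whose magnitude is below the threshold and leave the others unchanged.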
Performance measures:
Signal to noise ratio (SNR):
This value gives the quality of the reconstructed signal. The higher the value, the better:
$$\mathrm{SNR} = 10 \log_{10}\left( \frac{\sigma_x^2}{\sigma_e^2} \right)$$
where $\sigma_x^2$ is the mean square of the speech signal and $\sigma_e^2$ is the mean square difference between the original and reconstructed signals.
Normalized Root Mean Square Error:
$$\mathrm{NRMSE} = \sqrt{ \frac{\sum_n \left( x(n) - r(n) \right)^2}{\sum_n \left( x(n) - \bar{x} \right)^2} }$$
where $x(n)$ is the speech signal, $r(n)$ is the reconstructed signal, and $\bar{x}$ is the mean of the speech signal.
Peak Signal to Noise Ratio:
$$\mathrm{PSNR} = 10 \log_{10} \frac{N X^2}{\lVert x - r \rVert^2}$$
where $N$ is the length of the reconstructed signal, $X$ is the maximum absolute value of the signal $x$, and $\lVert x - r \rVert^2$ is the energy of the difference between the original and reconstructed signals.
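The three measures can be computed directly from the original and reconstructed sample sequences, as in the following Python sketch. The function names are hypothetical, and X is interpreted as the maximum absolute sample value, which is then squared in the PSNR formula; this reading is an assumption.

```python
import math


def snr_db(x, r):
    """SNR: mean square of x over mean square of the error x - r, in dB."""
    signal_power = sum(v * v for v in x) / len(x)
    error_power = sum((a - b) ** 2 for a, b in zip(x, r)) / len(x)
    return 10.0 * math.log10(signal_power / error_power)


def nrmse(x, r):
    """Normalized root mean square error between x and r."""
    mean_x = sum(x) / len(x)
    num = sum((a - b) ** 2 for a, b in zip(x, r))
    den = sum((a - mean_x) ** 2 for a in x)
    return math.sqrt(num / den)


def psnr_db(x, r):
    """PSNR: N times the squared peak of x over the error energy, in dB."""
    n = len(r)
    peak = max(abs(v) for v in x)   # assumed meaning of X
    error_energy = sum((a - b) ** 2 for a, b in zip(x, r))
    return 10.0 * math.log10(n * peak * peak / error_energy)
```

All three functions assume that `x` and `r` have the same length and that the reconstruction is not identical to the original, since a zero error would make the logarithms undefined.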
Table-1: Values before wavelets
Signal        SNR        PSNR        NRMSE
Ece_female    25.9661    5.7213      0.0504
Ece_Male      28.3492    12.9744     0.0383
Eng_female    18.2084    -19.4792    0.1229
Eng_Male      1.7017     -30.8530    0.8221
Table-2: Values after wavelets
Signal        SNR        PSNR        NRMSE
Ece_female    29.0212    24.1357     0.0354
Ece_Male      19.3464    11.0572     0.1078
Eng_female    12.9308    -27.3801    0.2257
Eng_Male      3.0662     -28.8702    0.7026
Output Waveforms:
Fig (a) Original Signal
Fig (b) Noisy Signal
Fig (c) Enhanced with IAGE
Fig (d) Enhanced using proposed
V CONCLUSION
The noise reduction algorithm presented in this paper is an improvement to the Improved Adaptive Gain Equalizer. It incorporates denoising of the speech signal by introducing the wavelet transformation technique, which further reduces the noise in the signal, while the speech is improved by the gain factor.