
132 (IJCNS) International Journal of Computer and Network Security,
Vol. 2, No. 5, May 2010
Analysis of Subband Speech Enhancement Technique for Digital Hearing Aids

D. Deepa¹, Dr. A. Shanmugam²
Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India
¹deepa_dhanaskodi@yahoo.co.in, ²dras@yahoo.co.in

Abstract: Hearing impairment is the number one chronic disability affecting people in the world. Many people have great difficulty in understanding speech with background noise, and speech enhancement plays a vital role in such situations. Random noise disturbs the signal between the speaker and the listener. The background noise therefore has to be removed from the noisy speech signal to increase signal intelligibility and to reduce listener fatigue. The proposed approach is a speech enhancement method based on preprocessed sub band spectral subtraction, where the preprocessing is done using a partial differential equation (PDE). This method provides a greater degree of flexibility and control over the noise subtraction levels, which reduces artifacts in the enhanced speech and results in improved speech quality and intelligibility.

Key words- Sub band Spectral Subtraction, Partial differential equation, Adaptive noise estimation, Signal to Noise ratio, IS distance.

1. Introduction

Understanding speech with background noise is difficult even for people with normal hearing; for people with a hearing disability it is one of the major problems to be reduced. This is especially true for a large number of elderly people and for persons with sensorineural impairment. Several investigations on speech intelligibility have demonstrated that subjects with sensorineural loss may need a 5-15 dB higher signal-to-noise ratio than normal hearing subjects. Recent statistics of hearing impaired patients applying for a hearing aid reveal that 20% of the cases are due to conductive losses, more than 50% are due to sensorineural losses, and the remaining 30% of the cases are of mixed origin. Presenting speech to the hearing impaired in an intelligible form remains a major challenge in hearing-aid research today. In a single channel system, speech enhancement is challenging because no reference noise signal is available, and the clean speech cannot be processed before it is affected by the noise. This is one of the most difficult situations in speech enhancement. The conventional power spectral subtraction method for single channel speech enhancement substantially reduces the noise level in the noisy speech, but it introduces musical noise.

The proposed method is a preprocessed sub band spectral subtraction method of speech enhancement in which the preprocessing uses a partial differential equation. The input noisy speech signal is first enhanced using the PDE, by taking the adjacent samples and calculating the gradients and influencing coefficients, and the output of this process is then applied as the input to the sub band spectral subtraction method. The noise estimate in spectral subtraction is updated by averaging the noisy speech spectrum using a time and frequency dependent smoothing factor, which is adjusted based on the signal presence probability in the sub bands. Signal presence is determined by computing the ratio of the noisy speech power spectrum to its local minimum, which is computed by averaging past values of the noisy speech power spectra with a look-ahead factor. This local minimum estimation algorithm adapts very quickly to highly non-stationary noise environments. With this noise estimation algorithm the method outperforms the standard power spectral subtraction method, resulting in superior speech quality and largely reduced musical noise in a single channel system for both stationary and non-stationary noise environments.

2. Partial Differential Equation Technique

An input noisy speech signal affected by additive background noise can be enhanced by this method. The first step in speech enhancement using the PDE is to obtain the gradients (g) of each sample in the noisy speech signal using the samples before and after the current sample:

$$g_f = S(x - \Delta x, t) - S(x, t), \qquad g_b = S(x + \Delta x, t) - S(x, t) \tag{2.1}$$

where $S(x,t)$ is the noisy speech signal and $\Delta x$ is the sampling interval (the spacing between adjacent samples). After the gradients are calculated, the influencing coefficients in each direction of the current sample are computed:

$$IC_f = \frac{1}{1 + \left(\dfrac{g_f}{k}\right)^2}, \qquad IC_b = \frac{1}{1 + \left(\dfrac{g_b}{k}\right)^2} \tag{2.2}$$

where $IC_f$ is the forward influencing coefficient and $IC_b$ is the backward influencing coefficient. In Eq. (2.2), $k$ is a constant between 1 and 100. From the influencing coefficients and gradients calculated above, the speech signal is enhanced using

$$S(x, t + \Delta t) = S(x, t) + \Delta t \,\left( g_f \, IC_f + g_b \, IC_b \right) \tag{2.3}$$

In the above equations $S(x,t)$ is the input noisy speech signal and $\Delta t$ is a coefficient between 0.1 and 0.6 representing the step of noise reduction in each iteration. The output of this stage is then processed again by the sub band spectral subtraction algorithm to reduce the non-stationary noise.
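The PDE preprocessing of Eqs. (2.1)-(2.3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `pde_denoise`, the number of iterations, the default values of `k` and `dt`, and the handling of the first and last samples (edge replication) are all assumptions made for illustration, since the paper only gives ranges for k and Δt.

```python
import numpy as np

def pde_denoise(signal, k=30.0, dt=0.25, iterations=10):
    """Iterate Eqs. (2.1)-(2.3) on a 1-D signal (illustrative sketch)."""
    s = np.asarray(signal, dtype=float).copy()
    for _ in range(iterations):
        # Assumption: replicate the end samples so that the missing
        # neighbour at each boundary contributes a zero gradient.
        padded = np.pad(s, 1, mode="edge")
        g_f = padded[:-2] - s                  # Eq. (2.1): S(x - dx, t) - S(x, t)
        g_b = padded[2:] - s                   # Eq. (2.1): S(x + dx, t) - S(x, t)
        ic_f = 1.0 / (1.0 + (g_f / k) ** 2)    # Eq. (2.2), forward coefficient
        ic_b = 1.0 / (1.0 + (g_b / k) ** 2)    # Eq. (2.2), backward coefficient
        s = s + dt * (g_f * ic_f + g_b * ic_b)  # Eq. (2.3), one update step
    return s
```

Because each influencing coefficient is at most 1, a step size of dt ≤ 0.5 makes every update a weighted average of a sample and its two neighbours, so the output never leaves the range of the input. The choice of `k` has to be made relative to the amplitude scale of the samples: gradients much smaller than `k` are smoothed almost linearly, while gradients much larger than `k` are largely preserved.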
3. Sub band Spectral Subtraction method

The sub band spectral subtraction method is a frequency dependent form of the spectral subtraction procedure. This method offers better quality of the enhanced speech with reduced residual noise [9]. The approach is justified by the variation in signal to noise ratio across the speech spectrum. White Gaussian noise has a flat spectrum, whereas real world noise is not flat. The noise does not affect the speech signal uniformly over the whole spectrum; some frequencies are affected more adversely than others.

To take into account the fact that colored noise affects the speech spectrum differently at various frequencies, we use this approach to spectral subtraction. The speech spectrum is divided into non-overlapping bands, and spectral subtraction is performed independently in each band. The estimate of the clean speech spectrum in the ith band is obtained by:

$$|\hat{S}_i(k)|^2 = |Y(k)|^2 - \alpha_i \, \delta_i \, |\hat{D}_i(k)|^2 ; \qquad b_i \le k \le e_i \tag{3.1}$$

where $b_i$ and $e_i$ are the beginning and ending frequency bins of the ith frequency band, $\alpha_i$ is the over subtraction factor and $\delta_i$ is the band subtraction factor. The over subtraction factor provides a degree of control over the noise subtraction level in each band; the use of multiple frequency bands and of the $\delta_i$ weights provides an additional degree of control within each band.

4. Noise estimation

Noise estimation plays an important role in this work on speech enhancement. With an efficient noise estimation algorithm, the resulting signal estimate will have great accuracy.

4.1. Adaptive noise-estimation algorithm

Several noise-estimation algorithms have been proposed for speech enhancement applications [2] [3] [4] [5] [6]. The main drawback of most of these algorithms is that they are either slow in tracking sudden increases of noise power or they overestimate the noise energy, resulting in speech distortion.

In the adaptive method the smoothed power spectrum of noisy speech is computed using the following first-order recursive equation:

$$P(\alpha, k) = g \, P(\alpha - 1, k) + (1 - g) \, |Y(\alpha, k)|^2 \tag{4.1}$$

where $P(\alpha,k)$ is the smoothed power spectrum, $\alpha$ is the frame index, $k$ is the frequency index, $|Y(\alpha,k)|^2$ is the short-time power spectrum of noisy speech and $g$ is a smoothing constant. The proposed algorithm is summarized in the following steps.

4.1.1 Speech present and speech absent frames

In any spoken sentence there are pauses between words which do not contain any speech; those frames contain only background noise. The noise estimate can be updated by tracking those noise-only frames. To identify them, a simple procedure is used which calculates the ratio of the noisy speech power spectrum to the noise power spectrum in three different frequency bands (defined with respect to the sampling frequency) in each frame. If all three ratios are smaller than the threshold, the frame is classified as a noise-only frame; otherwise, if any one or all of the ratios are greater than the threshold, the frame is considered a speech-present frame.

The noise estimate is updated in speech-absent frames with a constant smoothing factor. In speech-present frames the noise is updated by tracking the local minimum of the noisy speech and then deciding speech presence in each frequency bin separately, using the ratio of the noisy speech power to its local minimum.

4.1.2 The minimum of noisy speech

Our method tracks the minimum of the noisy speech by continuously averaging past spectral values. If the value of the noisy speech spectrum in the present frequency bin is greater than the minimum value previously stored for that bin, the minimum value is updated; otherwise the previous value is maintained as it is.

4.1.3. Detection of Speech-presence frames

The approach taken to determine speech presence in each frequency bin is similar to the method used in [4]. Let the ratio of the noisy speech power spectrum and its local minimum be defined as

$$S_r(\alpha, k) = \frac{P(\alpha, k)}{P_{min}(\alpha, k)} \tag{4.2}$$

This ratio is compared with a frequency dependent threshold. If the ratio is greater than the threshold, the bin is taken as a speech-present frequency bin; otherwise it is taken as a speech-absent frequency bin. This is based on the principle that the power spectrum of noisy speech will be nearly equal to its local minimum when speech is absent. Hence the smaller the ratio in (4.2), the higher the probability that the bin belongs to a noise-only region, and vice versa. Note that in [4] a fixed threshold was used in place of the frequency dependent threshold.

From the above rule, the speech-presence probability $p(\alpha,k)$ is updated using the following first-order recursion:

$$p(\alpha, k) = a \, p(\alpha - 1, k) + (1 - a) \, I(\alpha, k) \tag{4.3}$$

where $a$ is a smoothing constant and $I(\alpha,k)$ is the outcome of the above rule (1 for a speech-present bin, 0 for a speech-absent bin). Note that the above recursion implicitly exploits the correlation of speech presence in adjacent frames. This may result in a slight overestimate of the noise spectrum but is not likely to have much effect on the enhanced speech.

4.1.4. Calculation of Smoothing Constants

Using the above speech-presence probability estimate, we compute the time-frequency dependent smoothing factor as follows [4]:

$$a(\alpha, k) = d + (1 - d) \, p(\alpha, k) \tag{4.4}$$

where $d$ is a constant. Note that $a(\alpha,k)$ takes values in the range $d \le a(\alpha,k) \le 1$.

4.1.5 Updation of noise spectrum estimate

Finally, after computing the time-frequency dependent smoothing factor $a(\alpha,k)$, the noise spectrum estimate is updated as

$$D(\alpha, k) = a(\alpha, k) \, D(\alpha - 1, k) + \left(1 - a(\alpha, k)\right) |Y(\alpha, k)|^2 \tag{4.5}$$

where $D(\alpha,k)$ is the estimate of the noise power spectrum.
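The recursions of Eqs. (4.1)-(4.5) can be put together in a short Python sketch. This is an illustration under stated assumptions rather than the authors' code: the name `estimate_noise` and every default constant are hypothetical, the first frame is assumed to be noise only, the frame-level pre-classification of Section 4.1.1 is omitted, and the local-minimum tracker is an assumed stand-in (slow rise, immediate drop), because the text gives no explicit update formula for Section 4.1.2.

```python
import numpy as np

def estimate_noise(power, g=0.7, gamma=0.998, threshold=5.0, a_p=0.2, d=0.85):
    """Track the noise power spectrum D(alpha, k) from |Y(alpha, k)|^2.

    power: array of shape (frames, bins). All default constants are
    illustrative assumptions, not values taken from the paper.
    threshold may be a scalar or a per-bin array (frequency dependent).
    """
    power = np.asarray(power, dtype=float)
    P = power[0].copy()       # smoothed spectrum P(alpha, k)
    P_min = power[0].copy()   # local minimum P_min(alpha, k)
    p = np.zeros_like(P)      # speech-presence probability p(alpha, k)
    D = power[0].copy()       # assumption: frame 0 contains noise only
    noise = np.empty_like(power)
    noise[0] = D
    for n in range(1, power.shape[0]):
        P = g * P + (1.0 - g) * power[n]              # Eq. (4.1)
        # Assumed minimum tracker: rise slowly while P exceeds the stored
        # minimum, otherwise drop to the current value of P.
        P_min = np.where(P > P_min, gamma * P_min + (1.0 - gamma) * P, P)
        ratio = P / np.maximum(P_min, 1e-12)          # Eq. (4.2)
        I = (ratio > threshold).astype(float)         # 1 = speech-present bin
        p = a_p * p + (1.0 - a_p) * I                 # Eq. (4.3)
        a = d + (1.0 - d) * p                         # Eq. (4.4)
        D = a * D + (1.0 - a) * power[n]              # Eq. (4.5)
        noise[n] = D
    return noise
```

When a bin is judged speech-present, p approaches 1, the smoothing factor a approaches 1, and the noise estimate for that bin is effectively frozen; in speech-absent bins a falls back to d and the estimate follows the noisy spectrum.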
For each frequency band this estimated noise is subtracted from the input noisy speech signal. Finally, the outputs of the bands are combined using the overlap-add (OLA) method to obtain an estimate of the clean speech (the enhanced speech).

5. Objective measures for performance evaluation

Objective measures are based on a mathematical comparison of the original and processed speech signals. It is desirable that the objective measures be consistent with human perception of speech [7]. However, it has been seen that the results obtained by objective measures are not highly correlated with those obtained by subjective measures. The signal-to-noise ratio (SNR) and the Itakura-Saito (IS) measure are two of the most widely used objective measures.

5.1 Signal-to-noise ratio (SNR)

The SNR is a popular measure of speech quality. As the name suggests, it is calculated as the ratio of the signal power to the noise power, in decibels. If the summation is performed over the whole signal length, the result is called the global SNR.

$$SNR_{dB} = 10 \log_{10} \left( \frac{\sum_n S^2(n)}{\sum_n \left[ S(n) - \hat{S}(n) \right]^2} \right) \tag{5.1}$$

5.2 Itakura - Saito Distance

The Itakura-Saito measure is a distance measure. The lower the IS distance, the better the quality of the speech (i.e. minimum phase difference between the clean and enhanced signals). The average Itakura-Saito measure across all speech frames of the given sentence is computed to evaluate the spectral noise subtraction algorithm.

$$d(a, b) = \frac{(a - b)^T R \,(a - b)}{a^T R \, a} \tag{5.2}$$

where $a$ is the vector of prediction coefficients of the clean speech signal, $R$ is the autocorrelation matrix of the clean speech signal and $b$ is the vector of prediction coefficients of the enhanced signal.

6. Experimental Results

Test samples are taken from the SpEAR (Speech Enhancement Assessment Resource) database of CSLU (Center for Spoken Language Understanding).

Time domain Results:

Sample 1
Figure 1 a. Car Clean Signal
Figure 1 b. Noisy Car Signal
Figure 1 c. Enhanced Car Signal

Sample 2:
Figure 2 a. Clean speech
Figure 2 b. White stationary Noisy speech
Figure 2 c. Enhanced speech

Frequency Domain Analysis:

The power spectral densities of the reference clean signal, the noisy signal and the enhanced signal were obtained and compared.

[Plot "PSD Comparison"; axes: Time (x), Amplitude (y); legend: Noisy Signal, Enhanced Signal, data3]
Figure 3. PSD plot of Cellular noisy, clean and Enhanced signal

Inference for Fig 3: The power spectral density of the enhanced signal is close to the power spectrum magnitude of the clean signal.

[Plot "PSD Comparison of White Stationary Noisy signal"; axes: Frequency (x), PSD (y); legend: Noisy Signal, Clean Signal, Enhanced Signal]
Figure 4. PSD comparisons for White Stationary Noisy signal

Inference: Figure 4 shows the power spectral density comparison for the white stationary noisy signal. The output shows that the stationary noise in the higher frequency range (i.e. 40 to 130) is minimized.
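The two objective measures of Section 5, on which the SNR and IS results reported in this section are based, can be sketched in Python as follows. This is a hedged illustration and not the evaluation code used for the paper: the function names, the LPC order of 10, the autocorrelation method used to obtain the prediction coefficients, and the sign convention of the coefficient vector (leading 1 followed by negated predictor coefficients) are assumptions. `is_distance` evaluates Eq. (5.2) for a single frame; averaging over the speech frames of a sentence is left to the caller.

```python
import numpy as np

def global_snr_db(clean, enhanced):
    """Global SNR of Eq. (5.1); assumes enhanced differs from clean."""
    clean = np.asarray(clean, dtype=float)
    err = clean - np.asarray(enhanced, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))

def autocorr(frame, order):
    """Autocorrelation values r[0..order] of one frame."""
    frame = np.asarray(frame, dtype=float)
    return np.array([np.dot(frame[:frame.size - lag], frame[lag:])
                     for lag in range(order + 1)])

def toeplitz(r):
    """Symmetric Toeplitz matrix built from the sequence r."""
    idx = np.abs(np.arange(r.size)[:, None] - np.arange(r.size)[None, :])
    return r[idx]

def lpc_vector(frame, order):
    """Prediction coefficient vector [1, -c1, ..., -cp] (autocorrelation method)."""
    r = autocorr(frame, order)
    coeffs = np.linalg.solve(toeplitz(r[:-1]), r[1:])
    return np.concatenate(([1.0], -coeffs))

def is_distance(clean_frame, enhanced_frame, order=10):
    """Distance of Eq. (5.2) between one clean and one enhanced frame."""
    a = lpc_vector(clean_frame, order)       # clean-speech coefficients
    b = lpc_vector(enhanced_frame, order)    # enhanced-speech coefficients
    R = toeplitz(autocorr(clean_frame, order))  # clean autocorrelation matrix
    diff = a - b
    return float(diff @ R @ diff) / float(a @ R @ a)
```

The distance is zero when the enhanced frame has the same prediction coefficients as the clean frame and grows as the two spectral envelopes diverge.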
[Plot "PSD Comparison"; axes: Frequency (x), PSD (y); legend: Noisy Signal, Clean Signal, Enhanced Signal]
Figure 5. PSD plots of Colored factory Noisy, clean and Enhanced signal

Inference for Fig 5: The initial noise segments are very much reduced by this method for the colored factory noisy signal, and the spectrum is close to that of the clean signal.

[Bar chart "SNR obtained for different number of bands"; legend: Single band, 2 bands, 4 bands, 8 bands]
Figure 6. Improvement in SNR while increasing number of bands

[Bar chart "SNR comparison with other methods"; axes: Type of Noise (x), SNR (y); legend: Single band, 4 bands, SS method, DEKF Algorithm]
Figure 7. SNR comparison with other methods

Table 1: Comparison of IS distance measure for different noisy inputs with conventional approach

Signal               IS distance for      IS distance for single
                     proposed method      band SS method
Cellular Noise       1.53                 3.04
White burst noise    1.256                2.1722
White stat noise     0.9978               2.067
Pink stat noise      0.791                1.0246
Car noise            0.1569               1.989
Recorded speech      1.54                 0
Recorded speech      1.2564               0

The SNR improvement obtained as the number of bands is increased is shown in Fig 6 for various noisy signals, the comparison of SNR with other methods is given in Fig 7, and the IS distance of the proposed method is compared with that of the conventional SS method in Table 1.

7. Conclusion

The frequency dependent preprocessed sub band spectral subtraction method provides a definite improvement over the conventional power spectral subtraction method and does not suffer from musical noise. The improvement can be attributed to the fact that the sub-band approach takes into account the non-uniform effect of non-stationary noise on the spectrum of speech. The signal to noise ratio obtained with this method is more than 6 to 10 dB, and the method can be used in digital hearing aids for patients with sensorineural loss. The added computational complexity of the algorithm is minimal and it adapts to non-stationary noise environments. The algorithm can be implemented in real time on a fixed point Digital Signal Processor (DSP) platform for evaluation in real-world conditions. Methods can be developed to preserve the transitional regions and unvoiced regions, which contain low speech levels.

References

[1] S. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on Acoustics, Speech, Signal Processing, vol. 27, pp. 113-120, Apr. 1979.
[2] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-term spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-32, No. 6, pp. 1109-1121, Dec. 1984.
[3] Cohen, I., 2003. Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging. IEEE Transactions on Speech Audio Processing. 11 (5), 466–475.
[4] Martin, R., 2001. Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Transactions on Speech Audio Processing (5), 504–512. (2), 1085–1099.
[5] Rangachari, S., Loizou, P., Hu, Y., 2004. A noise estimation algorithm with rapid adaptation for highly nonstationary environments. Proceedings IEEE International Conference on Acoustics, Speech Signal Processing. I, 305–308.
[6] Sohn, J., Kim, N., 1999. Statistical model-based voice activity detection. IEEE Signal Processing. Lett. 6 (1), 1–3.
[7] S. Quackenbush, T. Barnwell, and M. Clements, “Objective Measures for Speech Quality Testing,” Prentice-Hall, 1988.
[8] J. Li and M. Akagi, “A noise Reduction system based on hybrid Noise estimation technique and post Filtering in arbitrary noise environments,” Speech Communication, vol. 48, pp. 111-126, 2006.
[9] D. Deepa and A. Shanmugam, “Time And Frequency Domain Analysis Of Subband Spectral Subtraction Method Of Speech Enhancement Using Adaptive Noise Estimation Algorithm,” International Journal of Engg. Research & Indu. Appls. (IJERIA), ISSN 0974-1518, Vol. 2, No. VII (2009), pp. 57-72.
[10] D. Deepa and A. Shanmugam, “Spectral Subtraction Method Of Speech Enhancement Using Adaptive Noise Estimation Algorithm with PDE method as a Pre processing technique,” ICTACT Journal on Communication Technology (IJCT), ISSN 0976-0091, Vol. 1, Issue 1, March 2010, pp. 1-6.
