Professional Documents
Culture Documents
Speech Probability Distribution
Speech Probability Distribution
7, JULY 2003
Fig. 3. test result over a time interval (200 ms) for four pdf hypotheses. jj
Fig. 4. Moment test ( = E [ x ]= E [x ]) value for a speech signal and the
nominal values for different hypotheses over time intervals of 200 ms.
(a)
(b)
Fig. 6. PDF of three dominant speech signal components in the adaptive KLT
domain proposed in [9]. (Solid line) Experimental. (Dashed line) LD. (c)
Fig. 7. Short-time (200 ms) test results for KLT coefficients.
only the first coefficient (i.e., the discrete cosine (DC) compo- where and denotes the
nent) is better approximated by the GD hypothesis. We must Gamma function, and and are positive real parameters.
note that this very small low-frequency component is not rep- The shape parameter describes the exponential rate of
resentative of a speech signal, because the DC component is decay; is the square root of the variance. If , the
usually blocked in signal acquisition and represents just noise above is an LD, and if , it is a GD. The maximum-like-
measurement. We conclude from Fig. 8(b) and (c) (as well as lihood estimation of the GGD parameters can be found in
from all other components) that the DCT coefficients of speech [8]. The shape parameter of a GGD can be estimated by
are better approximated by LD. This is to say that speech in the where , , , and
DCT domain also exhibits an LD, with the result that a multi- [7]. The calculation of
variate LD for speech is more accurate than a Gaussian one. is complex and time consuming in real-time applications such
as speech processing. As we can easily show that is a
monotonic function increasing for for the interval of interest,
V. DISCUSSION AND CONCLUSION
we suggest using instead of as a moment value to perform
In previously published results, the distribution of the speech the test.
signal is claimed to have a -D over long time periods [1]–[3], This test evaluates a sample set and finds its best pdf in the
while in some other papers a GD is used. The speech is a spher- class of GGD functions. Under the hypothesis that the shape
ically invariant random process for time intervals less than 2 ms parameter is close to one, the pdf is Laplacian, and if is
[4]. We first demonstrate that a speech signal during voice ac- close to two, the pdf is Gaussian. Thus, using instead of
tivity intervals may be characterized by LD in the time domain. as a decision parameter, the value of is to be compared with
Then two different tests are used to compare widely known and , to verify whether the sample
speech distributions over short time periods. Both tests favor LD set should be considered an LD or GD. In our simulations, the
during voice activity intervals. Tests performed in the KLT and case is also considered where . The
DCT domains lead us to conclude that all major and meaningful moment value for the Gamma distribution is .
components of voiced speech show LD.
In decorrelated transformed domains such as the KLT and
ACKNOWLEDGMENT
DCT, speech components are uncorrelated. Our experimental
results suggest a multivariate Laplacian pdf for speech during Authors would like to thank Associate Editor, the anonymous
its activity intervals. This pdf is simple to use and fits better than reviewers, and H. Warder for their comments and suggestions.
other widely used pdfs in many speech processing applications.
In contrast, similar tests performed on noise gathered from
different acoustic environments clearly illustrate that most of REFERENCES
them are better described by a GD. During silent intervals, both
criteria (i.e., and ) favor the GD hypothesis. [1] W. B. Davenport, “An experimental study of speech wave probability
distributions,” J. Acoust. Soc. Amer., vol. 24, no. 4, pp. 390–399, July
We also noted that these tests, for those frames containing 1952.
both speech and silence (on the edges), favor -D similar to the [2] D. L. Richards, “Statistical properties of speech signals,” Proc. Inst.
expected results from a random mixture selection of a GD and Elect. Eng., vol. 111, no. 5, pp. 941–949, 1964.
an LD. This shows that the difference between our results and [3] M. D. Paez and T. H. Glisson, “Minimum-mean squared-error quantiza-
tion in speech PCM and DPCM,” IEEE Trans. Commun., vol. COM-20,
previously published results [1]–[6] is caused by the inclusion in pp. 225–230, Apr. 1972.
previous studies of silent samples (i.e., low-power noise, which [4] H. Brehm and W. Stammler, “Description and generation of spherically
is well described by a GD) mixed with active speech (which is invariant speech-model signal,” Signal Process., vol. 12, no. 2, pp.
119–141, Mar. 1987.
described by an LD) (see Fig. 5).
[5] J. P. LeBlanc and P. L. De Leòn, “Speech separation by kurtosis maxi-
APPENDIX A mization,” in Proc. ICASSP, vol. 2, 1998, pp. 1029–1032.
TEST [6] G.-J. Jang, T.-W. Lee, and W.-H. Oh, “Learning statistically efficient
features for speaker recognition,” in Proc. ICASSP, 2001.
The test compares experimental data with some given pdfs [7] R. L. Joshi and T. R. Fischer, “Comparison of generalized Gaussian and
and measures the distortion between the pdfs and data by Laplacian modeling in DCT image coding,” IEEE Signal Processing
where the sample space is partitioned Lett., vol. 2, pp. 81–82, May 1995.
into intervals; is the number of samples that fall into the th [8] F. Müller, “Distribution shape of two-dimensional DCT coefficients of
natural images,” Electron. Lett., vol. 29, no. 22, pp. 1935–1936, Oct.
interval; is the theoretical probability that a sample will fall 1993.
into the same th interval; and is the total number of samples. [9] A. Rezayee and S. Gazor, “An adaptive KLT approach for speech en-
The smaller the value, the better the fit. hancement,” IEEE Trans. Speech Audio Processing, vol. 9, pp. 87–95,
Feb. 2001.
[10] J. Sohn, N. S. Kim, and W. Sung, “A statistical model-based voice ac-
APPENDIX B tivity detection,” IEEE Signal Processing Lett., vol. 6, pp. 1–3, Jan.
MOMENT TEST FOR GGD AND -D 1999.
[11] J. Huang and Y. Zhao, “A DCT-based fast signal subspace technique for
Assume that the pdf of the speech signal is modeled with robust speech recognition,” IEEE Trans. Speech Audio Processing, vol.
the zero-mean GGD function 8, pp. 747–751, Nov. 2000.
[12] J. Bell and T. J. Sejnowski, “An information-maximization approach to
blind separation and blind deconvolution,” Neural Comput., vol. 7, pp.
1129–1159, 1995.
Authorized licensed use limited to: the Leddy Library at the University of Windsor. Downloaded on December 08,2023 at 16:37:02 UTC from IEEE Xplore. Restrictions apply.