SPEECH SIGNAL ANALYSIS WITH REALLOCATED SPECTROGRAM

F. Plante(*,1), G. Meyer(+), W. A. Ainsworth(*,+)
(*) Communication and Neuroscience Dept. (+) Computer Science Dept.
Keele University, Keele, Staffs. ST5 5BG, UK
Tel/Fax: 44 782 583055 Mail: coc05@keele.ac.uk
(1) Member of GDR 134 of CNRS "Signal and Image Processing", France.

ABSTRACT
The limited joint time and frequency resolution of Fourier analysis makes an accurate analysis of speech signals difficult. Fourier analysis offers either good temporal accuracy or good frequency resolution, never both. Many methods have been proposed to overcome this limitation, but the results do not show a great improvement in the readability of the representation, owing to the presence of many components in the speech signal. The reallocation defined by Kodera et al. seems a good way to improve the localisation of the spectrogram. Recent work has simplified the implementation of this method, which makes it attractive. This paper explores the applicability of this method to the analysis of speech signals.

1. INTRODUCTION

The spectrogram is the most popular method for analysing the variation of the frequency components of speech signals over time. Narrow band representations, calculated over long time windows, can be used to extract frequency information accurately, while wide band analyses, using short data windows, give good time resolution. Conventionally one or the other of these approaches is used, depending on the problem. A more universal approach is to construct new representations which overcome the limitations of the Fourier transform [1]. The first solution, and the most studied, consists of decomposing the energy of the signal using kernel functions [2]. Cohen gives a generalised formulation of this approach. The Wigner-Ville transformation has a central place in this class [3]. All other representations (e.g. spectrogram, Choi-Williams) can be derived from it. The Wigner-Ville transformation gives the best joint resolution, but interference between the components of the signal decreases the readability of its representation. This degradation is dramatic in the case of speech signals, which have many spectral components. Smoothing can be used to remove these cross-components. Nevertheless, the results obtained do not increase the resolution in comparison with the spectrogram, and can be worse [4]. A second approach, proposed by Kodera et al. [5], is to compute the short time Fourier transform and modify the point of allocation of the energy in order to emphasize the distribution of energy in the time-frequency window. This technique seemed promising but had little success, in part because its implementation was difficult. Recent work proposes a more efficient algorithm that makes this technique more attractive [6]. We investigate this method and its applicability to the representation of speech signals.

2. METHODS

In order to give a better localisation of spectrogram energy in the time-frequency plane, Kodera et al. [5] propose moving the point of allocation of energy to another point that corresponds better to the energy distribution. In the spectrogram, the energy is allocated to the centre of the time-frequency window, whilst Kodera et al. propose taking the centre of gravity of the energy contained in the window as the point of allocation. The choice of the centre of gravity allows the phase of the Fourier transform, which is not considered in conventional spectrograms, to be taken into account. The new point of allocation (t', ν') is computed from the derivatives of the phase (eq. 1 and 2):

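$$t' = t - \frac{1}{2\pi}\,\frac{\partial \Phi(t,\nu)}{\partial \nu} \qquad (1)$$

$$\nu' = \nu + \frac{1}{2\pi}\,\frac{\partial \Phi(t,\nu)}{\partial t} \qquad (2)$$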

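with Φ the phase of the short time Fourier transform. Using equations 1 and 2 directly is neither easy nor reliable, because the derivatives of the phase have to be approximated. This is the main reason why the method has seen little use in spite of its theoretical advantages. Recently, Auger and Flandrin [6] showed that equations 1 and 2 can be transformed into new formulae (eq. 3 and 4). These formulae require the computation of two additional Fourier transforms using the windows t h(t) and dh(t)/dt:

$$t' = t - \mathrm{Re}\left\{\frac{\mathrm{STF}_{th}(t,\nu)\,\mathrm{STF}_{h}^{*}(t,\nu)}{\left|\mathrm{STF}_{h}(t,\nu)\right|^{2}}\right\} \qquad (3)$$

$$\nu' = \nu + \frac{1}{2\pi}\,\mathrm{Im}\left\{\frac{\mathrm{STF}_{dh}(t,\nu)\,\mathrm{STF}_{h}^{*}(t,\nu)}{\left|\mathrm{STF}_{h}(t,\nu)\right|^{2}}\right\} \qquad (4)$$

with STF_h the Short Time Fourier Transform computed with window h; th denotes the window multiplied by t and dh its derivative with respect to time.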
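The three-transform scheme described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `reassigned_spectrogram` and the parameters `sigma`, `hop` and `nfft` are hypothetical, a Gaussian window is assumed, and the accumulation step simply rounds each reallocated point to the nearest cell of the original grid. The signs of the two corrections depend on how the short time Fourier transform is defined; here the window is applied as h(u - t), which is stated again in the comments.

```python
import numpy as np

def reassigned_spectrogram(x, sigma=16.0, hop=8, nfft=512):
    """Sketch of a reallocated spectrogram computed from three STFTs.

    Returns (S, R, t_hat, f_hat): the spectrogram, its reallocated version
    on the same grid, and the reallocated coordinates of every point
    (time in samples, frequency in cycles/sample).
    Assumption: nfft is at least the window length 2*int(4*sigma) + 1.
    """
    half = int(4 * sigma)
    tau = np.arange(-half, half + 1, dtype=float)  # window time axis, centre at 0
    h = np.exp(-0.5 * (tau / sigma) ** 2)          # Gaussian window h(t)
    th = tau * h                                   # window multiplied by t
    dh = -tau / sigma ** 2 * h                     # analytic derivative dh/dt

    # Zero-pad so that frame k is centred on sample k * hop.
    xp = np.pad(np.asarray(x, dtype=float), half)
    starts = np.arange(0, len(x), hop)
    frames = np.stack([xp[s:s + len(h)] for s in starts])

    # Same frames, three different windows.
    X_h = np.fft.rfft(frames * h, nfft, axis=1)
    X_th = np.fft.rfft(frames * th, nfft, axis=1)
    X_dh = np.fft.rfft(frames * dh, nfft, axis=1)

    S = np.abs(X_h) ** 2
    ok = S > 1e-12 * S.max()                       # skip points with no energy
    ratio_t = np.zeros_like(X_h)
    ratio_f = np.zeros_like(X_h)
    np.divide(X_th, X_h, out=ratio_t, where=ok)
    np.divide(X_dh, X_h, out=ratio_f, where=ok)

    t = starts[:, None].astype(float)
    f = np.fft.rfftfreq(nfft)[None, :]
    # Sign convention for a window applied as h(u - t); with h(t - u) both
    # corrections change sign.
    t_hat = t + ratio_t.real
    f_hat = f - ratio_f.imag / (2 * np.pi)

    # Add the energy of all points reallocated to the same grid cell.
    R = np.zeros_like(S)
    ti = np.rint(t_hat / hop).astype(int)
    fi = np.rint(f_hat * nfft).astype(int)
    keep = ok & (ti >= 0) & (ti < S.shape[0]) & (fi >= 0) & (fi < S.shape[1])
    np.add.at(R, (ti[keep], fi[keep]), S[keep])
    return S, R, t_hat, f_hat

# Example: a linear chirp sweeping from 0.05 to 0.25 cycles/sample.
n = np.arange(4096)
chirp = np.cos(2 * np.pi * (0.05 * n + 0.5 * (0.2 / 4096) * n ** 2))
S, R, t_hat, f_hat = reassigned_spectrogram(chirp)
```

Because the corrections are ratios of transforms of the same frame, no phase derivative has to be approximated. Points whose reallocated coordinates fall outside the grid are discarded in this sketch, so `R` can hold slightly less energy than `S`.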

0-7803-2127-8/94 $4.00 © 1994 IEEE


This method improves the localisation of the energy in the time-frequency plane, but does not improve the resolution (in terms of the ability to separate close components). Unresolved components share the same centre of gravity, so the reallocated transformation cannot represent them separately. The reallocated spectrogram is obtained by adding the values of all points which are reallocated to the same time-frequency point. The reallocation can be considered as a post-processing of the spectrogram representation. It can also be extended to all other bilinear time-frequency representations. Figure 1 shows the theoretical improvement that can be obtained with the reallocation.

Figure 1. Principle of reallocation in the time-frequency plane (t, f). The ellipse represents the energy distribution of the signal; the grey level gives the level of energy (black is the maximum). Rectangles show the resolution and squares the points of localisation of energy (centre of window) of the spectrogram. Circles represent the new points of reallocation. The displacement is represented by the arrow.

3. EXAMPLES

3.1. Linear Chirp

The reallocated spectrogram has some interesting properties: in particular, the point of allocation coincides with the instantaneous frequency or the group delay if the window tends to a unit impulse or a constant [6]. This is shown in figure 2. The spectrogram (right column) and its reallocated version (left column) of a linear chirp are displayed for three window lengths. A Gaussian window is used. The window length is symbolised by the length of the line. Black represents the highest energy. For the three windows, the reallocated spectrogram is localised on the instantaneous frequency, as predicted by the theory. The results show an improvement in the localisation of energy.

Figure 2. Spectrogram (right) and reallocated version (left) of a linear chirp for three window lengths.

3.2. Speech

In figure 3 the spectrogram and its reallocated version for medium, narrow and wide band are plotted for the utterance /ibu/. In the narrow band, the reallocation highlights temporal information of the burst of /b/ that disappeared in the conventional narrow band spectrogram. In the wide band, the reallocation allows the display of frequency information, as in the narrow band spectrogram, with an increase in the temporal accuracy. For the medium band, the F0 and F1 of /i/ and /u/ are enhanced in the reallocated version. The explosion of the plosive /b/ is better displayed and more accurate in time, as in the wide band. The events produced by the closure of the vocal cords are also visible. The reallocated medium band spectrogram thus allows visualisation of information available in either the narrow band or the wide band spectrogram.

Figure 3. Spectrogram (right) and reallocated version (left) of the utterance /ibu/ for medium, narrow and wide band.

In figure 4 the reallocated medium band spectrogram and the medium, narrow and wide band spectrograms are plotted for the utterance /eda/. As in figure 3, the reallocation of the medium band spectrogram allows a good temporal resolution without losing frequency resolution. Temporal events corresponding to the glottal closure are well displayed, as in the wide band version. Formants and pitch harmonics are also well represented, as in the narrow band version. The examples show that the reallocated version of the medium band spectrogram enhances both temporal and spectral information in comparison with the conventional spectrogram.

Figure 4. Comparison of the medium band reallocated spectrogram (top left) with the medium band (top right), narrow band (bottom left) and wide band (bottom right) spectrograms for the utterance /eda/.

3.3. Computational aspects

Previous examples show the efficiency of this method. Nevertheless, some computational aspects have to be mentioned. The first and material consideration is that three times more memory is required for storage of the image (to keep the coordinates and values of the points). Secondly, the uniform distribution of points in the time-frequency plane is lost. In the case of the spectrogram, each point is equidistant from its neighbours in time and frequency. In the reallocated version this is not the case. The main difficulty in using this method is therefore that it is more difficult to interpolate the spectrum between points. For a high quality display of the time-frequency image, continuity between points is required. This is quite easy to obtain by interpolation with the conventional spectrogram. In the case of the reallocated representation this is more complex. This is quite disadvantageous when low resolution images are computed. So, from a visual point of view, it is more difficult to use the reallocated version than the conventional spectrogram at low resolution.

4. CONCLUSION

Reallocation is an interesting method for increasing the localisation of energy in spectrograms. Formant energy as well as the timing of events is represented more accurately in the same image. Figures show improvements in energy localisation using the reallocation method. Nevertheless, this improvement is limited by the resolution of the screen and by the aspects mentioned above. It is difficult to judge the improvement of the reallocation by its visual representation because the accuracy is limited by the resolution of the screen. Reallocated spectrograms may prove to be even more powerful when used as input to further processing stages. In the future, we intend to test this method for formant tracking and compare the results with those obtained with other classical approaches. A comparison with other time-frequency representations [7] on a large database is intended to evaluate this method.

5. ACKNOWLEDGMENTS

This work was supported by Contract SCI-CT92-0786 of the EC Science Programme.

6. REFERENCES

[1] Loughlin P., Pitton J., Atlas L., "Advanced time-frequency representations for speech processing", in Visual representations of speech signals, eds. Cooke M., Beet S., Crawford M., 1993, pp 27-53.
[2] Flandrin P., "Temps-Frequence", Eds Hermes, 1993.
[3] Cohen L., "Time-frequency distributions - A review", Proceedings IEEE, Vol. 77, 1989, pp 941-981.
[4] Jones D., Parks T., "A Resolution Comparison of several Time Frequency Representations", IEEE ICASSP 89, 1989, pp 2222-2225.
[5] Kodera K., Gendrin R., de Villedary C., "Analysis of Time varying signals with small BT values", IEEE Trans on ASSP, Vol 34, n 1, 1978, pp 64-76.
[6] Auger F., Flandrin P., "La reallocation : une methode generale d'amelioration de la lisibilite des representations temps frequence bilineaires", Colloque TOM, 1994, pp 15.1-15.7.
[7] Wakita H., Zhao Y., "On the time-frequency display of speech using a generalised time-frequency representation with a cone-shaped kernel", in Visual representations of speech signals, eds. Cooke M., Beet S., Crawford M., 1993, pp 355-361.
