
**Performance Comparison of Speaker Identification using Circular DFT and WHT Sectors**

Dr. H B Kekre (1), Vaishali Kulkarni (2), Indraneal Balasubramanian (3), Abhimanyu Gehlot (4), Rasik Srinath (5)

(1) Senior Professor, Computer Dept., MPSTME, NMIMS University. hbkekre@yahoo.com
(2) Associate Professor, EXTC Dept., MPSTME, NMIMS University. Vaishalikulkarni6@yahoo.com
(3, 4, 5) Students, B.Tech EXTC, MPSTME, NMIMS University. indraneal89@gmail.com, abhimanyu13090@gmail.com, rasik90@gmail.com

Abstract— In this paper we aim to provide a unique approach to text-dependent speaker identification using transform techniques, namely the DFT (Discrete Fourier Transform) and the WHT (Walsh Hadamard Transform). In the first method, feature vectors are extracted by dividing the complex DFT spectrum into circular sectors and taking the weighted density count of the number of points in each of these sectors. In the second method, feature vectors are extracted in the same way from circular sectors of the WHT spectrum. A comparison of the two transforms shows that the accuracy obtained for the DFT (80%) is higher than that obtained for the WHT (66%).

Keywords— Speaker identification; circular sectors; weighted density; Euclidean distance

I. INTRODUCTION

Human speech conveys an abundance of information, from the language and gender to the identity of the person speaking. The purpose of a speaker recognition system is thus to extract the unique characteristics of a speech signal that identify a particular speaker [1–4]. Speaker recognition systems are usually classified into two subdivisions: speaker identification and speaker verification [2–5]. Speaker identification (also known as closed-set identification) is a 1:N matching process in which the identity of a person must be determined from a set of known speakers [7]. Speaker verification (also known as open-set identification) serves to establish whether the speaker is who he claims to be [8]. Speaker identification can be further classified into text-dependent and text-independent systems. In a text-dependent system, the system knows what utterances to expect from the speaker; in a text-independent system, no assumptions about the text can be made, so the system must be more flexible than a text-dependent one [4, 5, 8]. Speaker recognition systems find use in a multitude of applications today, including automated call processing in telephone networks and query systems for stock information, weather reports, etc. However, difficulties in the wide deployment of such systems remain a practical limitation yet to be overcome [2, 6, 7, 9, 10].

We have proposed speaker identification using power distribution in the frequency domain [11], [12]. We have also proposed speaker recognition using vector quantization in the time domain by means of the LBG (Linde-Buzo-Gray), KFCG (Kekre's Fast Codebook Generation) and KMCG (Kekre's Median Codebook Generation) algorithms [13–15], and in the transform domain using the DFT (Discrete Fourier Transform), DCT (Discrete Cosine Transform) and DST (Discrete Sine Transform) [16]. The concept of sectorization has also been used for content based image retrieval (CBIR) [17]–[21]. We have proposed speaker identification using circular DFT sectors [22]. In this paper, we propose speaker identification using WHT sectors and compare the results with those of DFT sectors.

Fig. 1 shows how a basic speaker identification system operates. A number of speech samples are collected from a variety of speakers, and their features are extracted and stored as reference models in a database. When a speaker is to be identified, the features of his speech are extracted and compared with all of the reference speaker models. The reference model giving the minimum Euclidean distance to the feature vector of the person to be identified is the maximum-likelihood model, and its speaker is declared as the person identified.

Figure 1. Speaker Identification System
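The closed-set identification loop described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function and variable names are ours, and the stored reference vectors are assumed to have been produced by either sectorization method described later.

```python
import numpy as np

def identify(test_features, reference_models):
    """Return the speaker ID whose stored feature vector has the minimum
    Euclidean distance to the test feature vector (closed-set matching)."""
    best_id, best_dist = None, float("inf")
    for speaker_id, ref in reference_models.items():
        dist = np.linalg.norm(test_features - ref)  # Euclidean distance
        if dist < best_dist:
            best_id, best_dist = speaker_id, dist
    return best_id
```

The nearest reference model is taken as the maximum-likelihood model, exactly as in the description above.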

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 3, 2011

The rest of the paper is organized as follows: Section II explains the sectorization process, Section III the feature extraction using the density of the samples in each sector, Section IV presents the results, and Section V concludes.

II. SECTORIZATION OF THE COMPLEX TRANSFORM PLANES

A. Discrete Fourier Transform (DFT)

The DFT transforms time- or space-based data into frequency-based data, allowing the component frequencies of a discrete set of values sampled at a fixed rate to be estimated efficiently [23, 24]. If the speech signal is represented by y(t), then the DFT of the time series of samples y0, y1, y2, ..., yN-1 is defined by (1):

$$ Y_k = \sum_{n=0}^{N-1} y_n \, e^{-2j\pi kn/N}, \qquad k = 0, 1, 2, \ldots, N-1 \qquad (1) $$

where $y_n = y_s(n\Delta t)$ and $\Delta t$ is the sampling interval.

B. Walsh Hadamard Transform

The Walsh transform, or Walsh–Hadamard transform (WHT), is a non-sinusoidal, orthogonal transformation technique that decomposes a signal into a set of basis functions. These basis functions are Walsh functions: rectangular or square waves taking the values +1 or −1. The WHT returns sequency values. Sequency is a more generalized notion of frequency, defined as one half of the average number of zero-crossings per unit time interval. Each Walsh function has a unique sequency value, and the returned sequency values can be used to estimate the frequencies in the original signal. The WHT is used in a number of applications, such as image processing, speech processing, filtering, and power spectrum analysis. It is very useful for reducing bandwidth storage requirements and for spread-spectrum analysis [25]. Like the FFT, the WHT has a fast version, the fast Walsh–Hadamard transform (FWHT). Compared to the FFT, the FWHT requires less storage space and is faster to calculate because it uses only real additions and subtractions, while the FFT requires complex values. The FWHT can also represent signals with sharp discontinuities more accurately using fewer coefficients than the FFT. The FWHT is a divide-and-conquer algorithm that recursively breaks down a WHT of size N into two smaller WHTs of size N/2. This implementation follows the recursive definition of the Hadamard matrix $H_N$ given by (2):

$$ H_N = \frac{1}{\sqrt{2}} \begin{bmatrix} H_{N/2} & H_{N/2} \\ H_{N/2} & -H_{N/2} \end{bmatrix} \qquad (2) $$

The normalization factors for each stage may be grouped together or even omitted. The sequency-ordered (also known as Walsh-ordered) fast Walsh–Hadamard transform, FWHTw, is obtained by computing the Hadamard-ordered FWHTh as above and then rearranging the outputs.

The speech signal has an amplitude range from −1 to +1. It is first converted into positive values by adding +1 to all the sample values, so that the amplitude range of the speech signal becomes 0 to 2. Two sectorization methods are then used, described below.

A. DFT Sectorization

The algorithm for DFT sectorization is given below:

1. The DFT of the speech signal is computed. Since the DFT is symmetrical, only half of the points are considered while drawing the complex DFT plane (Yreal vs. Yimag).
2. The first point of the DFT is a real number, so it is considered separately when forming the feature vector. The complex plane therefore covers only points 2 to N/2, where N is the number of points in the DFT. Fig. 2 shows the original speech signal and its complex DFT plane for one of the samples in the database.
3. For dividing the complex plane into sectors, the magnitude of the DFT is taken as the radius of the circular sector, as in (3):

$$ R = \sqrt{Y_{real}^2 + Y_{imag}^2} \qquad (3) $$

4. Table I shows the ranges of the radius taken for dividing the DFT plane into circular sectors.
5. The maximum radius range for forming the sectors was found by experimenting on the different samples in the database. Various combinations of the ranges were tried, and the values given in Table I were found to be satisfactory. Fig. 3 shows the seven sectors formed for the complex plane shown in Fig. 2, with different colours used to distinguish the sectors.
6. The seven circular sectors were further divided into four quadrants each, as given by Table II. Thus we get 28 sectors for each of the samples. Fig. 4 shows the 28 sectors formed for the sample shown in Fig. 2.

Figure 2. Speech signal and its complex DFT plane
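The DFT sectorization steps above can be sketched in code as follows. This is an illustrative sketch, not the authors' implementation: the clamping of radii above 256 into the outermost band and the handling of points exactly on a band boundary are our assumptions, since the paper does not specify them.

```python
import numpy as np

BANDS = [4, 8, 16, 32, 64, 128, 256]  # Table I radius boundaries

def dft_sector_counts(signal):
    """Count complex-DFT points falling in each of the 28 sectors
    (7 radius bands from Table I x 4 quadrants from Table II)."""
    y = np.asarray(signal, dtype=float) + 1.0   # step 0: shift amplitudes to [0, 2]
    Y = np.fft.fft(y)
    half = Y[1:len(Y) // 2]                     # symmetry: keep points 2..N/2;
                                                # the first (real) point is
                                                # handled separately
    counts = np.zeros((7, 4), dtype=int)
    for z in half:
        # radius band per eq. (3) and Table I; out-of-range radii clamped
        band = min(6, int(np.searchsorted(BANDS, abs(z))))
        if z.real >= 0 and z.imag >= 0:
            quad = 0                            # 0 - 90 degrees
        elif z.real < 0 and z.imag >= 0:
            quad = 1                            # 90 - 180 degrees
        elif z.real < 0:
            quad = 2                            # 180 - 270 degrees
        else:
            quad = 3                            # 270 - 360 degrees
        counts[band, quad] += 1
    return counts
```

Summing along the quadrant axis recovers the 7-sector counts; the flattened 7×4 array gives the 28-sector counts.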

TABLE I. RADIUS RANGE OF THE CIRCULAR SECTORS

Sr. No. | Sector  | Weighing factor | Radius range
1       | Sector1 | 2/256           | 0 ≤ R ≤ 4
2       | Sector2 | 6/256           | 4 ≤ R ≤ 8
3       | Sector3 | 12/256          | 8 ≤ R ≤ 16
4       | Sector4 | 24/256          | 16 ≤ R ≤ 32
5       | Sector5 | 48/256          | 32 ≤ R ≤ 64
6       | Sector6 | 96/256          | 64 ≤ R ≤ 128
7       | Sector7 | 192/256         | 128 ≤ R ≤ 256


Figure 4. Sectorization of DFT plane into 28 sectors for the speech sample shown in Fig. 2

Figure 3. Circular sectors of the complex DFT plane of the speech sample shown in Fig. 2

TABLE II. DIVISION INTO FOUR QUADRANTS

Sr. No. | Value                   | Quadrant
1       | Xreal ≥ 0 & Ximag ≥ 0  | 1 (0° – 90°)
2       | Xreal ≤ 0 & Ximag ≥ 0  | 2 (90° – 180°)
3       | Xreal ≤ 0 & Ximag ≤ 0  | 3 (180° – 270°)
4       | Xreal ≥ 0 & Ximag ≤ 0  | 4 (270° – 360°)

B. WHT Sectorization

The algorithm for Walsh sectorization is given below:

1. The WHT of the speech signal is taken using the FWHT (fast Walsh–Hadamard transform). The WHT can be represented as (C0, S0, C1, S1, C2, S2, ..., CN-1, SN-1), where C represents a Cal term and S a Sal term.
2. The Walsh transform matrix is real, but it can be made complex by multiplying all Sal components by j.
3. The first term, C0, represents the dc value, so the complex plane is formed by combining S0 with C1, S1 with C2, and so on. In this case SN-1 is left out; thus C0 and SN-1 are considered separately.
4. The complex Walsh transform is then divided into circular sectors as shown by (4), with the radial sectors again formed using the radius ranges of Table I:

$$ R = \sqrt{Y_{cal}^2 + Y_{sal}^2} \qquad (4) $$

5. The seven circular sectors were further divided into four quadrants, as explained in (A), by using Table II. Thus we get 28 sectors for each of the samples.

III. FEATURE VECTOR EXTRACTION

For feature vector generation, the number of points falling in each sector is first counted. The feature vector component for each sector is then calculated according to (5):

$$ \text{Feature vector} = \frac{\text{count}}{n_1} \times \text{weighing factor} \times 10000 \qquad (5) $$

For the DFT, the first value, i.e. the dc component, is accounted for as in (6). For the WHT, C0 is accounted for as given by (6) and SN-1 as given by (7). Overall, there are eight components in the feature vector for the DFT (one per sector plus the first term).
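The WHT-based feature extraction can be sketched as follows. Two caveats: this sketch pairs successive Hadamard-ordered FWHT coefficients as the paper's Cal/Sal stream, whereas the paper uses sequency (Walsh) ordering; and it takes n1 as the number of complex points, which the paper does not define explicitly. Both are our assumptions for illustration.

```python
import numpy as np

BANDS = [4, 8, 16, 32, 64, 128, 256]                                # Table I
WEIGHTS = [2/256, 6/256, 12/256, 24/256, 48/256, 96/256, 192/256]   # Table I

def fwht(a):
    """Unnormalized fast Walsh-Hadamard transform (Hadamard order)."""
    a = np.array(a, dtype=float)
    h = 1
    while h < len(a):                       # log2(N) butterfly stages
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

def wht_features(signal):
    """Nine-component WHT feature vector: seven weighted sector densities
    per eq. (5), plus first and last terms per eqs. (6) and (7)."""
    y = np.asarray(signal, dtype=float) + 1.0      # shift amplitudes to [0, 2]
    w = fwht(y)
    dc, rest = w[0], w[1:]                         # C0 handled separately
    sal, cal = rest[0::2], rest[1::2]              # S0,S1,... and C1,C2,...
    pts = cal + 1j * sal[:len(cal)]                # combine Sk with Ck+1;
                                                   # the last Sal term is left out
    counts = np.zeros(7)
    for r in np.abs(pts):                          # radius per eq. (4)
        counts[min(6, int(np.searchsorted(BANDS, r)))] += 1
    n1 = len(pts)
    features = (counts / n1) * np.array(WEIGHTS) * 10000   # eq. (5)
    first = np.sqrt(abs(dc))                               # eq. (6)
    last = np.sqrt(abs(sal[-1]))                           # eq. (7)
    return np.concatenate([features, [first, last]])
```

Applying `fwht` twice and dividing by the length recovers the input, which is a quick sanity check on the butterfly stages.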

Similarly, there are nine components in the feature vector for the WHT (one per sector, plus the first and last terms) when the seven circular sectors are considered. When 28 sectors are considered, there are 29 components in the feature vector for the DFT (one per sector plus the first term) and 30 for the WHT (one per sector, plus the first and last terms):

First term = sqrt(abs(first value of DFT/WHT))    (6)
Last term = sqrt(abs(last value of FWHT))    (7)

IV. RESULTS

A. Database Description

The speech samples used in this work were recorded using Sound Forge 4.5. The sampling frequency is 8000 Hz (8-bit, mono PCM samples). Table III shows the database description. The samples were collected from 30 different speakers, in two sessions per speaker, so that separate training and testing data could be created. Twelve samples were taken per speaker; the samples recorded in the first session are kept in the database and those recorded in the second session are used for testing.

TABLE III. DATABASE DESCRIPTION

Parameter            | Sample characteristics
Language             | English
No. of speakers      | 30
Speech type          | Read speech
Recording conditions | Normal (a silent room)
Sampling frequency   | 8000 Hz
Resolution           | 8 bps

B. Experimentation

This algorithm was tested for text-dependent speaker identification. Feature vectors for both methods described in Section II were calculated as shown in Section III. For testing, the test sample is processed in the same way and its feature vector is calculated. For recognition, the Euclidean distance between the features of the test sample and the features of all the samples stored in the database is computed. The database sample with the minimum Euclidean distance is declared as the recognized speaker.

C. Accuracy of Identification

The accuracy of the identification system is calculated as the percentage of test samples that are correctly identified.

Fig. 5 shows the results obtained for DFT sectorization. When the complex DFT plane is divided into seven sectors, the maximum accuracy is around 80%, and it decreases as the number of samples in the database increases (64% for 30 samples). Accuracy increases when the number of sectors into which the complex DFT plane is divided is raised from 7 to 28: with 28 sectors, the maximum accuracy is 80% up to 20 samples, after which it decreases. When the complex plane is further divided into 56 sectors, accuracy improves for smaller numbers of samples, but as the number of samples increases the performance is similar to that with 28 sectors.

Figure 5. Accuracy for DFT Sectorization

Fig. 6 shows the results obtained for WHT sectorization. Here also, accuracy improves as the number of sectors is increased from 7 to 28, but further division into 56 sectors gives no advantage. Overall, the results obtained for the DFT are better than those obtained for the WHT.

Figure 6. Accuracy for WHT Sectorization
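The evaluation protocol above (match each second-session sample against the enrolled first-session models, then compute the percentage correctly identified) can be sketched as follows. The function and data-layout names are illustrative, not the paper's.

```python
import numpy as np

def accuracy(test_set, reference_models):
    """Identification accuracy (%): the fraction of test samples whose
    nearest reference model (minimum Euclidean distance) belongs to the
    true speaker.

    test_set: list of (true_speaker_id, feature_vector) pairs
    reference_models: dict mapping speaker_id -> enrolled feature_vector
    """
    correct = 0
    for true_id, feats in test_set:
        # nearest-neighbour match by Euclidean distance
        pred = min(reference_models,
                   key=lambda sid: np.linalg.norm(feats - reference_models[sid]))
        if pred == true_id:
            correct += 1
    return 100.0 * correct / len(test_set)
```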


V. CONCLUSION

Speaker identification using the concept of sectorization has been proposed in this paper. The complex DFT and WHT planes were divided into circular sectors, and feature vectors were calculated using weighted density. For both transform techniques, accuracy increases when the 7 circular sectors are divided into 28 sectors, but there is no significant improvement when the complex plane is divided further. The results also show that the performance of the DFT is better than that of the WHT.

REFERENCES

[1] Lawrence Rabiner, Biing-Hwang Juang and B. Yegnanarayana, "Fundamentals of Speech Recognition", Prentice-Hall, Englewood Cliffs, 2009.
[2] S. Furui, "50 years of progress in speech and speaker recognition research", ECTI Transactions on Computer and Information Technology, Vol. 1, No. 2, November 2005.
[3] D. A. Reynolds, "An overview of automatic speaker recognition technology", Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing.
[4] S. Furui, "Recent advances in speaker recognition", AVBPA '97, pp. 237–251, 1997.
[5] J. P. Campbell, "Speaker recognition: A tutorial", Proceedings of the IEEE, vol. 85, pp. 1437–1462, September 1997.
[6] D. A. Reynolds, "Experimental evaluation of features for robust speaker identification", IEEE Trans. Speech Audio Process., vol. 2, no. 4, pp. 639–643, Oct. 1994.
[7] Tomi Kinnunen, Evgeny Karpov and Pasi Fränti, "Real-time Speaker Identification", ICSLP 2004.
[8] F. Bimbot, J.-F. Bonastre, C. Fredouille, G. Gravier, I. Magrin-Chagnolleau, S. Meignier, T. Merlin, J. Ortega-García, D. Petrovska-Delacrétaz and D. A. Reynolds, "A tutorial on text-independent speaker verification", EURASIP J. Appl. Signal Process., vol. 2004, no. 1, pp. 430–451, 2004.
[9] Marco Grimaldi and Fred Cummins, "Speaker Identification using Instantaneous Frequencies", IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 6, August 2008.
[10] Zhong-Xuan Yuan, Bo-Ling Xu and Chong-Zhi Yu, "Binary Quantization of Feature Vectors for Robust Text-Independent Speaker Identification", IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 1, January 1999.
[11] H. B. Kekre and Vaishali Kulkarni, "Speaker Identification using Power Distribution in Frequency Spectrum", Technopath, Journal of Science, Engineering & Technology Management, Vol. 02, No. 1, January 2010.
[12] H. B. Kekre and Vaishali Kulkarni, "Speaker Identification by using Power Distribution in Frequency Spectrum", ThinkQuest 2010, International Conference on Contours of Computing Technology, BGIT, Mumbai, 13–14 March 2010.
[13] H. B. Kekre and Vaishali Kulkarni, "Speaker Identification by using Vector Quantization", International Journal of Engineering Science and Technology, May 2010.
[14] H. B. Kekre and Vaishali Kulkarni, "Performance Comparison of Speaker Recognition using Vector Quantization by LBG and KFCG", International Journal of Computer Applications, vol. 3, July 2010.
[15] H. B. Kekre and Vaishali Kulkarni, "Performance Comparison of Automatic Speaker Recognition using Vector Quantization by LBG, KFCG and KMCG", International Journal of Computer Science and Security, Vol. 4, Issue 5, 2010.
[16] H. B. Kekre and Vaishali Kulkarni, "Comparative Analysis of Automatic Speaker Recognition using Kekre's Fast Codebook Generation Algorithm in Time Domain and Transform Domain", International Journal of Computer Applications, Volume 7, No. 1, September 2010.
[17] H. B. Kekre and Dhirendra Mishra, "Performance Comparison of Density Distribution and Sector Mean of Sal and Cal Functions in Walsh Transform Sectors as Feature Vectors for Image Retrieval", International Journal of Image Processing, Volume 4, Issue 3, 2010.
[18] H. B. Kekre and Dhirendra Mishra, "CBIR using Upper Six FFT Sectors of Color Images for Feature Vector Generation", International Journal of Engineering and Technology, Volume 2(2), 2010.
[19] H. B. Kekre and Dhirendra Mishra, "Performance Comparison of Four, Eight & Twelve Walsh Transform Sectors Feature Vectors for Image Retrieval from Image Databases", International Journal of Engineering Science and Technology, Volume 2(5), 2010.
[20] H. B. Kekre and Dhirendra Mishra, "Four Walsh Transform Sectors Feature Vectors for Image Retrieval from Image Databases", International Journal of Computer Science and Information Technologies, Volume 1(2), 2010.
[21] H. B. Kekre and Dhirendra Mishra, "Digital Image Search & Retrieval using FFT Sectors of Color Images", International Journal of Computer Science and Engineering, Volume 2, No. 2, 2010.
[22] H. B. Kekre and Vaishali Kulkarni, "Automatic Speaker Recognition using Circular DFT Sectors", International Conference and Workshop on Emerging Trends in Technology (ICWET 2011), 25–26 February 2011.
[23] G. D. Bergland, "A Guided Tour of the Fast Fourier Transform", IEEE Spectrum 6, pp. 41–52, July 1969.
[24] J. S. Walker, Fast Fourier Transform, 2nd ed., Boca Raton, FL: CRC Press, 1996.
[25] Terry Ritter, "Walsh-Hadamard Transforms: A Literature Survey", Aug. 1996.

AUTHORS PROFILE

Dr. H. B. Kekre received his B.E. (Hons.) in Telecomm. Engg. from Jabalpur University in 1958, M.Tech (Industrial Electronics) from IIT Bombay in 1960, M.S. Engg. (Electrical Engg.) from the University of Ottawa in 1965, and Ph.D. (System Identification) from IIT Bombay in 1970. He worked for over 35 years as Faculty of Electrical Engineering and then as HOD of Computer Science and Engg. at IIT Bombay, and for the following 13 years as a Professor in the Department of Computer Engg. at Thadomal Shahani Engineering College, Mumbai. He is currently Senior Professor at the Mukesh Patel School of Technology Management and Engineering, SVKM's NMIMS University, Vile Parle (W), Mumbai, India. He has guided 17 Ph.D.s, 150 M.E./M.Tech projects and several B.E./B.Tech projects. His areas of interest are digital signal processing, image processing and computer networks. He has more than 300 papers in national/international conferences and journals to his credit. Recently, twelve students working under his guidance have received best paper awards, and two research scholars have received Ph.D. degrees from NMIMS University; he is currently guiding ten Ph.D. students. He is a member of ISTE and IETE.

Vaishali Kulkarni received her B.E. in Electronics Engg. from Mumbai University in 1997 and M.E. (Electronics and Telecom) from Mumbai University in 2006. She is presently pursuing a Ph.D. from NMIMS University. She has more than 8 years of teaching experience and is an Associate Professor in the Telecom Department of MPSTME, NMIMS University. Her areas of interest include speech processing: speech and speaker recognition. She has 10 papers in national/international conferences and journals to her credit.
