
Real Time Control of DC Motor Drive using Speech Recognition

Punit Kumar Sharma, Dr. B.R. Lakshmikantha and K. Shanmukha Sundar

Abstract- This paper introduces a new approach to controlling a DC motor drive using speech recognition. The speech signal is provided through a microphone connected to a computer. A DC motor interfaced through a microcontroller can be driven in the forward or reverse direction at different speeds, and it can be stopped, by giving speech commands. Mel Frequency Cepstral Coefficients (MFCC) are used to recognize the user's speech, and vector quantization (VQ) is used to increase speech recognition accuracy. The MFCC and VQ algorithms for speech recognition have been implemented in MATLAB 7.5 on the Windows Vista platform, and the supporting hardware setup is being implemented.

Keywords- Speech Recognition, MFCC, Vector Quantization.

II. PRINCIPLES OF SPEECH RECOGNITION

Speaker recognition methods can be divided into text-independent and text-dependent methods. In a text-independent system, speaker models capture characteristics of a person's speech that show up regardless of what is being said [4]. In a text-dependent system, recognition of the speaker's identity is based on the user speaking one or more specific phrases, such as passwords, card numbers or PIN codes. Every speaker recognition technology (identification or verification, text-independent or text-dependent) has its own advantages and disadvantages and may require different treatments and techniques; the choice of technology is application-specific. At the highest level, all speaker recognition systems contain two main modules: feature extraction and feature matching [5, 6].

III. METHODOLOGY

The purpose of the feature extraction module is to convert the speech waveform into a parametric representation at a considerably lower information rate. Speech is a slowly time-varying signal. When examined over a sufficiently short period of time (between 5 and 100 ms), its characteristics are fairly stationary; over longer periods (on the order of 0.2 s or more), the signal characteristics change to reflect the different speech sounds being spoken. Such a slowly time-varying signal is called quasi-stationary. A number of methods are available for parametrically representing the speech signal for the speaker recognition task, such as Linear Prediction Coding (LPC), Mel-Frequency Cepstrum Coefficients (MFCC), and others [3]. MFCC is perhaps the best known and most popular. MFCCs are based on the known variation of the human ear's critical bandwidths with frequency, and the technique uses two types of filter spacing: linearly spaced filters at low frequencies and logarithmically spaced filters at high frequencies.

A. MFCC Processor

Fig. 1 shows the structure of an MFCC processor.
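As an illustration of the MFCC processor, its stages (framing, Hamming window, FFT power spectrum, mel filter bank, logarithm, cosine transform) can be sketched in Python with NumPy. This is a minimal sketch, not the authors' MATLAB code. The 256-sample frame (about 20 ms at the 12500 Hz sampling rate used in this work, inside the 5-100 ms quasi-stationary range), the 100-sample hop, the 20 triangular mel filters and all function names are illustrative assumptions. The mel mapping is mel(f) = 2595 log10(1 + f/700), the cosine transform starts at n = 1 so that C0 is excluded, and 16 coefficients are kept.

```python
import numpy as np

def hz_to_mel(f):
    """Mel scale: mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular filters with centres equally spaced in mel, i.e. roughly
    linear spacing below 1 kHz and logarithmic spacing above it."""
    mel_points = np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):      # rising edge of the triangle
            bank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            bank[m - 1, k] = (right - k) / (right - centre)
    return bank

def mfcc(signal, fs=12500, frame_len=256, hop=100, n_filters=20, n_ceps=16):
    """Illustrative MFCC pipeline; frame sizes and filter count are assumptions."""
    x = np.asarray(signal, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Overlapping short frames, over which speech is treated as quasi-stationary.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Mel filter-bank energies S_k, then their logarithm (small offset avoids log(0)).
    log_mel = np.log(power @ mel_filter_bank(n_filters, frame_len, fs).T + 1e-10)
    # Cosine transform C_n = sum_k log(S_k) cos[n (k - 1/2) pi / K], n = 1..n_ceps.
    K = n_filters
    k = np.arange(1, K + 1)
    n = np.arange(1, n_ceps + 1)
    basis = np.cos(np.outer(n, k - 0.5) * np.pi / K)
    return log_mel @ basis.T               # shape: (n_frames, n_ceps)
```

For one second of speech at 12500 Hz, `mfcc(x)` returns one 16-dimensional feature vector per frame; these vectors are what the vector quantizer later clusters.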
The speech input is recorded at a sampling rate of 12500 Hz. This sampling frequency is chosen to minimize the effects of aliasing in the analog-to-digital conversion process.

Fig. 1. Block diagram of MFCC processor

B. Mel Frequency Wrapping

The speech signal consists of tones with different frequencies. For each tone with an actual frequency f,

I. INTRODUCTION

Speech is one of the natural forms of communication, and recent developments have made it possible to use it in security systems. In speaker identification, the task is to use a speech sample to select the identity of the person who produced the speech from among a population of speakers. This paper uses the speakers' voices to verify their identity and to control access to a DC motor. The approach can be used in applications such as voice dialling, banking by telephone, driving electric vehicles, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers [1, 3]. The MFCC algorithm for speech recognition is more accurate than Linear Prediction Coding (LPC) and Hidden Markov Models (HMM). The external DC motor is connected through an interface between the computer and a hardware circuit, which consists mainly of an 8051 microcontroller, a MAX232 IC and an L293D driver IC.

Punit Kumar Sharma, Department of Electrical & Electronics Engineering, Dayananda Sagar College of Engineering, Bangalore, India.
Dr. B.R. Lakshmikantha, Department of Electrical & Electronics Engineering, Dayananda Sagar College of Engineering, Bangalore, India.
K. Shanmukha Sundar, Department of Electrical & Electronics Engineering, Dayananda Sagar College of Engineering, Bangalore, India.

978-1-4244-7882-8/11/$26.00 ©2011 IEEE

measured in Hz, a subjective pitch is measured on the 'Mel' scale. The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. As a reference point, the pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels. The following formula can be used to compute the mels for a given frequency f in Hz [8]:

$$\mathrm{mel}(f) = 2595\,\log_{10}\!\left(1 + \frac{f}{700}\right) \qquad (1)$$

C. Cepstrum

The output of the mel-frequency wrapping stage is the log mel spectrum, which has to be converted back into time. This is done with the Discrete Cosine Transform (DCT), and the result is called the mel frequency cepstrum coefficients (MFCCs). The MFCCs may be calculated using [1, 8]:

$$C_n = \sum_{k=1}^{K} \left(\log S_k\right) \cos\!\left[n\left(k - \tfrac{1}{2}\right)\frac{\pi}{K}\right], \qquad n = 1, 2, \dots, K \qquad (2)$$

where S_k, k = 1, 2, ..., K, are the mel spectrum coefficients. The first component, C0, is excluded from the DCT since it represents the mean value of the input signal, which carries little speaker-specific information. The number of mel cepstrum coefficients is typically chosen as 16 [1, 3].

IV. FEATURE MATCHING

Feature matching techniques used in speaker recognition include Dynamic Time Warping (DTW), Hidden Markov Modeling (HMM) and Vector Quantization (VQ) [6]. The VQ approach has been used here for its ease of implementation and high accuracy.

A. Vector Quantization

Vector quantization (VQ) is a lossy data compression method based on the principle of block coding [9]. It is a fixed-to-fixed length algorithm and may be thought of as an approximator. Fig. 2 shows an example of a two-dimensional VQ: every pair of numbers falling in a particular region is approximated by a star associated with that region. The stars are called codevectors, and the regions defined by the borders are called encoding regions. The set of all codevectors is called the codebook.

Fig. 2. An example of a 2-dimensional VQ

B. LBG Design Algorithm

The LBG VQ design algorithm, proposed by Y. Linde, A. Buzo and R. Gray [10], is an iterative algorithm that alternately solves the two optimality criteria: each training vector is assigned to its nearest codevector, and each codevector is moved to the centroid of its encoding region [10]. The algorithm requires an initial codebook, which is obtained by the splitting method. In this method, an initial codevector is set as the average of the entire training sequence. This codevector is then split into two, and the iterative algorithm is run with these two vectors as the initial codebook. The final two codevectors are split into four, and the process is repeated until the desired number of codevectors is obtained. The algorithm is summarized in Fig. 3.

Fig. 3. Flowchart of VQ-LBG algorithm
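The splitting procedure of the LBG algorithm can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each codevector y is split into y(1 + ε) and y(1 - ε) with ε = 0.01, and the inner iteration stops when the relative drop in average distortion falls below 10^-3. Both values, and the function name `lbg`, are illustrative choices that do not come from the paper. Because every split doubles the codebook, the requested size should be a power of two.

```python
import numpy as np

def lbg(train, size, eps=0.01, tol=1e-3):
    """Design a VQ codebook by LBG splitting (eps and tol are assumptions)."""
    train = np.asarray(train, dtype=float)
    # Initial codevector: the average of the entire training sequence.
    codebook = train.mean(axis=0, keepdims=True)
    while codebook.shape[0] < size:
        # Split every codevector y into y*(1 + eps) and y*(1 - eps).
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        while True:
            # Nearest-neighbour condition: assign vectors to their closest codevector.
            dist = np.linalg.norm(train[:, None, :] - codebook[None, :, :], axis=2)
            nearest = dist.argmin(axis=1)
            distortion = dist[np.arange(len(train)), nearest].mean()
            # Centroid condition: move each codevector to the mean of its region.
            for j in range(codebook.shape[0]):
                members = train[nearest == j]
                if len(members) > 0:
                    codebook[j] = members.mean(axis=0)
            # Stop when the relative improvement in distortion is small.
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook
```

In a speaker recognition setting, `lbg(mfcc_vectors, 16)` would be run once per speaker on that speaker's training MFCC vectors to obtain a speaker-specific codebook.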

V. EXPERIMENTAL SETUP

In the experimental setup of the DC motor drive through speech recognition, the speech signal is taken by a microphone connected to a computer. The software that calculates the MFCC and VQ (LBG algorithm) is written in MATLAB 7.5, and Embedded C programming is used for the microcontroller. For the hardware part, an 8051 microcontroller is used so that the DC motor responds to the recognized commands. The interfacing between the computer and the microcontroller is done over RS-232 using an RS-232 cable and a MAX232 IC, and the L293D driver IC is used to drive the DC motor. Fig. 4 shows the complete experimental setup for the DC motor drive through speech recognition; here PC denotes the personal computer.

Fig. 4. Complete experimental setup (microphone, PC, serial port interface, microcontroller, DC motor driver unit, DC motor)

Fig. 5 is a conceptual diagram that explains the VQ process for two speakers: circles refer to speaker 1 and triangles refer to speaker 2. In the training phase, a speaker-specific VQ codebook is generated for each known speaker [5]. Fig. 6 shows the basic speech recognition block diagram: MFCCs are computed from the audio signal, vector-quantized, and the Euclidean distance is calculated to find the nearest speech match, giving the recognized speaker as output. Fig. 7 shows the use of different numbers of centroids for the same data field.

Fig. 5. Conceptual diagram that explains VQ process
Fig. 6. Basic speech recognition block diagram (audio signal, MFCC, vector quantization, Euclidean distance, recognized speaker output)
Fig. 7. Pictorial view of codebook with 15 centroids

VI. RESULTS

The code has been developed using the MFCC and VQ algorithms in MATLAB 7.5 on the Windows Vista platform, and the supporting hardware has also been implemented. As an example, the speech database consists of 10 speeches. MFCCs are calculated for the database and, at the time of speech recognition, for the input speech taken from the microphone. After calculation of MFCC and VQ, several graphs were obtained (Figs. 8-13): the Hamming window, the Hamming window multiplied by the input signal, the applied filter banks, the speech added to the database, the recognized speaker ID 2, and the speaker voice database (example). The external DC motor can be driven in the forward or reverse direction, and it can also be stopped, by giving speech commands.
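The recognition step of Fig. 6 (nearest match by Euclidean distance against the speaker-specific codebooks) can be sketched as below. It is a minimal sketch that assumes one codebook per enrolled speaker and scores each codebook by the average distance from the input MFCC vectors to their nearest codevector; the lowest score wins. The command table `MOTOR_COMMANDS` and its single-byte codes are hypothetical, since the paper does not specify the serial protocol between the PC and the 8051.

```python
import numpy as np

def average_distortion(features, codebook):
    """Mean Euclidean distance from each feature vector to its nearest codevector."""
    f = np.asarray(features, dtype=float)
    c = np.asarray(codebook, dtype=float)
    d = np.linalg.norm(f[:, None, :] - c[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

def identify_speaker(features, codebooks):
    """Return the 1-based ID of the codebook with the lowest average distortion."""
    scores = [average_distortion(features, cb) for cb in codebooks]
    return int(np.argmin(scores)) + 1

# Hypothetical mapping from a recognized ID to a command byte sent to the 8051
# over the serial link (illustrative only: forward, reverse, stop).
MOTOR_COMMANDS = {1: b"F", 2: b"R", 3: b"S"}
```

Using the average distortion, rather than the distance of a single frame, makes the decision depend on the whole utterance, which is the usual practice in VQ-based speaker identification.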

Figs. 8-13. Graph for Hamming window; Hamming window multiplied by input signal; Applied filter banks; Sound added to database in MATLAB; Recognized speaker ID 2; Graph of speaker voice database.

VII. CONCLUSION

In this paper, MFCC and VQ techniques are used in speech recognition to control a DC motor drive. The developed speech algorithm can be used for navigation purposes, unmanned vehicles, driving electric vehicles, and security areas (such as banking, or remote access to computers where speech can be used as a password). The code developed in MATLAB using MFCC and VQ can even be used to control and drive stepper motors, servo motors, etc.

VIII. REFERENCES

[1] Wei Han, Cheong-Fat Chan, Chiu-Sing Choy and Kong-Pang Pun, "An Efficient MFCC Extraction Method in Speech Recognition", Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, IEEE, 2006.
[2] Wang Chen and Miao Zhenjiang, "Differential MFCC and Vector Quantization used for Real-Time Speaker Recognition System", Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China, 2008 IEEE Congress on Image and Signal Processing.
[3] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani and Md. Saifur Rahman, "Speaker Identification Using MEL Frequency Cepstral Coefficient", Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, 3rd International Conference on Electrical & Computer Engineering ICECE 2004, 28-30 December 2004.
[4] Lawrence Rabiner and Biing-Hwang Juang, "Fundamentals of Speech Recognition", Prentice-Hall, Englewood Cliffs, N.J., 1993.
[5] Zhong-Xuan Yuan, Bo-Ling Xu and Chong-Zhi Yu, "Binary Quantization of Feature Vectors for Robust Text-Independent Speaker Identification", IEEE Transactions on Speech and Audio Processing, Vol. 7, No. 1, January 1999.
[6] F. K. Soong, A. E. Rosenberg, B. H. Juang and L. R. Rabiner, "A Vector Quantization Approach to Speaker Recognition", AT&T Technical Journal, vol. 66, pp. 14-26, March/April 1987.
[7] Comp.speech Frequently Asked Questions WWW site, http://svr www.eng….speech/
[8] J. R. Deller Jr., J. H. L. Hansen and J. G. Proakis, "Discrete-Time Processing of Speech Signals", second ed., IEEE Press, New York, 2000.
[9] R. M. Gray, "Vector Quantization", IEEE ASSP Magazine, pp. 4-29, April 1984.
[10] Y. Linde, A. Buzo and R. Gray, "An algorithm for vector quantizer design", IEEE Transactions on Communications, vol. 28, pp. 84-95, 1980.
[11] Minghai Yao, Jing Hu and Qinlong Gu, "A Mixed Parameter Method Based on MFCC and Fractal Dimension for Speech Recognition", College of Information Engineering, Zhejiang University of Technology, Hangzhou 310032, China, Proceedings of the 2006 IEEE International Conference on Information Acquisition, Weihai, Shandong, China, August 2006.

IX. BIOGRAPHIES

Punit Kumar Sharma, born in 1985 in Rajasthan, obtained his B.E. (Electrical Engineering) in 2007 from the University of Rajasthan and is pursuing his M.Tech in Power Electronics from Visvesvaraya Technological University. His areas of interest are speech recognition, artificial intelligence, embedded systems, electrical drives, etc.

Dr. B.R. Lakshmikantha obtained his B.E. (Electrical Engineering) in 1979 from Bangalore University, his M.E. (Power System) in 1981, and his Ph.D. in Power System Stability from Visvesvaraya Technological University, Belgaum (Karnataka). He is working as Dean of Academics and HOD of the EEE Dept. at Dayananda Sagar College of Engineering, Bangalore-78. His areas of interest are FACTS devices, electrical drives, etc.

K. Shanmukha Sundar received his B.E. (Electrical Engineering) degree in 1990 and his M.E. (Power Systems) in 1994 from Mysore and Mangalore University respectively. Presently he is working as associate professor in the Department of Electrical & Electronics Engineering, Dayananda Sagar College of Engineering, Bangalore-78. His areas of interest are power system optimization, FACTS controllers, electrical drives, etc.