Voice Recognition on FPGA

P.N.S. Anjan Kumar and Abhanshu Gupta


Principle: Consonants can be categorized into unvoiced and voiced consonants. ‘Unvoiced consonants’ have high frequency and low volume, while ‘voiced consonants’ have low frequency and high volume.

Word identification algorithm: The word ‘yes’ contains 3 phones, one of which is the phone ‘s’, an unvoiced consonant. It therefore has a high-frequency component at the end of the word, whereas the word ‘no’ contains no unvoiced consonants. Noise has low frequency and low volume, so by setting a threshold on the amplitude of the signal we can discard the noise components. The key points of the algorithm are:
1. The sound wave samples are divided into blocks of 600 samples each, and the total number of zero crossings in each block is found. To locate the maximum frequency present more precisely, successive blocks are shifted by 200 samples so that overlapping blocks are generated. The zero-crossing counts of all blocks are compared to obtain the maximum zero-crossing count.
2. Spectrum analysis of the sound wave is done using the FFT. From the FFT analysis we calculated the frequency of the phone ‘s’ to be between 3400 and 5600 Hz. A zero-crossing threshold corresponding to this frequency range is set to distinguish words containing the phone ‘s’.
3. If the maximum zero-crossing count exceeds the threshold, the word identified is ‘YES’; otherwise it is ‘NO’.
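The three steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB code. The function names, the `amplitude_floor` parameter, and the 16 kHz sample rate are assumptions; the text gives only the block length (600), the shift (200), and the frequency range of ‘s’. The sketch relies on the fact that a sinusoid of frequency f produces roughly 2·f·N/fs sign changes in N samples at sample rate fs, which is one way to map a frequency bound to a zero-crossing threshold.

```python
BLOCK_SIZE = 600                  # samples per block (from the text)
HOP_SIZE = 200                    # shift between successive blocks (from the text)
ASSUMED_SAMPLE_RATE_HZ = 16000    # assumption: the source does not state the sample rate
S_PHONE_MIN_HZ = 3400             # lower edge of the 's' band (from the text)


def crossings_for_frequency(freq_hz, sample_rate_hz, block_size=BLOCK_SIZE):
    """Approximate zero crossings a pure tone produces in one block."""
    return 2.0 * freq_hz * block_size / sample_rate_hz


def count_zero_crossings(block, amplitude_floor=0.0):
    """Count sign changes, skipping samples at or below the amplitude floor (treated as noise)."""
    crossings = 0
    previous_sign = 0
    for sample in block:
        if abs(sample) <= amplitude_floor:
            continue
        sign = 1 if sample > 0 else -1
        if previous_sign != 0 and sign != previous_sign:
            crossings += 1
        previous_sign = sign
    return crossings


def max_block_crossings(samples, block_size=BLOCK_SIZE, hop=HOP_SIZE, amplitude_floor=0.0):
    """Largest zero-crossing count over all overlapping blocks."""
    counts = [
        count_zero_crossings(samples[start:start + block_size], amplitude_floor)
        for start in range(0, len(samples) - block_size + 1, hop)
    ]
    return max(counts) if counts else 0


def classify(samples, crossing_threshold, amplitude_floor=0.0):
    """Return 'YES' if any block exceeds the crossing threshold, else 'NO'."""
    peak = max_block_crossings(samples, amplitude_floor=amplitude_floor)
    return "YES" if peak > crossing_threshold else "NO"
```

Under the assumed 16 kHz rate, `crossings_for_frequency(S_PHONE_MIN_HZ, ASSUMED_SAMPLE_RATE_HZ)` gives 255 crossings per block as a candidate threshold; a recording made at a different rate would need a different value.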

[Figure: block containing the ‘s’ phone]

Implementation: We implemented the project in MATLAB using the above algorithm. The MATLAB function takes a wave file as input; the file contains a recording of a person saying either ‘yes’ or ‘no’. The function determines which word the file contains and plots the number of zero crossings against the block number.
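As a rough Python counterpart to the MATLAB function described above, the sketch below reads a wave file and produces the per-block zero-crossing series that the plot would show. It uses only the standard-library `wave` and `struct` modules. The function names and the restriction to 16-bit mono PCM are assumptions made for illustration; the source does not describe the file format beyond "wave file".

```python
import struct
import wave


def read_mono_pcm16(path):
    """Read a 16-bit mono PCM WAV file and return (sample_rate, samples)."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getnchannels() != 1 or wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())
    # WAV stores PCM samples as little-endian signed integers.
    samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return sample_rate, samples


def block_crossing_counts(samples, block_size=600, hop=200):
    """Zero-crossing count of each overlapping block, in block order."""
    counts = []
    for start in range(0, len(samples) - block_size + 1, hop):
        block = samples[start:start + block_size]
        # A crossing is a change of sign between neighbouring samples.
        counts.append(sum(1 for a, b in zip(block, block[1:]) if (a < 0) != (b < 0)))
    return counts
```

The list returned by `block_crossing_counts` can be plotted against its index to reproduce a zero-crossings-versus-block graph, and its maximum compared with the chosen threshold.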

While implementing the above algorithm on FPGA, we divided the VHDL code into three entity blocks:
1. ADC interface (analog signal to digital signal).
2. Zero-crossing code.
3. Decision block.

To implement the zero-crossing block we followed two approaches:
1. The voice recognition code was implemented in MATLAB Simulink, and the VHDL code was generated from it using System Generator. The voiced/unvoiced distinction is used in our project to identify the word ‘yes’ or ‘no’.
2. We wrote the VHDL code by hand with simplifications, using counters to detect a signal with frequency content greater than a set threshold frequency. The threshold frequency is set using an appropriate counter.
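The counter-based approach can be modelled behaviourally before it is written in VHDL. The Python sketch below is one such model, not the authors' design: the class name, the use of non-overlapping windows, and the latched `detected` flag are assumptions. One call to `step` stands for one ADC sample; a sample counter marks the end of each window and a crossing counter is compared with a limit that plays the role of the threshold frequency.

```python
class ZeroCrossingDetector:
    """Behavioural model of a counter-based high-frequency detector."""

    def __init__(self, window_length, crossing_limit):
        self.window_length = window_length    # samples per counting window
        self.crossing_limit = crossing_limit  # crossings that encode the threshold frequency
        self.sample_counter = 0
        self.crossing_counter = 0
        self.previous_sign_bit = None
        self.detected = False                 # latched once any window exceeds the limit

    def step(self, sample):
        """Process one sample and return the latched detection flag."""
        sign_bit = 1 if sample < 0 else 0     # mirrors the MSB of a two's-complement sample
        if self.previous_sign_bit is not None and sign_bit != self.previous_sign_bit:
            self.crossing_counter += 1
        self.previous_sign_bit = sign_bit

        self.sample_counter += 1
        if self.sample_counter == self.window_length:
            # End of window: compare, then clear both counters for the next window.
            if self.crossing_counter > self.crossing_limit:
                self.detected = True
            self.sample_counter = 0
            self.crossing_counter = 0
        return self.detected
```

Because only a sign bit, two counters, and one comparison are involved, the same structure maps onto registers and a comparator in hardware; the window length and limit would be chosen from the ADC sample rate, which the source does not state.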

Future Scope: To identify different words we can use an HMM (Hidden Markov Model). An HMM is constructed for each word in the vocabulary, and the string of phones is then compared against each HMM to determine which model is the most likely match. Each expected phone is represented by a state in the HMM.
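To make the idea concrete, the sketch below scores an observed phone string against one small left-to-right HMM per word using the forward algorithm. Everything numeric here is hypothetical: the phone inventory, the `stay` and `hit` probabilities, and the split of ‘no’ into two phones are illustrative assumptions, not values from the project. In practice such parameters would be estimated from recorded speech.

```python
PHONES = ["y", "e", "s", "n", "o"]  # hypothetical phone inventory for 'yes' and 'no'


def make_word_model(word_phones, stay=0.5, hit=0.9):
    """Left-to-right HMM with one state per expected phone.

    Each state emits its own phone with probability `hit` and spreads the
    remaining probability evenly over the other phones.
    """
    miss = (1.0 - hit) / (len(PHONES) - 1)
    emit = [{p: (hit if p == expected else miss) for p in PHONES}
            for expected in word_phones]
    return {"states": len(word_phones), "stay": stay, "emit": emit}


def forward_likelihood(model, observed):
    """Probability of the observed phones, starting in the first state and ending in the last."""
    if not observed:
        return 0.0
    n, stay = model["states"], model["stay"]
    move = 1.0 - stay
    alpha = [0.0] * n
    alpha[0] = model["emit"][0][observed[0]]
    for phone in observed[1:]:
        updated = [0.0] * n
        for j in range(n):
            reach = alpha[j] * stay              # remained in state j
            if j > 0:
                reach += alpha[j - 1] * move     # advanced from the previous state
            updated[j] = reach * model["emit"][j][phone]
        alpha = updated
    return alpha[-1]


def best_word(models, observed):
    """Name of the word model with the highest forward likelihood."""
    return max(models, key=lambda word: forward_likelihood(models[word], observed))
```

With `models = {"yes": make_word_model(["y", "e", "s"]), "no": make_word_model(["n", "o"])}`, calling `best_word(models, observed)` on a decoded phone string selects whichever word model explains it better.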
