
Speech Recognition using FPGA

NAMES:
MUBASHIR AHMED KHATRI (16100215)
HASSAN SHAHBAZ (16100035)
HASSAM NOOR (16100236)

DIGITAL SYSTEM DESIGN LAB (EE-421L)


PROJECT PROPOSAL

SPEECH RECOGNITION USING FPGA

ABSTRACT
An increasingly popular way to interact with machines is simply to talk to them: a short
voice command makes the machine perform a task automatically, instead of the user doing
it manually. However, there is often a trade-off between ease of use and system
complexity. The main objective of this project is therefore to design and implement a
speech recognition system on a Field Programmable Gate Array (FPGA) that accurately
identifies a single sound while remaining simple and fast. For this purpose, an algorithm
is first prototyped and tested in MATLAB, where the voice input from a microphone is
compared against a stored .mp3 file that serves as the reference for the input voice
signal. The signal processing is based on the Fast Fourier Transform (FFT), leading to a
partial yet functional hardware implementation.

SCOPE
A voice recognition system is one into which an individual speaker reads text or
isolated vocabulary. The system analyzes that person's specific voice and uses it to
fine-tune the recognition of that person's commands.

Speech recognition applications include voice user interfaces for specific tasks, such
as voice dialing (for example, saying "Call" to phone someone), searching (saying
"Search" to find a song in a playlist), simple data entry such as entering a credit card
number, speech-to-text processing, in-car systems, health care, military applications,
and aids for people with hearing disabilities.

From a technology perspective, speech recognition has a long history with several waves
of major innovation. Most recently, the field has benefited from advances in deep
learning and big data. These advances are evidenced not only by the surge of academic
papers published in the field but, more importantly, by the worldwide adoption by
industry (e.g. Google, Apple) of a variety of deep learning methods for designing and
deploying speech recognition systems.

FLOW DIAGRAM
[Figure: flow diagram of the offline and online modes]
Offline mode (done before starting the FPGA): User input ("Go") → Computer microphone →
Sound card of computer giving clean input → Computer sound processing → Exact sound of
the word "Go" extracted from the recording → Array of reference sound pattern for "Go",
stored in the FPGA.
Online mode (FPGA running): User input ("Go") → Analog microphone → ADC → Serial bus →
FPGA memory (array of input sound pattern) → Correlation function, comparing it with the
array of reference sound pattern → Sound detected or not.

WORK PLAN
The process of implementation consists of the following steps:

STEP 1
Speech recognition is performed by identifying a sound from its frequency content. To
achieve this, the frequency content of a sound must first be extracted from the raw
input voice signal. The frequency content of the input can then be compared with that of
the reference .mp3 file stored on the computer by treating both as vectors and computing
the distance between them. If the input is close enough to the reference, it is
considered a match. This whole procedure is first implemented in MATLAB, to illustrate
it clearly and to experiment with the settings.

STEP 2
Sound waves are analog, so they must be converted to binary data before a digital system
can read and process them. This is done by an analog-to-digital converter (ADC), which
samples the wave's amplitude at a fixed rate (the sampling frequency) and maps each
sample to a binary number. The precision and sampling frequency chosen for this project
are 8 bits and 5 kHz, respectively. The precision was chosen for convenience. A 5 kHz
rate captures frequencies up to 2.5 kHz (half the sampling frequency); it is lower than
the 8 kHz rate used on digital telephone lines, so fewer samples need to be processed at
the cost of some high-frequency content.
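The ADC behaviour and the numbers it implies can be sketched as follows. The sketch assumes the analog front end scales the microphone signal to the range [-1.0, 1.0] and that the converter outputs unsigned 8-bit codes; `quantize` and `sample_tone` are hypothetical helpers, not part of any ADC driver.

```python
import math

SAMPLE_RATE_HZ = 5_000  # sampling frequency stated in the proposal
BITS = 8                # ADC precision stated in the proposal
LEVELS = 2 ** BITS      # 256 quantization levels


def quantize(x: float) -> int:
    """Map an amplitude in [-1.0, 1.0] to an unsigned 8-bit code (0..255).

    Values outside the range are clipped, as a real ADC saturates.
    """
    x = max(-1.0, min(1.0, x))
    return round((x + 1.0) / 2.0 * (LEVELS - 1))


def sample_tone(freq_hz: float, duration_s: float) -> list[int]:
    """Sample and quantize a pure tone, mimicking the ADC output stream."""
    n = int(duration_s * SAMPLE_RATE_HZ)
    return [
        quantize(math.sin(2 * math.pi * freq_hz * k / SAMPLE_RATE_HZ))
        for k in range(n)
    ]


nyquist_hz = SAMPLE_RATE_HZ / 2           # highest representable frequency: 2500.0 Hz
sample_period_us = 1e6 / SAMPLE_RATE_HZ   # time between samples: 200.0 microseconds
```

For example, 10 ms of audio at this rate is 50 samples, each an integer between 0 and 255.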

STEP 3
The system must know when a word has been spoken, so a detection algorithm is needed.
Detection is done by continually computing the absolute average of two adjacent windows
of samples, taking the difference between them, and comparing it with a predefined
threshold.
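A sketch of this detection rule is shown below. It assumes zero-centred samples (the unsigned 8-bit ADC codes shifted by 128) and non-overlapping adjacent windows; the window size and threshold are hypothetical parameters that would have to be tuned experimentally.

```python
def to_signed(codes):
    """Shift unsigned 8-bit ADC codes (0..255) so that silence sits near 0."""
    return [c - 128 for c in codes]


def mean_abs(window):
    """Absolute average of one window of zero-centred samples."""
    return sum(abs(s) for s in window) / len(window)


def detect_word_start(samples, window_size, threshold):
    """Return the start index of the first window whose absolute average
    differs from that of the previous window by more than threshold,
    or None if no word is detected."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    for start in range(window_size, len(samples) - window_size + 1, window_size):
        previous = samples[start - window_size:start]
        current = samples[start:start + window_size]
        if abs(mean_abs(current) - mean_abs(previous)) > threshold:
            return start
    return None
```

Comparing each window with its neighbour, rather than with a fixed level, lets the detector respond to a sudden change in loudness even when the background level drifts slowly.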

STEP 4
Once a word is detected, it is mapped to the frequency domain by computing its Discrete
Fourier Transform (DFT) with the Fast Fourier Transform (FFT) algorithm. For simplicity,
only the real part of the DFT is kept. The result is sent over the serial bus to the
FPGA, where the input voice is compared with the stored reference voice data.
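The FFT step can be illustrated with a textbook recursive radix-2 Cooley-Tukey FFT in Python. It requires the window length to be a power of two, which is an assumption about how the detected word is framed; `real_spectrum` keeps only the real part of each DFT bin, mirroring the simplification described above. A hardware FFT would normally be iterative and fixed-point, so this sketch only shows the arithmetic.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # butterfly, top half
        out[k + n // 2] = even[k] - twiddle  # butterfly, bottom half
    return out


def real_spectrum(samples):
    """Keep only the real part of the DFT, as the proposal does for simplicity."""
    return [c.real for c in fft(samples)]
```

For a 16-sample cosine at bin 2, `real_spectrum` returns 8 at bins 2 and 14 and (numerically) zero elsewhere.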

STEP 5
The input word and the reference sound are compared by treating them as vectors and
computing the Euclidean distance between them, averaged over the vector length. This is
done in the correlation function, and an LCD then displays whether or not the voice has
been recognized.
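One way the correlation function could implement this comparison in integer arithmetic is sketched below. It interprets the "averaged" Euclidean distance as the root-mean-square difference per element, which is an assumption, and squares both sides of the threshold test so that no square root or divider is needed in hardware. The function name and threshold are hypothetical.

```python
def voice_detected(input_vec, reference_vec, rms_threshold):
    """Return True when the RMS difference between two integer vectors is
    at most rms_threshold.

    For t >= 0, sqrt(ssd / N) <= t is equivalent to ssd <= t * t * N, so the
    test needs only subtract, multiply, add and compare operations, which map
    directly onto FPGA logic.
    """
    if len(input_vec) != len(reference_vec):
        raise ValueError("vectors must have the same length")
    ssd = 0  # sum of squared differences (an accumulator register in hardware)
    for x, y in zip(input_vec, reference_vec):
        d = x - y
        ssd += d * d
    return ssd <= rms_threshold * rms_threshold * len(reference_vec)
```

For example, two 8-element vectors that differ by 3 in every element have an RMS difference of exactly 3, so they pass a threshold of 3 but fail a threshold of 2.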

APPLICATIONS
IN-CAR SYSTEM
Simple voice commands may be used to initiate phone calls, select radio stations or
play music from a compatible smartphone, MP3 player or music-loaded flash drive.
Voice recognition capabilities vary between car makes and models. Some of the most
recent car models offer natural-language speech recognition in place of a fixed set of
commands, allowing the driver to use full sentences and common phrases. With such
systems there is, therefore, no need for the user to memorize a set of fixed command
words.

HIGH-PERFORMANCE FIGHTER AIRCRAFT


Substantial efforts have been devoted to the test and evaluation of speech recognition
in fighter aircraft. Of particular note is the U.S. program in speech recognition for
the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft. In this and similar
programs, speech recognizers have been operated successfully in fighter aircraft, with
applications including setting radio frequencies, commanding an autopilot system,
setting steer-point coordinates and weapons release parameters, and controlling flight
displays.

USAGE IN EDUCATION AND DAILY LIFE


Students who are physically disabled or suffer from repetitive strain injuries to the upper
extremities can be relieved from having to worry about handwriting, typing, or working
with a scribe on school assignments by using speech-to-text programs. They can also
utilize speech recognition technology to freely enjoy searching the Internet or using a
computer at home without having to physically operate a mouse and keyboard.
Speech recognition can allow students with learning disabilities to become better
writers. By saying the words aloud, they can increase the fluidity of their writing, and be
alleviated of concerns regarding spelling, punctuation, and other mechanics of writing.

PEOPLE WITH DISABILITIES

People with disabilities can benefit from speech recognition programs. For individuals
who are deaf, speech recognition software is used to automatically generate closed
captions of conversations such as discussions in conference rooms, classroom lectures,
or religious services.

RISKS
Since this is a low-power device, it needs a battery for continuous operation, and the
battery life is hard to predict. Eventually the battery will run down, the system will
switch off, and no voice input can be recognized. This risk can be reduced by using
good-quality batteries. In addition, the average battery life can be measured with trial
batteries, so that we can predict when to replace the batteries at the optimum time.
Environmental noise may also be picked up at the input along with the user's voice
command, so noise cancellation could become a problem. This can be mitigated by
precisely encoding the command signal, whose frequency content is obtained in MATLAB
using the Fourier transform.

CONCLUSIONS
We have proposed a low-cost device that can be connected to other digital devices to
take voice input and perform a specific task accordingly. The speech recognition device
is easy to use, robust, low-cost, and a simpler version of a voice-controlled robot.
Such voice control is among the latest additions to smartphones on the market. The
device could also be adapted to other languages by recording the reference sounds in an
individual's native language, although this would require a lot of memory to store the
reference voices.