Voice Controlled Robot
Certificate
This is to certify that Mr. Pratik Chopra of the Electronics Department, bearing the
university seat number 8139, has completed the B.E. project on Voice Controlled
Robot, and that it has been accepted and examined in partial fulfillment of the
Bachelor of Electronics Engineering degree of the University of Mumbai.
_______________
________________
Examiner
Date of Examination
ACKNOWLEDGEMENT
We take this opportunity to express our deepest gratitude towards Mr. S.S. Halbe, our
project guide, who has been the driving force behind this project and whose guidance and
co-operation have been a source of inspiration for us.
We would also like to thank Prof. Samir Mhatre for his valuable support whenever
needed.
We are very much thankful to our professors, colleagues and the authors of the various
publications to which we have been referring. We express our sincere appreciation
and thanks to all those who have guided us directly or indirectly in our project. Much
needed moral support and encouragement was also provided on numerous occasions by
our whole division.
Finally, we thank our parents for their immense support.
Contents
1. Introduction
2. The Task
3. Speech Recognition Types/Styles
4. Approaches to Statistical Speech Recognition
5. Nature of Problem
6. Solution to Problems
7. Design Approach
   a. Speech Recognition Module
   b. Microcontroller and Decoder Circuit
   c. RF Module
   d. Driver Circuit
   e. Buffer
   f. Batteries
8. Training and Recognition
9. Applications
10. Components Used
11. Datasheet-HM2007
12. Project Progress Report Summary
13. Bibliography
Chapter 1. INTRODUCTION
When we say voice control, the first term to be considered is speech recognition, i.e.
making the system understand human voice. Speech recognition is a technology by which
the system understands the words (though not their meaning) given to it through speech.
Speech is an ideal method for robotic control and communication. The speech-recognition
circuit we will outline functions independently of the robot's main intelligence
(its central processing unit, or CPU). This is a good thing because word recognition
does not take any of the robot's main CPU processing power. The CPU must
merely poll the speech circuit's recognition lines occasionally to check if a command has
been issued to the robot. We can even improve upon this by connecting the recognition
line to one of the robot's CPU interrupt lines. By doing this, a recognized word would
cause an interrupt, letting the CPU know a recognized word had been spoken. The
advantage of using an interrupt is that polling the circuit's recognition line occasionally
would no longer be necessary, further reducing CPU overhead.
Another advantage of this stand-alone speech-recognition circuit (SRC) is its
programmability. You can program and train the SRC to recognize the unique words you
want recognized. The SRC can be easily interfaced to the robot's CPU.
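The polling scheme described above can be sketched as follows. All names are hypothetical; the real circuit exposes hardware recognition lines, not software calls:

```python
# Hypothetical sketch of the CPU polling the speech circuit's recognition
# line. The class and method names are illustrative only; the report's
# circuit presents this as hardware signal lines, not a software API.
class SpeechCircuit:
    """Stands in for the stand-alone speech-recognition circuit (SRC)."""

    def __init__(self):
        self._latched_word = None

    def latch(self, word_code):
        # The SRC recognizes a word and latches its code.
        self._latched_word = word_code

    def poll_recognition_line(self):
        """CPU-side check: returns a latched word code once, then clears it."""
        word, self._latched_word = self._latched_word, None
        return word

src = SpeechCircuit()
src.latch(3)                      # SRC hears command word #3
src.poll_recognition_line()       # -> 3
src.poll_recognition_line()       # -> None (nothing new latched)
```

Connecting the line to a CPU interrupt instead removes the need for this repeated check, which is the overhead saving the text describes.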
To control and command an appliance (computer, VCR, TV, security system, etc.) by
speaking to it will make working with that device easier, while increasing its efficiency
and effectiveness. At its most basic level, speech recognition allows the user to
perform parallel tasks (i.e. while hands and eyes are busy elsewhere) while continuing
to work with the computer or appliance.
Robotics is an evolving technology. There are many approaches to building robots, and
no one can be sure which method or technology will be used 100 years from now. Like
biological systems, robotics is evolving following the Darwinian model of survival of
the fittest.
Suppose you want to control a menu-driven system. What is its most striking property?
The first thought that comes to mind is that the range of inputs in a menu-driven
system is limited. In fact, by using a menu all we are doing is limiting the input domain
space. This is one characteristic which can be very useful in implementing the menu
in stand-alone systems. For example, think of the pine menu or a washing machine menu.
How many distinct commands do they require?
1. move forward
2. move back
3. turn right
4. turn left
5. load
6. release
7. stop (stops doing the current job)
Forward - moves forward
Back - moves back
Right - turns right
Left - turns left
Load
Release
Stop
(Words are chosen in such a way that they sound least similar to each other.)
Voice-enabled devices basically use the principle of speech recognition. It is the process
of electronically converting a speech waveform (as the realization of a linguistic
expression) into words (as a best-decoded sequence of linguistic units).
Converting a speech waveform into a sequence of words involves several essential steps:
1. A microphone picks up the signal of the speech to be recognized and converts it
into an electrical signal. A modern speech recognition system also requires that
the electrical signal be represented digitally by means of an analog-to-digital
(A/D) conversion process, so that it can be processed with a digital computer or a
microprocessor.
2. This speech signal is then analyzed (in the analysis block) to produce a
representation consisting of salient features of the speech. The most prevalent
feature of speech is derived from its short-time spectrum, measured successively
over short-time windows 20-30 milliseconds in length, overlapping at intervals of
10-20 ms. Each short-time spectrum is transformed into a feature vector, and the
temporal sequence of such feature vectors thus forms a speech pattern.
3. The speech pattern is then compared to a store of phoneme patterns or models
through a dynamic programming process in order to generate a hypothesis (or a
number of hypotheses) of the phonemic unit sequence. (A phoneme is a basic unit
of speech and a phoneme model is a succinct representation of the signal that
corresponds to a phoneme, usually embedded in an utterance.) A speech signal
inherently has substantial variations along many dimensions.
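The windowing in step 2 can be sketched as follows. The 8 kHz sample rate and the 25 ms window with a 10 ms hop are our assumed values; the text only fixes the ranges 20-30 ms and 10-20 ms:

```python
# Sketch of short-time analysis: slice a sampled speech signal into
# overlapping windows; each window later yields one feature vector.
# The 25 ms / 10 ms / 8 kHz figures are assumptions for illustration.
def frame_signal(samples, sample_rate, win_ms=25, hop_ms=10):
    """Return a list of overlapping frames of the input sample sequence."""
    win = int(sample_rate * win_ms / 1000)   # samples per window (200 at 8 kHz)
    hop = int(sample_rate * hop_ms / 1000)   # samples between window starts (80)
    return [samples[i:i + win]
            for i in range(0, len(samples) - win + 1, hop)]

# One second of 8 kHz audio -> 200-sample windows starting every 80 samples
frames = frame_signal([0.0] * 8000, 8000)
```

The temporal sequence of per-frame feature vectors is the "speech pattern" the text refers to.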
Before we look at the design of the project, let us first understand speech recognition
types and styles. Speech recognition is classified into two categories: speaker-dependent
and speaker-independent.
Speaker-dependent systems are trained by the individual who will be using the system.
These systems are capable of achieving a high command count and better than 95%
accuracy for word recognition. The drawback to this approach is that the system
responds accurately only to the individual who trained it. This is the most
common approach employed in software for personal computers.
A speaker-independent system is trained to respond to a word regardless of who speaks it.
The system must therefore respond to a large variety of speech patterns, inflections and
enunciations of the target word. The command word count is usually lower than for
speaker-dependent systems, but high accuracy can still be maintained within processing
limits. Industrial applications more often need speaker-independent voice systems, such
as the AT&T system used in telephone networks.
A more general form of voice recognition is available through feature analysis, and this
technique usually leads to "speaker-independent" voice recognition. Instead of trying to
find an exact or near-exact match between the actual voice input and a previously stored
voice template, this method first processes the voice input using "Fourier transforms" or
"linear predictive coding (LPC)", then attempts to find characteristic similarities between
the expected inputs and the actual digitized voice input. These similarities will be present
for a wide range of speakers, and so the system need not be trained by each new user. The
types of speech differences that the speaker-independent method can deal with, but which
pattern matching would fail to handle, include accents, and varying speed of delivery,
pitch, volume, and inflection. Speaker-independent speech recognition has proven to be
very difficult, with some of the greatest hurdles being the variety of accents and
inflections used by speakers of different nationalities. Recognition accuracy for
speaker-independent systems is somewhat less than for speaker-dependent systems,
usually between 90 and 95 percent. Speaker-independent systems have the advantage of
not requiring each user to train them, but they perform with lower quality. These systems
find applications in telephony, such as dictating a number or a word, where many
different speakers are involved. However, speaker-independent systems require a
well-trained database.
Recognition Style
Speech recognition systems have another constraint concerning the style of speech they
can recognize. There are three styles of speech: isolated, connected and continuous.
Isolated speech recognition systems can only handle words that are spoken separately.
This is the most common type of speech recognition system available today. The user must
pause between each word or command spoken. Our speech recognition circuit is set up to
identify isolated words of .96-second length.
Connected speech is a halfway point between isolated-word and continuous speech
recognition, and allows users to speak multiple words. The HM2007 can be set up to
identify words or phrases 1.92 seconds in length. This reduces the recognition
vocabulary to 20 words.
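The two modes trade vocabulary size against word length under what looks like a fixed storage budget (our inference from the two figures given):

```python
# 40 words x 0.96 s and 20 words x 1.92 s both come to about 38.4 s of
# stored speech patterns, so the HM2007 appears to trade word count for
# word length within one fixed pattern-storage budget (our inference).
budget_40 = 40 * 0.96   # seconds of pattern storage, 40-word mode
budget_20 = 20 * 1.92   # seconds of pattern storage, 20-word mode
same_budget = round(budget_40, 2) == round(budget_20, 2)  # -> True
```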
based maximum likelihood linear regression. Decoding of the speech (the term for what
happens when the system is presented with a new utterance and must compute the most
likely source sentence) would probably use the Viterbi algorithm to find the best path.
There is a choice between dynamically creating combination hidden Markov models,
which include both the acoustic and language model information, and combining them
statically beforehand (the AT&T approach, for which their FSM toolkit might be useful).
Those who value their sanity might consider the AT&T approach, but be warned that it is
memory hungry.
Another approach to acoustic modeling is the use of neural networks. They are capable of
solving much more complicated recognition tasks, but do not scale as well as HMMs
when it comes to large vocabularies. Rather than being used in general-purpose speech
recognition applications, they are better suited to handling low-quality, noisy data and
to speaker independence. Such systems can achieve greater accuracy than HMM-based
systems, as long as there is sufficient training data and the vocabulary is limited. A more
general approach using neural networks is phoneme recognition. This is an active field of
research, and generally the results are better than for HMMs. There are also NN-HMM
hybrid systems that use the neural network part for phoneme recognition and the hidden
Markov model part for language modeling.
Dynamic time warping is an algorithm for measuring similarity between two sequences
which may vary in time or speed. For instance, similarities in walking patterns would be
detected, even if in one video the person was walking slowly and if in another they were
walking more quickly, or even if there were accelerations and decelerations during the
course of one observation. DTW has been applied to video, audio, and graphics -- indeed,
any data which can be turned into a linear representation can be analyzed with DTW.
A well known application has been automatic speech recognition, to cope with different
speaking speeds. In general, it is a method that allows a computer to find an optimal
match between two given sequences (e.g. time series) with certain restrictions, i.e. the
sequences are "warped" non-linearly to match each other. This sequence alignment
method is often used in the context of hidden Markov models.
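The DTW recurrence described above can be written out in a few lines. This is a textbook sketch, not tied to any particular recognizer:

```python
# Minimal dynamic time warping: D[i][j] holds the cost of the best warped
# alignment of a[:i] with b[:j]; each cell extends one of its three
# neighbours, which is what lets the sequences stretch non-linearly.
def dtw_distance(a, b):
    """DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] repeats
                                 D[i][j - 1],      # b[j-1] repeats
                                 D[i - 1][j - 1])  # one-to-one match
    return D[n][m]

# The same contour spoken slowly and quickly still aligns perfectly:
slow = [0, 0, 1, 1, 2, 2, 3, 3]
fast = [0, 1, 2, 3]
dtw_distance(slow, fast)  # -> 0.0
```

This is exactly the "different speaking speeds" case from the text: the slow utterance warps onto the fast one at zero cost.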
OSCILLOGRAM (WAVEFORM)
Physically the speech signal (actually all sound) is a series of pressure changes in the
medium between the sound source and the listener. The most common representation of
the speech signal is the oscillogram, often called the waveform. In this the time axis is the
horizontal axis from left to right and the curve shows how the pressure increases and
decreases in the signal. The utterance we have used for demonstration is "phonetician".
The signal has also been segmented, such that each phoneme in the transcription has been
aligned with its corresponding sound event.
SPECTROGRAM
In the spectrogram the time axis is the horizontal axis, and frequency is the vertical axis.
The third dimension, amplitude, is represented by shades of darkness. Consider the
spectrogram to be a number of spectrums in a row, looked upon "from above", and where
the highs in the spectra are represented with dark spots in the spectrogram.
From the picture it is obvious how different the speech sounds are from a spectral point
of view.
Now, let's look at the spectrograms of the vowel /i:/ in "three" and "tea".
PROBLEMS DUE TO NOISE:
A machine has to face many problems when trying to imitate this ability of humans.
The audio range of frequencies varies from 20 Hz to 20 kHz. Some external noises have
frequencies within this audio range. These noises pose a problem since they cannot be
filtered out.
DIFFERENCES IN THE PROPERTIES OF MICROPHONES:
There may be problems due to differences in the electrical properties of different
microphones and transmission channels.
DIFFERENCES IN PITCH:
Pitch and other source features such as breathiness and amplitude can be varied
independently.
OTHER PROBLEMS:
We have to make sure that the robot does not go out of reach of our voice.
Chapter 5. SOLUTION TO PROBLEMS
After analyzing the problems we came up with the solutions listed below.
1. Amplitude Variation:
Amplitude variation of the electrical signal output of the microphone may occur mainly
due to:
a) Variation of the distance between the sound source and the transducer.
b) Variation of the strength of the sound generated by the source.
To recognize a spoken word, it does not matter whether it has been spoken loudly
or softly. This is because the characteristic features of a spoken word lie in its
frequency content, not in its loudness (amplitude). Thus, at a certain stage this
amplitude information is suitably normalized.
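The normalization step above can be sketched as peak normalization, one simple choice among several; the sample values are illustrative:

```python
# Peak normalization: scale a signal so its largest magnitude hits a fixed
# target, discarding the loudness difference the text says is irrelevant.
def normalize(samples, target_peak=1.0):
    """Scale samples so max(|sample|) == target_peak (silence passes through)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    return [s * target_peak / peak for s in samples]

# The same word spoken loudly and softly has the same shape once normalized:
loud = [0.0, 0.8, -0.4]
quiet = [0.0, 0.2, -0.1]
matched = normalize(loud) == normalize(quiet)  # True up to rounding
```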
2. Recognition of a word:
If the same word is spoken at two different instants, it sounds similar to us; the
question arises: what is the similarity between the two? It is important to note that it
does not matter whether one of the spoken words was of a different loudness than the
other. The difference lies in frequency. Hence, any large frequency variation would cause
the system not to recognize the word. In a speaker-independent system, some logic can be
implemented to take care of frequency variation. A small frequency variation, i.e.
feature variation within tolerable limits, is considered acceptable.
3. Noise:
Along with the speech itself, stray sounds are also picked up by the microphone,
degrading the information contained in the signal.
4. Microphone response:
Two different microphones may not have the same response. Hence, if the microphone is
changed, or the system is installed on a new PC, the recognition success rate may drop
due to the different response.
5. In order that our voice is recognized by the robot at a distance, we use a wireless
mic. In case the robot does not recognize any word, we make an arrangement such that
the robot automatically stops after some time.
6. We use a microphone pre-amplifier circuit; it is built into the HM2007.
7. We use decoding logic and motor driver circuits so that the chip and motors are made
compatible, thereby solving the compatibility problem.
8. One of the important problems to be solved was providing sufficient current and
voltage to the entire assembly when interfaced together. Since the current drawn from
the supply was so high that a 9V battery could not last long, we used a current buffer
IC. In our application we used the 74LS245.
Chapter 7. DESIGN APPROACH
The most challenging part of the entire system is designing and interfacing the various
stages together. Our approach was to digitize the analog voice signal and store the
frequency and pitch of the words in memory. These stored words are matched against
the words spoken. When a match is found, the system outputs the address of the stored
word. We then decode the address, and according to the address sensed, the car performs
the required task. Since we wanted the car to be wireless, we used an RF module. The
address is decoded using a microcontroller and then applied to the RF module. Together
with the driver circuit at the receiver's end, this makes a complete intelligent system.
It must be noted that we did not use a wireless mic; instead we used an analog RF
module which transmitted five different frequencies, one each for right, left, forward,
backward and crane movement.
SYSTEM DESIGN
a. Voice Recognition Module
b. Microcontroller and Decoder
c. RF module
d. Motor Driver Circuit
e. Buffer
Block Diagram:
We initially used an Indian-manufactured voice recognition chip, the AP7003. It is a
monolithic speaker-dependent speech recognition IC designed for toy applications. The
AP7003 consists of a microphone amplifier, A/D converter, speech processor and I/O
controller. After pre-recording, the AP7003 can recognize up to 12 different sentences,
each up to 1.5 seconds long, with highly programmable I/O. However, it was not very
accurate or reliable, so we started looking for an alternative and found the HM2007 to
be the right choice.
The chip provides the option of recognizing either forty .96-second words or
twenty 1.92-second words; the circuit allows the user to choose either the .96-second
word length (40-word vocabulary) or the 1.92-second word length (20-word vocabulary).
For memory the circuit uses an 8K x 8 static RAM.
The chip has two operational modes: manual mode and CPU mode. The CPU
mode is designed to allow the chip to work under a host computer. This is an attractive
approach to speech recognition for computers because the speech recognition chip
operates as a co-processor to the main CPU. The jobs of listening and recognition do not
occupy any of the computer's CPU time. When the HM2007 recognizes a command it
can signal an interrupt to the host CPU and then relay the command code. HM2007
chips can be cascaded to provide a larger word recognition library.
The circuit we are building operates in the manual mode. The manual mode allows one
to build a stand-alone speech recognition board that doesn't require a host computer and
may be integrated into other devices to utilize speech control.
The major components of this design are: a speech recognition chip, memory, a
keypad, and an LED 7-segment display. The chip is designed for speaker-dependent
(single-user) applications, but can be manipulated to perform speaker-independent
(multiple-user) applications. The keypad and LED 7-segment display are used to
program and test the voice recognition circuit.
More about the HM2007 chip
The HM2007 is a single-chip complementary metal-oxide semiconductor (CMOS) voicerecognition large-scale integration (LSI) circuit. The chip contains an analog front end,
voice analysis,recognition, and system control functions. The chip may be used in a
stand-alone or connected CPU.
Features
The system we are building is trained as speaker dependent (single user); thus
the user will be its real master.
Microphone: It takes the analog voice commands and sends them to the voice recognition
chip (HM2007) in the form of an electrical signal.
The human ear has an auditory range of roughly 20 Hz to 20 kHz. Sound can be picked up
easily using a microphone and amplifier. Microphones typically have an auditory range
that surpasses that of human hearing.
Microphones are transducers which detect sound signals and produce an electrical image
of the sound, i.e., they produce a voltage or a current which is proportional to the sound
signal. The most common microphones for musical use are dynamic, ribbon, or
condenser microphones. Besides the variety of basic mechanisms, microphones can be
designed with different directional patterns and different impedances.
21
_____________________________________________________________________
________________________________________________________________________
Dynamic Microphones
Disadvantages: The uniformity of response to different frequencies does not match that
of the ribbon or condenser microphones.

Ribbon Microphones
Advantages: Adds "warmth" to the tone by accenting lows when close-miked. Can be used
to discriminate against distant low-frequency noise in its most common gradient form.
Disadvantages: Accenting lows sometimes produces "boomy" bass. Very susceptible to
wind noise; not suitable for outside use unless very well shielded.
Condenser Microphones
Advantages: Best overall frequency response makes this the microphone of choice for
many recording applications.
Disadvantages: Expensive. May pop and crack when close-miked. Requires a battery or
external power supply to bias the plates.
Keypad: It is used for training/programming the chip. It also allocates specific memory
locations to voice commands. The keypad is made up of 12 switches.
Figure 2
When the circuit is turned on, the HM2007 checks the static RAM. If everything checks
out, the board displays "00" on the digital display and lights the red LED (READY). It
is then in the ready state, waiting for a command.
7-segment Display: It is used to test the voice recognition circuit.
The 7-segment display is used as a numerical indicator on many types of test equipment.
It is an assembly of light-emitting diodes which can be powered individually, and which
most commonly emit red light. Powering all the segments displays the number 8; powering
segments a, b, c, d and g displays the number 3. The numbers 0 to 9 can be displayed,
and the d.p. represents a decimal point.
The one shown is a common-anode display, since all the anodes are joined together and
go to the positive supply, while the cathodes are connected individually to zero volts.
Resistors must be placed in series with each diode to limit the current through each
diode to a safe value. Common-cathode displays, where all the cathodes are joined, are
also available.
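The series-resistor rule above is just Ohm's law across the resistor. A sketch with assumed values (5 V supply, 2 V LED forward drop, 10 mA per segment; none of these figures are from this report):

```python
# Choosing the series resistor for one display segment. The resistor drops
# the difference between the supply and the LED's forward voltage, so
# R = (Vsupply - Vf) / I. The 5 V / 2 V / 10 mA values are typical
# assumptions for illustration, not taken from this report.
def series_resistor(v_supply, v_forward, i_led):
    """Series resistance (ohms) that limits the segment current to i_led amps."""
    return (v_supply - v_forward) / i_led

r = series_resistor(5.0, 2.0, 0.010)  # roughly 300 ohms per segment
```

In practice one would round up to the next standard resistor value (e.g. 330 ohms) so the current stays below the chosen limit.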
Applications and Drivers
A numeral to be displayed on a seven-segment display is usually encoded in BCD form,
and a logic circuit drives the proper segments of the display ON or OFF. This logic
circuit is called a decoder. Various decoders are available to drive common-anode and
common-cathode displays. Easily available decoders are the 7447 and 7448 TTL decoders,
designed to drive common-anode (7447 type) and common-cathode (7448 type) displays
through external current-limiting resistors.
We used a 7448 decoder chip driving a common-cathode seven-segment display.
8K x 8 RAM: It stores the voice commands decoded by the chip at the assigned locations.
Output of Voice recognition module
The 8-bit output is taken from the output of the 74LS373 octal data latch. The output is
not a standard 8-bit byte; it is broken into two 4-bit binary-coded decimal (BCD)
nibbles. BCD code is related to standard binary numbers as the table below illustrates.
As you can see, the binary and BCD numbers remain the same until reaching decimal 10.
At decimal 10, BCD jumps to the upper nibble and the lower nibble resets to zero. The
binary numbers continue to decimal 15, and then jump to the upper nibble at 16, where
the lower nibble resets. If a computer is expecting to read an 8-bit binary number and
BCD is provided, this will cause errors. Furthermore, since the module outputs the
numbers 55, 66 and 77 as default error values, and we do not want these outputs to be
used, we use a microcontroller.
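The packed-BCD encoding described above can be sketched as follows (the helper function is our illustration, not part of the HM2007 interface):

```python
# Packed BCD: the tens digit occupies the upper nibble, the units digit the
# lower nibble, which is exactly where it diverges from plain binary at 10.
def to_packed_bcd(n):
    """Encode a two-digit decimal number (0-99) as packed BCD."""
    assert 0 <= n <= 99
    return ((n // 10) << 4) | (n % 10)

# Binary and BCD agree up to decimal 9, then diverge:
to_packed_bcd(9)   # -> 0x09, the same bit pattern as binary 9
to_packed_bcd(10)  # -> 0x10, i.e. decimal 16 -- not binary 10
```

Reading 0x10 as plain binary gives 16 instead of 10, which is exactly the kind of misread the text warns about.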
Microcontroller and driver circuit
Decoder: It is the second most important part of the project. The output from the chip
is given to the decoder (microcontroller), which acts as a DMC, i.e. a Digital Motor
Controller. The DMC senses the output ports of the HM2007 chip and produces the proper
output for each of the commands forward, backward, left, right, load, release and stop.
The proper functionality of the system depends on the proper decoding logic.
Microcontroller circuit:
The table below shows the output codes generated for the different commands after
programming the microcontroller.
Commands    P0.7  P0.6  P0.5  P0.4  P0.3  P0.2  P0.1  P0.0   Code
Stop          1     1     1     1     1     1     1     1     FF
Right         1     1     1     1     0     1     1     1     F7
Left          1     1     1     0     1     1     1     1     EF
Backward      1     1     0     1     1     1     1     1     DF
Forward       1     0     1     1     1     1     1     1     BF
Crane         0     1     1     1     1     1     1     1     7F
(For the wireless car, this is the input to the RF module and then to the motors through the driver circuit.)
Commands    P0.0  P0.1  P0.2  P0.3  P0.4  P0.5  P0.6  P0.7   Code
Stop          0     0     0     0     0     0     0     0     00
Right         0     0     0     0     0     1     0     0     04
Left          0     0     0     0     0     0     0     1     01
Backward      0     0     0     0     1     0     1     0     0A
Forward       0     0     0     0     0     1     0     1     05
Crane         0     1     1     1     1     1     1     0     FC
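In the first table, every command except Stop pulls exactly one port line low (active-low signalling), so decoding on the receiving side reduces to a lookup. A sketch using the codes from that table (the dictionary lookup is our illustration, not the actual firmware):

```python
# Mapping the microcontroller's active-low port codes (first table above)
# back to commands. The hex codes are from the report; this lookup-table
# implementation is our sketch, not the project's actual firmware.
CODE_TO_COMMAND = {
    0xFF: "stop",       # all lines high
    0xF7: "right",      # P0.3 low
    0xEF: "left",       # P0.4 low
    0xDF: "backward",   # P0.5 low
    0xBF: "forward",    # P0.6 low
    0x7F: "crane",      # P0.7 low
}

def decode(port_byte):
    """Return the command for a port reading, or 'unknown' for stray values."""
    return CODE_TO_COMMAND.get(port_byte, "unknown")

decode(0xBF)  # -> "forward"
```

Rejecting unlisted values also filters out the HM2007's 55/66/77 error outputs mentioned earlier.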
Keil µVision 2
Aec_isp_v3 C Programmer
RF module:
Let's take a closer look at the RC truck we saw in the first chapter. We will assume
that the exact frequency used is 27.9 MHz. Here's the sequence of events that takes
place when you use the RC transmitter:
Each sequence contains a short group of synchronization pulses, followed by the pulse
sequence. For our truck, the synchronization segment -- which alerts the receiver to
incoming information -- is four pulses that are 2.1 milliseconds (thousandths of a
second) long, with 700-microsecond (millionths of a second) intervals. The pulse
segment, which tells the receiver what the new information is, uses 700-microsecond
pulses with 700-microsecond intervals.
Forward: 16 pulses
Backward: 40 pulses
Forward/Left: 28 pulses
Forward/Right: 34 pulses
U-turn: 52 pulses
Crane movement: 46 pulses
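The pulse counts above map one-to-one onto actions, so the truck's decoding step can be sketched as a simple lookup (our illustration; the real IC does this in hardware):

```python
# Decoding the received pulse count into an action. The counts are taken
# from the list above; the lookup itself is our sketch of what the truck's
# IC does in hardware, not its actual logic.
PULSES_TO_ACTION = {
    16: "forward",
    40: "backward",
    28: "forward/left",
    34: "forward/right",
    52: "u-turn",
    46: "crane movement",
}

def action_for(pulse_count):
    """Return the action for a pulse count; unknown counts are ignored."""
    return PULSES_TO_ACTION.get(pulse_count, "ignore")

action_for(16)  # -> "forward"
```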
The transmitter sends bursts of radio waves that oscillate with a frequency of 27,900,000
cycles per second (27.9 MHz).
The truck is constantly monitoring the assigned frequency (27.9 MHz) for a signal. When
the receiver receives the radio bursts from the transmitter, it sends the signal to a filter
that blocks out any signals picked up by the antenna other than 27.9 MHz. The remaining
signal is converted back into an electrical pulse sequence.
The pulse sequence is sent to the IC in the truck, which decodes the sequence and starts
the appropriate motor. For our example, the pulse sequence is 16 pulses (forward), which
means that the IC sends positive current to the motor running the wheels. If the next pulse
sequence were 40 pulses (reverse), the IC would invert the current to the same motor to
make it spin in the opposite direction.
The motor's shaft actually has a gear on the end of it, instead of connecting directly to the
axle. This decreases the motor's speed but increases the torque, giving the truck adequate
power through the use of a small electric motor!
The truck moves forward.
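The receiver IC's decision described above -- count the pulses, then drive the wheel motor with the appropriate current polarity -- can be sketched the same way. The signed drive value below is our own simplification of the motor interface, not the actual IC's internals.

```c
/* Hypothetical sketch of the receiver IC's decoding step:
 * map a received pulse count to a drive value for the wheel
 * motor.  +1 = positive current (forward), -1 = reversed
 * current (backward), 0 = this motor is not driven. */
int wheel_drive_for_pulses(int pulse_count)
{
    if (pulse_count == 16)      /* forward */
        return 1;
    else if (pulse_count == 40) /* backward: same motor, current reversed */
        return -1;
    else
        return 0;               /* other sequences drive other motors */
}
```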
Buffer: We used the IC 74LS245 as a buffer; it solved the current supply problem. It is a
3-state octal bus transceiver, designed for asynchronous two-way communication
between data buses. The device passes data from the A bus to the B bus or vice versa,
depending on the logic level at the direction control (DIR) input. The enable input can be
used to disable the device so that the buses are effectively isolated.
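The 74LS245's behavior can be summarized in a few lines: when enabled, DIR selects which bus drives the other; when disabled, both buses are isolated. Below is a minimal model with names of our own choosing; the real part's output enable pin is active-low, which the sketch reflects.

```c
#include <stdint.h>

/* Minimal model of a 74LS245 octal bus transceiver.
 * oe_n: active-low output enable (1 = buses isolated)
 * dir:  1 = A bus drives B bus, 0 = B bus drives A bus
 * Returns the value seen on the receiving bus, or -1 when
 * the transceiver is disabled (outputs in high impedance). */
int ls245_transfer(int oe_n, int dir, uint8_t a_bus, uint8_t b_bus)
{
    if (oe_n)
        return -1;              /* 3-state: buses effectively isolated */
    return dir ? a_bus : b_bus; /* pass one bus through to the other */
}
```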
Batteries
Batteries are by far the most commonly used electric power source in robotics. Batteries
are so commonplace that it is easy to take them for granted, but an understanding of them
will help you choose batteries that optimize your robot's design.
Primary batteries
Primary batteries are one-time-use batteries. The primary batteries considered here
deliver 1.5 V per cell. They are designed to deliver their rated electrical capacity and then
be discarded. When building robotic systems, discarding depleted primary batteries can
become expensive. However, one advantage of primary batteries is that they
typically have a greater electrical capacity than rechargeables. If one is engaged in an
activity (e.g., robot combat) that requires the highest power density available for one-shot
use, primary batteries may be the way to go.
Secondary batteries
Secondary batteries are rechargeable. The most common rechargeable chemistries are
NiCd and lead-acid. Secondary batteries, while initially more expensive, are cheaper in
the long run: a secondary battery can typically be recharged 200 to 1000 times.
Chapter8. TRAINING AND RECOGNITION
To record or train a command, the chip stores the analog signal pattern and amplitude and
saves it in the 8K x 8 SRAM. In recognition mode, the chip compares the analog signal
from the microphone with the patterns stored in the SRAM; if it recognizes a command,
the command identifier is output to the microprocessor through the D0 to D7 ports of the
chip. A keypad and a 7-segment display are used for training, for testing (to verify correct
recognition), and for clearing the memory.
To Train:
To train the circuit, begin by pressing the number of the word you want to train on the
keypad. Use any number between 1 and 40; for example, press "1" to train word
number 1. When you press the number on the keypad, the red LED turns off and the
number is shown on the digital display. Next, press the "#" key to train. Pressing the "#"
key signals the chip to listen for a training word, and the red LED turns back on. Now
speak the word you want the circuit to recognize clearly into the microphone. The LED
should blink off momentarily; this signals that the word has been accepted. Continue
training new words using the same procedure: press "2" and then "#" to train the second
word, and so on. The circuit accepts up to forty words, but you do not have to fill all
forty word spaces to use it.
Recognition:
The circuit is continually listening. Speak a trained word into the microphone, and the
number of that word appears on the digital display. For instance, if the word
"directory" was trained as word number 25, saying "directory" into the microphone
causes the number 25 to be displayed.
Error Codes:
The chip provides the following error codes:
55 = word too long
66 = word too short
77 = word no match
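On the microprocessor side, the value reported by the chip is therefore either a recognized word number (1 to 40) or one of the error codes above. A small sketch of how that value might be classified; the function name and return convention are our own, not taken from the HM2007 datasheet.

```c
/* Hypothetical classification of the value reported by the
 * recognition chip (as shown on the two-digit display):
 * 1..40 is a recognized word number; 55, 66 and 77 are the
 * error codes listed above. */
enum result_kind {
    RES_WORD,       /* trained word recognized */
    RES_TOO_LONG,   /* 55 = word too long      */
    RES_TOO_SHORT,  /* 66 = word too short     */
    RES_NO_MATCH,   /* 77 = word no match      */
    RES_INVALID     /* anything else           */
};

enum result_kind classify_result(int code)
{
    if (code >= 1 && code <= 40)
        return RES_WORD;
    switch (code) {
    case 55: return RES_TOO_LONG;
    case 66: return RES_TOO_SHORT;
    case 77: return RES_NO_MATCH;
    default: return RES_INVALID;
    }
}
```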
Chapter9. APPLICATIONS
We believe such a system would find a wide variety of applications. Menu-driven
systems such as e-mail readers, and household appliances such as washing machines,
microwave ovens, pagers, and mobile phones, are likely to become voice controlled in
the future.
The robot is useful in places that are difficult for humans to reach but that the human
voice can reach, e.g. inside a small pipeline, in a fire situation, or in a highly toxic area.
Voice-driven data entry is another application.
Chapter10. COMPONENTS USED
Chapter11. DATASHEET
HM2007
Features
Chapter12. Project Progress Report Summary
Calendar year 2006:
June - Work started.
July - Gathered useful information on voice processing techniques and microphone
properties. (Chapters 3, 4)
August - We tried another chip, the AP7003-02, manufactured by the Indian company
A-plus India. (Page 20)
September - We built a voice recognition module using the AP7003-02.
October - Our attempts with the AP7003-02 did not succeed.
November - Tried to find a better alternative but finally decided to go with the HM2007
and to import it from the US. (Pages 19-20; Chapter 11)
December - Project work was on hold.
Chapter13. BIBLIOGRAPHY
Web:
www.imagesco.com/articles/hm2007/SpeechRecognitionTutorial01.html
www.migi.com - for selecting motors and other robotic concepts
www.migindia.com/modules.php?name=News&file=article&sid=22
www.datasheetcatalog.com
www.alldatasheet4u.com
http://arts.ucsc.edu/ems/music/tech_background/TE-20/teces_20.html#I - for
microphone types and properties
www.howstuffworks.com - for microphone concepts, RF radio operation, and other
related concepts
Book:
Others:
Keil2 software
Aec_isp_v3