...................................................
Thesis On
Voice Controlled Wheelchair
...................................................
Authors
1. Matebie Gashu................500/02
2. Mengistu Baye................1307/03
3. Mignot Ayana..................1373/03
4. Mubarek Kebede..............1449/03
Advisor: Mr. Eniyachew
Date: 19/06/2015
Declaration of Authorship
We, Mignot Ayana, Mubarek Kebede, Mengistu Baye and Matebie Gashu,
declare that this thesis titled 'Voice Controlled Wheelchair' and the work
presented in it are our own. We confirm that:
1. This work was done wholly or mainly while in candidature for a bachelor
degree at this University.
Authors: Matebie Gashu, Edimealem G., Mignot Ayana
Supervisor: Mr. Eniyachew
Acknowledgment
We would like to thank our project supervisor, Mr. Eniyachew, for providing
guidance with continuous advice and feedback throughout the duration of
our work.
Our deepest gratitude goes to Mr. Tadie for his great support. His valuable
advice and constructive comments have been of great value throughout the
development of this project. We would particularly like to thank him for
his help in patiently teaching us machine learning programming.
Our heartfelt thanks go to Mr. Girmaw Abebe, who gave us the opportunity
to work with him on this topic. His kindness and logical way of thinking
have been of great value to us. We appreciate his encouragement, guidance
and contribution, which provided a good basis for the success of our work.
We would also like to express our appreciation to all Bahir Dar University
Institute of Technology students for their support in providing the sample
voices used as training and testing data for our system.
Finally, we would like to thank all of our family members for their
understanding, encouragement and support towards the completion of our
project.
Contents
Declaration of Authorship . . . . . . . . . . . . . . . . . . . . . . . i
Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Acronym . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
1 Introduction 1
1.1 Background Information . . . . . . . . . . . . . . . . . . . . . 1
1.2 Statement of the Problem . . . . . . . . . . . . . . . . . . . . 3
1.3 Objectives of the Project . . . . . . . . . . . . . . . . . . . . . 3
1.3.1 General Objective . . . . . . . . . . . . . . . . . . . . . 3
1.3.2 Specific Objectives . . . . . . . . . . . . . . . . . . . . 3
1.4 Methodology Used in This Project . . . . . . . . . . . . . . . 4
1.5 Contributions of the Project . . . . . . . . . . . . . . . . . . . 5
1.6 Scope of the Project . . . . . . . . . . . . . . . . . . . . . . . 5
1.7 Limitation of Project . . . . . . . . . . . . . . . . . . . . . . . 5
1.8 Organizations of the Project . . . . . . . . . . . . . . . . . . . 6
2 Literature Review 7
2.1 Review of Controlling of Smart Wheelchair . . . . . . . . . . . 7
2.2 Review of Voice Recognition to Control a System . . . . . . . 9
3 System Design and Analysis 12
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Software part System Design and Analysis . . . . . . . . . . . 14
3.2.1 Joint Speech and Speaker Recognition Using Neural
Networks . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.2 Arduino IDE code . . . . . . . . . . . . . . . . . . . . 27
3.3 Hardware part System Design and Analysis . . . . . . . . . . 29
3.3.1 Control Unit . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3.2 Ultrasonic Sensor . . . . . . . . . . . . . . . . . . . . . 30
3.3.3 DC Motor . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.4 H-Bridge Driver Circuit . . . . . . . . . . . . . . . . . 38
3.3.5 Power Supply . . . . . . . . . . . . . . . . . . . . . . . 44
4 Results and Discussions 45
4.1 Software simulation results and discussions . . . . . . . . . . . 45
4.2 Hardware Simulation Result and Discussion . . . . . . . . . . 49
5 Conclusion and Recommendations for Future work 51
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2 Recommendations for Future work . . . . . . . . . . . . . . . 52
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
List of Figures
1.1 Methodology for the work . . . . . . . . . . . . . . . . . . . . 4
3.1 Block diagram for General System . . . . . . . . . . . . . . . . 13
3.2 Block diagram of voice recognition system . . . . . . . . . . . 15
3.3 Block diagram for MFCC . . . . . . . . . . . . . . . . . . . . 18
3.4 Basic representation of neuron . . . . . . . . . . . . . . . . . . 22
3.5 Simple perceptron models . . . . . . . . . . . . . . . . . . . . 22
3.6 logistic activation function . . . . . . . . . . . . . . . . . . . . 23
3.7 Forward propagation . . . . . . . . . . . . . . . . . . . . . . . 24
3.8 Flow chart of our arduino code implementation . . . . . . . . 28
3.9 Ultrasonic sensor . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.10 Timing diagram for Ultrasonic sensor . . . . . . . . . . . . . . 33
3.11 DC series motor schematic . . . . . . . . . . . . . . . . . . . . 34
3.12 Road level block diagram . . . . . . . . . . . . . . . . . . . . . 36
3.13 H-Bridge Topology . . . . . . . . . . . . . . . . . . . . . . . . 39
3.14 H-Bridge Topology - Forward direction . . . . . . . . . . . . . 39
3.15 H-Bridge Topology - Reverse direction . . . . . . . . . . . . . 40
3.16 Connection of H-bridge and motor . . . . . . . . . . . . . . . . 41
3.17 Pulse Width Modulation Used For Motor Control . . . . . . . 43
4.1 Original voice data . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2 After silence and noise removal . . . . . . . . . . . . . . . . . 46
4.3 After preemphasis . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4 Power spectrum of input voice data . . . . . . . . . . . . . . . . 47
4.5 Mel filterbank of input voice data . . . . . . . . . . . . . . . . 48
4.6 Mel frequency coefficients for sampled voice data . . . . . . . . 48
4.7 General system simulation using Proteus . . . . . . . . . . . . 50
List of Tables
3.1 work plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Acronyms
ANN ................. Artificial Neural Network
ASR ................. Automatic Speech Recognition
DCT ................. Discrete Cosine Transform
DFT ................. Discrete Fourier Transform
DSP ................. Digital Signal Processing
EOG ................. Electro-oculography
FFT ................. Fast Fourier Transform
LBG ................. Linde-Buzo-Gray
MFCC ................ Mel Frequency Cepstral Coefficient
Mic ................. Microphone
ML .................. Maximum Likelihood
MOSFET .............. Metal Oxide Semiconductor Field Effect Transistor
MSE ................. Mean Squared Error
NN .................. Neural Network
RAM ................. Random Access Memory
SVM ................. Support Vector Machine
VQ .................. Vector Quantization
Abstract
This project concerns a voice controlled wheelchair system based on a speech
recognition module. The system is designed to control a wheelchair using the
user's voice. The objective of this project is to facilitate the movement of
people who are disabled or handicapped, and of elderly people who are not
able to move well. The result of this design will allow such people to live a
life with less dependence on others. Speech recognition technology is a key
that may provide a new way for humans to interact with machines or tools.
Thus the problems these users face can be addressed by using speech
recognition technology to move the wheelchair, with a microphone acting as
the intermediary.
The project consists of the following parts: hardware, software and
interfacing. The hardware part of this project consists of an Arduino Uno,
an H-bridge motor driver, an ultrasonic sensor and direct current motors.
The software part consists of the speech and speaker recognition system and
the Arduino (C) code for control; the speech and speaker recognition system
is implemented using MATLAB. The software and hardware are interfaced by
creating serial communication between our laptop and the Arduino Uno using
a USB cable. The results and analysis of this project are described in this
report.
The results of this project show that it can be used for future research
work and to design an excellent product that meets market needs and the
public interest.
Chapter 1
Introduction
Today, more and more people are suffering from the loss of their hands and
legs. A wheelchair is a very important necessity for the handicapped. Many
organizations intend to design and create a more convenient wheelchair for
the handicapped. There is research that uses brainwaves, eye movement, eye
blinking and many other methods to move a wheelchair. Most of the
wheelchairs available in the market are self-controlled wheelchairs or
joystick controlled wheelchairs. These types of wheelchairs are only suitable
for people who can use their hands to control them. People who have lost the
use of their hands and legs still have their voice, and they can use their
voice to give commands to move the wheelchair. That is our motivation for
designing a voice controlled wheelchair.
In order to benefit end users who have lost control of their upper
extremities due to injury, illness or disability, a speech recognition system
is designed to be one of the steering control components of the embedded
system [1]. The user can communicate with this type of system by giving voice
commands. Each voice command is predefined for one of the operations
supported by the wheelchair.
Speech is probably the most efficient way for people to communicate with
each other. This also means that speech could be a useful interface for
interacting with machines. Voice recognition is the ability of a computer,
software program or hardware device to decode the human voice into digitized
speech so that it can be understood by the computer.
Speech recognition can be of several kinds, such as speaker independent or
speaker dependent, and continuous-word or isolated-word recognition. In our
case we use a speaker dependent, isolated-word speech recognition system
(using joint speaker and speech recognition). It has been noticed that the
success of isolated-word automatic speech and speaker recognition systems
requires a combination of various techniques, algorithms and steps, namely
speech (data) collection, preprocessing, feature extraction and, last but
requiring the most work, neural network based multi-class classification.
Neural networks are composed of simple computational elements operating in
parallel. The network function is determined largely by the connections
between elements.
1.2 Statement of the Problem
1.4 Methodology Used in This Project
1.5 Contributions of the Project
1. It makes life easier: disabled people can use a wheelchair operated by
voice commands and enjoy more independent lives.
2. Energy savings: a voice controlled wheelchair system would potentially
allow the user to move without external assistance, unlike a purely
human-powered wheelchair.
3. In addition, the speech recognition system can also be used for other
voice operated systems by changing the training set.
1.6 Scope of the Project
The scope of the project is designing the voice controlled wheelchair using a
speech and speaker recognition system.
We develop code for isolated-word speech and speaker recognition which
recognizes the words "jemir", "kum", "wode kegni", "wode gira" and
"wode hoala", together with the speaker of these isolated words.
We design the control part of the wheelchair chassis prototype using an
Arduino Uno, an H-bridge motor driver and DC motors, and finally control the
movement of the wheelchair.
1.7 Limitation of Project
3. We could not realize the hardware prototype properly.
4. Our system requires more memory because we use two systems in one, namely
speaker recognition and speech recognition.
5. We did not use the support vector machine algorithm, which is a more
efficient speech recognition algorithm.
Chapter 2
Literature Review
In this chapter we review literature related to speech recognition systems
and smart wheelchairs, which are the basic parts of our project.
2.1 Review of Controlling of Smart Wheelchair
2. Finger Movement Tracking Wheelchair: In this system, the wheelchair
can be controlled by a combination of three fingers. One finger controls
the speed of the wheelchair while the other two fingers control the
directions of the wheelchair. There is flexibility in that we can use only
two fingers instead of three to control the direction while letting the
system control the speed automatically. For the purpose of tracking the
movement of the fingers, flex sensors are used. The finger movement
tracking system revolves around the flex sensor. A flex sensor is basically
a resistive strip whose resistance is directly proportional to bending. The
flex sensor is worn on the finger, and the bending of the finger in turn
bends the resistive strip, thereby increasing the resistance of the strip.
This change of resistance is used to generate command signals. Like the
system above, this is intended for partially disabled persons whose hands
are unaffected [4].
3. Eye Tracking Based Wheelchair: When the condition is more severe, all
the other techniques become useless. In some critical cases, in which the
person is unable to move parts of the body and even unable to speak, an
eye controlled wheelchair can be very effective. In this case, the
different commands for the wheelchair are derived from the
electro-oculography (EOG) potential signals of eye movements. A system
for electric wheelchair control using the eyes was proposed in 2007. A
commercially available web camera on a head-mounted display (HMD)
which the user wears is used to capture moving pictures of the user's
face. A computer mounted on the electric chair processes the captured
image data, detecting and tracking movements of the user's eyes,
estimating the line-of-sight vector, and actuating the electric wheelchair
in the desired direction indicated by the user's eyes. This system is good
but not efficient [5].
2.2 Review of Voice Recognition to Control a System
Voice signal identification consists of converting a speech waveform into
features that are useful for further processing. Many algorithms and
techniques are in use; the choice depends on the capability of the features
to capture time, frequency and energy information in a set of coefficients
for cepstral analysis. Below we review the algorithms used for voice
recognition and the related systems.
1. Controlling of a device through voice recognition using MATLAB: In this
paper a technique is described in which, firstly, a speech command is
characterized by the power of the speech signal, which is captured with
the help of microphones connected to the computer itself. Using MATLAB
programming, the speech signal is sampled at a rate of 8000 samples/sec
according to the Nyquist criterion, i.e.

F = 2 f_m

where F is the sampling frequency and f_m is the maximum frequency
component present in the speech signal.
The sampled signal is then filtered using a band-pass filter in the range
of 300 Hz - 4000 Hz, which removes the components of the speech signal
lying below 300 Hz. Moreover, it includes an algorithm for the creation of
speech templates, which is achieved by calculating the power of each
sampled signal respectively [6].
HM2007 and a CMOS static RAM of 64K. The static Random Access
Memory (RAM) is used to store the voice commands, while the HM2007 is
used to capture up to 20 words, each up to 1.92 seconds in length. The
voice recognition circuit operates automatically when it detects the
presence of a human. However, the circuit could not capture the commands
clearly. From the project, it can be concluded that the work was carried
out successfully, although some points can still be improved, such as
designing a system which can capture the voice more clearly [8].
6. Speaker Recognition System using MFCC and Vector Quantization (VQ)
Approach: This paper presents an approach to speaker recognition using
frequency spectral information with Mel frequencies for the improvement
of speech feature representation in a Vector Quantization codebook based
recognition approach. The Mel frequency approach extracts the features of
the speech signal to get the training and testing vectors. The VQ codebook
approach uses training vectors to form clusters and recognize accurately
with the help of the Linde-Buzo-Gray (LBG) algorithm. The LBG algorithm
is used for clustering a set of L training vectors into a set of M
codebook vectors [11].
7. Automatic Vowel Classification in Speech: this work uses an Artificial
Neural Network approach with cepstral feature analysis. The neural
networks designed by the authors are trained and tested on data
preprocessed using Mel frequency cepstral coefficient (MFCC) feature
analysis. Thus the role of the neural network here is to provide a mapping
from the 13-dimensional MFCC space to the space of vowel phonemes.
Variations in the number and characteristics of vowels used for these
purposes are explored. The authors examine several artificial neural
network (ANN) types and architectures, in addition to the various training
algorithms used for each type. After preliminary research, they train and
test two types of neural networks [12].
8. Neural Networks used for Speech Recognition: This paper presents an
investigation of speech recognition classification performance. The
investigation is performed using two standard neural network structures
as classifiers. The neural network types used are a feed-forward Neural
Network (NN) with the back-propagation algorithm and a Radial Basis
Function neural network [13].
Chapter 3
System Design and Analysis
3.1 Introduction
This project describes a wheelchair which can be controlled using only the
user's voice. It aims to facilitate the movement of handicapped people and
elderly people who cannot move properly, enabling them to lead better lives
without any problem. The project consists of both a software part and a
hardware part. The software part of our project is the speech and speaker
recognition system, and the hardware part is the intelligent wheelchair.
The wheelchair supports the following five voice-controlled operations:
1. Moving forward
2. Moving backward
3. Turning to right
4. Turning to left
5. Stop
The general system block diagram is shown in Figure 3.1. The user of the
wheelchair gives his/her voice as input in order to drive the wheelchair to
the desired position.
A microphone (Mic) converts the voice signal to an electrical signal, and
the signal is given to the voice recognition module.
After receiving the last prediction from our laptop, the Arduino takes the
decision to move forward, backward, left or right with the help of the
H-bridge (motor driver) unit.
In the next sections we discuss the system design and analysis of the
software and hardware parts briefly.
3.2 Software part System Design and Analysis
3.2.1 Joint Speech and Speaker Recognition Using Neural Networks
Input Voice
We recorded the voice data from Bahir Dar University students for each
command class ("Wede Holla", "Jemir", "Kum", "Wede Gira" and "Wede Kegni")
separately, using the Audacity voice recorder software in WAV format, for
speech recognition. For speaker recognition we also recorded different
students as one class and one of our group members as another class. These
recordings are used for training and testing the speaker and speech
recognition systems separately.

Figure 3.2: Block diagram of voice recognition system

The other steps discussed in this section are used for both the speaker and
the speech recognition systems.
Signal Pre-processing
Pre-processing of speech signal plays a very important part, especially in
the applications where silence or background noise is completely undesir-
able. This step is necessary to make sure that the signal is less susceptible
to noise.
In automatic speech recognition(ASR) silence and background noise is unde-
sirable and it decrease system accuracy then here we do to remove the silence
and background noise for each voice data.
15
Pre-processing generally involves silence removal with endpoint detection
and pre-emphasis filtering.
Silence removal: this removes the silence intervals from the input speech
based on an envelope threshold. The input signal is up-sampled, segmented to
remove samples that fall below a threshold, re-sampled back to the original
sampling rate, and filtered to smooth out the discontinuities where pauses
in active speech occurred. The threshold used here is one-fourth of the
median of the envelope.
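A minimal MATLAB sketch of this envelope-threshold idea is given below; the
variable names and the length of the smoothing filter are our assumptions
for illustration, not the exact code used in this work.

% Envelope-based silence removal sketch (variable names are assumptions).
% x : input speech signal (column vector)
env    = filter(ones(1,200)/200, 1, abs(x));  % smoothed amplitude envelope
thr    = median(env) / 4;                     % one fourth of the envelope median
keep   = env > thr;                           % samples treated as active speech
xClean = x(keep);                             % silence intervals removed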
High frequencies imply high zero-crossing rates and low frequencies imply
low zero-crossing rates, so there is a strong correlation between the
zero-crossing rate and the energy distribution with frequency. A reasonable
generalization is that if the zero-crossing rate is high, the speech signal
is unvoiced, while if the zero-crossing rate is low, the speech signal is
voiced.
The amplitude of the speech signal varies with time. Generally, the
amplitude of unvoiced speech segments is much lower than the amplitude of
voiced segments. The energy of the speech signal provides a representation
that reflects these amplitude variations. The short-time energy can be
defined as:

E_n = \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2

The choice of the window determines the nature of the short-time energy
representation. In our model we used the Hamming window. The Hamming window
gives much greater attenuation outside the passband than a comparable
rectangular window.
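As an illustration, a short MATLAB sketch of the frame-wise zero-crossing
rate and short-time energy with a Hamming window is given below; the frame
length and frame shift used here are assumptions for the example.

% Frame-wise zero-crossing rate and short-time energy (sketch; sizes assumed).
N = 256; M = 100;                             % frame length and frame shift
w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));     % Hamming window w(n)
numFrames = floor((length(x) - N)/M) + 1;
zcr = zeros(numFrames, 1);
E   = zeros(numFrames, 1);
for k = 1:numFrames
    seg = x((k-1)*M + (1:N)); seg = seg(:);   % k-th frame of the signal
    zcr(k) = sum(abs(diff(sign(seg)))) / (2*N);  % zero-crossing rate
    E(k)   = sum((seg .* w).^2);              % short-time energy E_n
end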
Figure 3.3: Block diagram for MFCC
Step 1: Frame blocking
In this step the continuous speech signal is blocked into frames of N
samples, with adjacent frames separated by M samples (M < N). The first
frame consists of the first N samples. The second frame begins M samples
after the first frame, and overlaps it by N - M samples, and so on. This
process continues until all the speech is accounted for within one or more
frames. Typical values for N and M are N = 256 (which is equivalent to about
30 ms of windowing and facilitates the fast Fourier transform (FFT)) and
M = 100.
Step 2: Windowing
The next processing step is the Fast Fourier Transform, which converts each
frame of N samples from the time domain into the frequency domain. The FFT
is a fast algorithm to implement the Discrete Fourier Transform (DFT), which
is defined on the set of N samples as follows:

S_i(k) = \sum_{n=1}^{N} s_i(n)\, h(n)\, e^{-j 2\pi k n / N}, \qquad 1 \le k \le K

where h(n) is an N-sample long analysis window (a Hamming window), K is the
length of the FFT, and s_i(n) is the time domain signal of frame i.
The periodogram-based power spectral estimate for the speech frame s_i(n) is
given by:

P_i(k) = \frac{1}{N} |S_i(k)|^2

The result after this step is often referred to as the spectrum or
periodogram; from this we can calculate the periodogram-based power spectral
density.
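The following MATLAB sketch illustrates frame blocking, Hamming windowing,
the FFT and the periodogram estimate in one loop; the variable x and the
matrix P are our assumed names for the preprocessed speech and the
per-frame power spectra.

% Sketch of frame blocking, Hamming windowing, FFT and periodogram estimate.
N = 256; M = 100;                              % frame length and frame shift
w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));      % Hamming window h(n)
numFrames = floor((length(x) - N)/M) + 1;
P = zeros(N/2, numFrames);                     % keep the first N/2 = 128 bins
for i = 1:numFrames
    s = x((i-1)*M + (1:N)); s = s(:);          % i-th frame s_i(n)
    S = fft(s .* w, N);                        % S_i(k): windowed frame spectrum
    Pfull = abs(S).^2 / N;                     % periodogram P_i(k) = |S_i(k)|^2 / N
    P(:, i) = Pfull(1:N/2);                    % first 128 coefficients only
end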
where M is the number of filters we want, and f() is the list of M + 2
Mel-spaced frequencies.
Step 5: Discrete cosine transform
Take the logarithm of each of the filterbank energies from step 4; this
leaves us with the log filterbank energies. Finally, we take the discrete
cosine transform (DCT) of the log filterbank energies; the resulting
features are called Mel frequency cepstral coefficients.
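A minimal MATLAB sketch of these last two steps is shown below; melBank is
an assumed, precomputed matrix of triangular mel filters and P is the
periodogram matrix from the previous step.

% Sketch of the mel filterbank, log and DCT steps that produce the MFCCs.
% P       : (N/2) x numFrames periodogram matrix from the previous step
% melBank : numFilt x (N/2) matrix of triangular mel filters (assumed given)
numFilt  = 20;                        % number of triangular filters / cepstra
fbEnergy = melBank * P;               % mel filterbank energies per frame
logFb    = log(fbEnergy);             % log filterbank energies
mfcc     = dct(logFb);                % DCT of the log energies, column-wise
mfcc     = mfcc(1:numFilt, :);        % 20 mel frequency cepstral coefficients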
Speech and Speaker Classification Using Neural Networks
In machine learning, artificial neural networks (ANNs) are a family of
statistical learning algorithms inspired by biological neural networks (the
central nervous systems of animals, in particular the brain) and are used to
estimate or approximate functions that can depend on a large number of
inputs and are generally unknown. Artificial neural networks are generally
presented as systems of interconnected "neurons" which can compute values
from inputs, and are capable of machine learning as well as pattern
recognition.
The aim in most cases is not to model the human brain as exactly as
possible, but to obtain highly effective machine learning systems.
The basic element of an artificial neural network is the neuron; it is the
fundamental information-processing unit in a neural network. It consists of
three main parts (Haykin, 1999): a set of synapses or connecting links, an
adder and an activation function. The input signals X_i are connected to the
neuron by the synapses, each of which has a particular weight W_i of its
own. The weighted input signals are then summed by the adder. Often an
external bias b is also applied to the adder, increasing or decreasing the
output signal of the adder. Its amplitude is then limited by the activation
function to a permissible finite amplitude range. An illustration of a
neuron can be seen in Figure 3.4.
Model Representation
The simple neuron model is derived from studies of the neurons of the human
brain. A neuron in the brain receives its chemical input from other neurons
through its dendrites. If the input exceeds a certain threshold, the neuron
fires its own impulse on to the neurons it is connected to by its axon. The
figure below is very simplified, as each of the neurons of the brain is
connected to about 10,000 other neurons.
Figure 3.4: Basic representation of neuron

The simple perceptron models this behavior in the following way. First the
perceptron receives several input values (x_0 ... x_m). The connection for
each of the inputs has a weight (w_0 ... w_m) in some range. The threshold
unit then sums the weighted inputs, and if the sum exceeds the threshold
value, a signal is sent to the output. Otherwise no signal is sent.
22
gures below. They are commonly used in networks trained with backprop-
agation. The networks referred to in this project are generally backpropaga-
tion models and they mainly use log-sig activation functions. Like shown in
the gure 3.6
1
g(z ) =
1 + exp( z )
the weight, the more influence one unit has on another. (This corresponds to
the way actual brain cells trigger one another across tiny gaps called
synapses.)
Our neural network is shown in Figure 3.7. It has 3 layers: an input layer,
a hidden layer and an output layer. Recall that our inputs are the features
of the speech. Since the features consist of 20 MFCC coefficients and 20
delta coefficients, this gives us 40 input layer units (not counting the
extra bias unit which always outputs +1). The training data are loaded into
the variables X and y from the saved .mat data file. The same network is
used for both speech and speaker recognition; the only difference between
them is the output layer. The output layer for speech recognition has five
(5) units, corresponding to the five possible commands used to control the
movement of the wheelchair, while for speaker recognition it has only two
output units, because it only has to distinguish the user's speech from
other persons' speech.
Forward propagation
a^{(1)} = X
z^{(2)} = \Theta^{(1)} a^{(1)}
a^{(2)} = g(z^{(2)})
z^{(3)} = \Theta^{(2)} a^{(2)}
a^{(3)} = g(z^{(3)})
h_\Theta(x) = a^{(3)}
We must add the bias terms a^{(1)}_0 and a^{(2)}_0 to a^{(1)} and a^{(2)}
before we compute z^{(2)} and z^{(3)}; here a^{(3)} = h_\Theta(x), which is
the predicted output. The cost of the system then becomes:

J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
\left[ -y_k^{(i)} \log\big(h_\Theta(x^{(i)})\big)_k
- (1 - y_k^{(i)}) \log\big(1 - h_\Theta(x^{(i)})\big)_k \right]
+ \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}
\big(\Theta_{ji}^{(l)}\big)^2
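The forward pass described above can be written compactly in MATLAB; the
sketch below follows the 40-25-K architecture from the text, with variable
names (X, Theta1, Theta2, pred) as our assumptions.

% Forward propagation sketch for the 40-25-K network described in the text
% (K = 5 output units for the speech commands, K = 2 for speaker recognition).
% X is m x 40; Theta1 is 25 x 41 and Theta2 is K x 26 (sizes follow the text).
g  = @(z) 1 ./ (1 + exp(-z));          % logistic (sigmoid) activation function
m  = size(X, 1);
a1 = [ones(m, 1) X];                   % add the bias unit a0 = +1
z2 = a1 * Theta1';
a2 = [ones(m, 1) g(z2)];               % hidden-layer activations plus bias
z3 = a2 * Theta2';
a3 = g(z3);                            % h(x): one row of class scores per example
[~, pred] = max(a3, [], 2);            % predicted class for each example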
Back propagation
In this part of the work we implement the back propagation algorithm to
compute the gradient \partial J(\Theta) / \partial \Theta_{ij}^{(l)} of the
neural network cost function. We need to complete the cost function so that
it also returns an appropriate value for the gradient. Once we have computed
the gradient, we are able to train the neural network by minimizing the cost
function using an advanced optimizer such as fmincg.
The intuition behind the back propagation algorithm is as follows. Given a
training example (x^{(t)}, y^{(t)}), we first run a "forward pass" to
compute all the activations throughout the network, including the output
value of the hypothesis h_\Theta(x). Then, for each node j in layer l, we
compute an error term \delta_j^{(l)} that measures how much that node was
"responsible" for any errors in our output.
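The sketch below shows a vectorized version of this gradient computation for
the three-layer network; Y, lambda and the intermediate quantities are our
assumed names, with a1, z2, a2, a3 taken from the forward pass above.

% Backpropagation sketch over m training examples (variable names assumed).
% Y is the m x K matrix of one-hot targets; a1, z2, a2, a3 and Theta1, Theta2
% come from the forward propagation above; lambda is the regularization term.
sg = @(z) (1 ./ (1 + exp(-z))) .* (1 - 1 ./ (1 + exp(-z)));  % sigmoid gradient
delta3 = a3 - Y;                                   % output-layer error terms
delta2 = (delta3 * Theta2(:, 2:end)) .* sg(z2);    % hidden-layer error terms
Theta1_grad = (delta2' * a1) / m;                  % gradient of J w.r.t. Theta1
Theta2_grad = (delta3' * a2) / m;                  % gradient of J w.r.t. Theta2
% regularize every weight except the bias column
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + (lambda/m) * Theta1(:, 2:end);
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + (lambda/m) * Theta2(:, 2:end);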
In this stage the architecture of the neural network is specified. In our
case we have a three-layer neural network; the number of layers was
determined based on the complexity of the system and on the efficiency
obtained by trial and error.
Input units: number of dimensions of x (the dimension of the feature vector)
Output units: number of classes in the classification problem
Hidden units:
A default might be
1. One hidden layer
2. It should probably have
(a) the same number of units in each layer (in our case 25, for both speech
and speaker recognition),
(b) or 1.5-2 x the number of input features,
(c) but more units are more computationally expensive.
Training a neural network
2. Implement forward propagation
a^{(1)} = X
z^{(2)} = \Theta^{(1)} a^{(1)}
a^{(2)} = g(z^{(2)})
z^{(3)} = \Theta^{(2)} a^{(2)}
a^{(3)} = g(z^{(3)})
h_\Theta(x) = a^{(3)}
3. Implement code to compute the cost function
h_\Theta(x) \in R^K, \qquad \big(h_\Theta(x)\big)_i = i\text{-th output}

J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
\left[ -y_k^{(i)} \log\big(h_\Theta(x^{(i)})\big)_k
- (1 - y_k^{(i)}) \log\big(1 - h_\Theta(x^{(i)})\big)_k \right]
+ \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}}
\big(\Theta_{ji}^{(l)}\big)^2
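The training step itself can then be written as a call to an advanced
optimizer. The sketch below uses the fmincg routine and the nnCostFunction
signature that appear elsewhere in this document (Appendix A); the
initialization variables and the iteration count are our assumptions.

% Training sketch: minimize the cost with an advanced optimizer such as the
% fmincg routine mentioned above; parameter names here are our assumptions.
initial_nn_params = [initTheta1(:); initTheta2(:)];        % unrolled weights
costFunction = @(p) nnCostFunction(p, inputLayerSize, hiddenLayerSize, ...
                                   numberOfLabels, X, y, lambda);
options = optimset('MaxIter', 100);                        % iteration budget
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
% reshape the optimized vector back into the two weight matrices
Theta1 = reshape(nn_params(1:hiddenLayerSize*(inputLayerSize+1)), ...
                 hiddenLayerSize, inputLayerSize+1);
Theta2 = reshape(nn_params(hiddenLayerSize*(inputLayerSize+1)+1:end), ...
                 numberOfLabels, hiddenLayerSize+1);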
3.2.2 Arduino IDE code
The MATLAB code's role is the speech processing, that is, voice and speaker
recognition. Once MATLAB has done its part, it sends serial data to the
Arduino using the Arduino-MATLAB interface. The Arduino, which is the heart
of the control system, accepts signal values from the sensors and from the
serial port (that is, the command recognized by MATLAB) and takes decisions
depending on the sensor value and the serial port value. The flow chart of
our program is shown below.
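The hand-off from MATLAB to the Arduino can be sketched as follows; the port
name 'COM3' is an assumption for illustration, while the 9600 baud rate
matches the Serial.begin(9600) call and the predd variable used in the code
of the appendices.

% Sending the recognized command from MATLAB to the Arduino over serial.
arduino = serial('COM3', 'BaudRate', 9600);   % port name is an assumption
fopen(arduino);
fprintf(arduino, '%d', predd);                % predd: predicted command index
fclose(arduino);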
3.3 Hardware part System Design and Analysis
3.3.1 Control Unit
The Arduino platform has become quite popular with people just starting out
with electronics, and for good reason. Unlike most previous programmable
circuit boards, the Arduino does not need a separate piece of hardware
(called a programmer) in order to load new code onto the board; you can
simply use a USB cable. Additionally, the Arduino IDE uses a simplified
version of C++, making it easier to learn to program. Finally, Arduino
provides a standard form factor that breaks out the functions of the
micro-controller into a more accessible package.
The Arduino Uno performs serial communication with MATLAB. The Arduino
accepts the last prediction of the neural network; based on this prediction
we set the conditions in simple Arduino code, upload this code from the
Arduino IDE software to the Arduino Uno board using a USB cable, and after
that we can control the movement of the motors.
3.3.2 Ultrasonic Sensor
Ultrasonic waves are sounds which cannot be heard by humans and normally
have frequencies above 20 kHz. The basic characteristics of ultrasonic waves
are explained below.
Figure 3.9: Ultrasonic sensor
1. 5V Supply
2. Trigger Pulse Input
3. Echo Pulse Output
4. 0V Ground
Timing diagram
The timing diagram is shown in Figure 3.10. You only need to supply a short
10 µs pulse to the trigger input to start the ranging; the module will then
send out an 8-cycle burst of ultrasound at 40 kHz and raise its echo line.
The width of the echo pulse is proportional to the distance to the object,
so you can calculate the range from the time interval between sending the
trigger signal and receiving the echo signal. The formula is as follows:
range (cm) = pulse width (µs) / 58, or range (inch) = pulse width (µs) / 148,
or range = high-level time x velocity of sound (340 m/s) / 2.
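As a quick check of these formulas, suppose the measured echo high-level
time is 580 µs (an assumed value for illustration): 580/58 = 10 cm, and
equivalently 0.000580 s x 340 m/s / 2 ≈ 0.099 m ≈ 10 cm, so the divisor 58
is simply a rounded form of 2 / (0.034 cm/µs).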
Figure 3.10: Timing diagram for Ultrasonic sensor
3.3.3 DC Motor
Movement of the wheelchair is produced by motors; the motors are responsible
for the movement of the wheelchair. A motor is selected depending upon the
requirements, i.e. maximum load, starting torque, voltage ratings, current
ratings, etc.
In this project two series DC motors were used. Series motors are commonly
used as traction motors in many applications, as they offer high starting
torque, are robust, have a simple design and are relatively low cost. Most
of their applications are of an industrial nature, such as conveyors, but
they are also common in road-going electric vehicles. DC series motors are
an ideal choice for battery-operated equipment over AC motors, as they don't
require the use of expensive, complicated inverter circuitry to convert the
DC voltage to the AC voltage required by an AC motor.
Advantages of DC motors
1. Huge starting torque
2. Simple Construction
3. Designing is easy
4. Maintenance is easy
5. Cost effective
Operation of Series DC Motor
Operation of the series motor is easy to understand. In Figure 3.11 you can
see that the field winding is connected in series with the armature winding.
This means that power will be applied to one end of the series field winding
and to one end of the armature winding (connected at the brush).
would be changing the polarity of both the field and armature windings, so
the motor's rotation would remain the same.

E_a = V_t - I_a (R_a + R_f)

where E_a is the armature voltage (volt), V_t is the total voltage of the DC
motor, I_a is the armature current (ampere), R_a is the armature resistance
(ohm) and R_f is the field resistance (ohm).

P_d = I_a E_a

where P_d is the electrical power developed inside the motor.

\tau_e = P_d / \omega_m

where \tau_e is the electric torque developed inside the motor (N.m) and
\omega_m is the rotational speed of the motor (rad/s), with

\omega_m = 2\pi (number of rotations per minute) / (60 seconds per minute)

P_o = P_d - P_r

where P_o is the mechanical output power of the motor (watt) and P_r is the
rotational power loss.

\tau_s = (P_d - P_r) / \omega_m

where \tau_s is the mechanical torque on the shaft.
Gear Box calculation
By using a gearbox with a step-down ratio:

\omega_g = step-down ratio x \omega_m

where \omega_g is the rotational speed of the gear (rpm) and \omega_m is the
rotational speed of the motor (rpm). And

\tau_g = P_o / \omega_m

where \tau_g is the torque of the gear.

g = 9.81 m/s^2
Maximum angle of inclination: \theta_max = 37 deg. According to the
international laws for transportation the maximum slope angle should not
exceed 37 deg. We are not sure if this law is respected in Ethiopia, but it
is definitely respected in the large cities of the country.
Coefficient of friction: \mu = 0.5 - 0.7; we will assume the value
\mu_max = 0.7, to account for the worst possible conditions.
Wheel radius: R = 40 cm = 0.40 m
Wheel perimeter: P_wheel = 2.512 m
Assumed required acceleration: a_x = 1 m/s^2
The average velocity of the wheelchair: V_av = 5 km/h = 1.39 m/s

Normal reaction: R_N = W cos(37) = 1175.19 N

Weight component along the slope (friction force acts against motion):
W_x = W sin(37) = 885.57 N

At equilibrium:
\sum F_x = F - F_f - W_x = 0

Propulsion force:
F = 1858.17 N

Torque at the wheel:
T = F R = 1858.17 x 0.15 = 278.72 N.m

Calculation of rpm:
V = 5000/60 = 83.33 m/min
rpm = V / P_wheel = 33.17 rpm
T = 278.72 N.m = 205 lb-ft
HP = (T x rpm)/5252 = (205 x 33.17)/5252 = 1.29
3.3.4 H-Bridge Driver Circuit

Figure 3.13: H-Bridge Topology

Figure 3.15: H-Bridge Topology - Reverse direction

The general connection of the H-bridge and motor is shown as follows:
Duty Cycle: The duty cycle corresponds to how long the upper switch
(switch 1) remains on as a percentage of the total switching time. In
essence it is a measure of how much power is delivered to the motor on
average, so the duty cycle gives proportional speed control of the motor.
Figure 3.17 is an example of 1/4, 1/2 and 3/4 duty cycles. Effectively,
these duty cycles would run the motor at 1/4, 1/2 and 3/4 of full speed
respectively.
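For example, assuming a 12 V motor supply, a 3/4 duty cycle applies an
average voltage of about 0.75 x 12 V = 9 V to the motor, which is why the
duty cycle gives approximately proportional speed control.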
Advantage of PWM
The main advantage of PWM is that power loss in the switching devices is
very low. When a switch is off there is practically no current, and when it
is on, there is almost no voltage drop across the switch. Power loss, being
the product of voltage and current, is thus in both cases close to zero.
PWM also works well with digital controls, which, because of their on/off
nature, can easily set the needed duty cycle. Using pulse width modulation
has several advantages over analog control. Using PWM to dim a lamp would
produce less heat than an analog control that converts some of the current
to heat. Also, if you use PWM, the entire control circuit can be digital,
eliminating the need for digital-to-analog converters. Using digital control
lines will reduce the susceptibility of your circuit to interference.
Finally, motors may be able to operate at lower speeds if you control them
with PWM. When you use an analog current to control a motor, it will not
produce significant torque at low speeds: the magnetic field created by the
small current will be too weak to turn the rotor. On the other hand, a PWM
current can create short pulses of magnetic flux at full strength, which can
turn the rotor at extremely slow speeds.
Chapter 4
Results and Discussions
4.1 Software simulation results and discussions
MATLAB (R2010a) has been used in this work. The voice data were recorded
using the Audacity voice recorder software from Bahir Dar University
students, for each of the five commands and for the speaker identities that
are classified by our system.
For speech recognition the commands mentioned above are given in this order:
"wede hola", "jemir", "kum", "wede gira" and "wede kegni". For each command
we recorded 50 voices, so in total 250 voices were recorded for the five
commands; of these, 200 are used as training data and the other 50 were
reserved as testing data.
For speaker recognition we recorded 250 voices from the user of the
wheelchair and from many other speakers. Of these, 200 are used for training
and the other 50 voices were reserved as testing data.
Figure 4.1: Original voice data
After this, the data must be preprocessed before the further steps of ASR,
as we have discussed earlier. In this step silence and noise removal,
pre-emphasis, zero-crossing rate and energy computation of the input voice
are done. The time domain signal after silence and noise removal is shown in
Figure 4.2 and after pre-emphasis in Figure 4.3, respectively.
After preprocessing the input voice data, the next step is feature
extraction for each recording, that is, the Mel frequency cepstral
coefficients. This is done following the steps mentioned above.
The first step is frame blocking, windowing and the fast Fourier transform,
selecting the standard N = 256 point FFT, a frame shift of M = 100 samples
and the Hamming window.
Figure 4.3: After preemphasis
In the next step we take the absolute value of the complex Fourier transform
and square the result; we generally perform a 256-point FFT and keep only
the first 128 coefficients. This is called the periodogram estimate of the
power spectrum. For the sample voice data the power spectrum is shown in
Figure 4.4.
The third step is to compute the Mel filterbank. Here we select 20
triangular filters, so our filterbank comes in the form of 20 vectors of
length 128. For the sample voice data the Mel filterbank is shown in
Figure 4.5.
Figure 4.5: Mel lterbank of input voice data
The fourth step is to take the discrete cosine transform of the 20 log
filterbank energies to give 20 cepstral coefficients. The resulting 20
numbers are called Mel frequency cepstral coefficients (MFCC); they are
shown in Figure 4.6.
The last step of the MFCC calculation that we have performed is calculating
deltas to increase ASR performance. We append the 20 delta coefficients to
the 20 MFCC coefficients, so a total feature vector of length 40 is
generated.
Finally, a neural network was used to create, train and simulate the
networks, and the mean squared error was used to evaluate performance.
Training a neural network is very important: we train a neural network so
that a particular input leads to a specific target output. We have used 40
data sets per command for training. These samples are different from the
data used for testing.
When this neural network is run we examine its performance. The mean squared
error (MSE) is a network performance function: it measures the network's
performance according to the mean of squared errors, i.e., the average
squared difference between outputs and targets.
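As a minimal MATLAB sketch (assuming outputs and targets are matrices of
network outputs and desired outputs of the same size), this measure can be
computed as:

mseVal = mean((outputs(:) - targets(:)).^2);   % mean of the squared errors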
For our case the MSE-based performance of the speech recognition system
comes to 97.50 for the training data and 88.00 for the test data, and that
of the speaker recognition system comes to 99.00 for the training data and
94.00 for the testing data.
4.2 Hardware Simulation Result and Discussion
Here we discuss the results obtained for the hardware part. First we
simulate the hardware part, from the Arduino Uno to the DC motors, to see
the movement of the two DC motors that drive our automated wheelchair, using
Proteus 8.1, a virtual system modelling and circuit simulation application.
The simulation is shown in Figure 4.7.
The circuit in Figure 4.7 is the motor control system design simulation. The
LDR on Arduino pin A0 acts in place of the ultrasonic sensor: if the value
of the LDR goes above the reference value, the Arduino stops the motors,
which means there is an obstacle, and the alarm and the light (fracha) go
high.
The other component in the design is the H-bridge, which is used for
direction control. Here we control two motors, so four pins from the Arduino
provide logic values to it; when we send serial data through the virtual
terminal to set the direction value, the four pins that control the H-bridge
state change the motor direction.
Last but not least, the speed of the motors is set from the Arduino PWM
pins, and the speed can be controlled by varying the duty cycle value on
these pins according to the value received from the serial port.
Figure 4.7: General system simulation using Proteus
Chapter 5
Conclusion and
Recommendations for Future
work
5.1 Conclusion
5.2 Recommendations for Future work
From our project we fully observe that a voice controlled wheelchair is
possible, so we propose the following works for the future:
Reference
1. Daoudi, K. (2002). Automatic Speech Recognition: The New Millennium.
Proceedings of the 15th International Conference on Industrial and
Engineering Applications of Artificial Intelligence and Expert Systems:
Developments in Applied Artificial Intelligence, 253-263.

2. Bourhis G, Moumen K, Pino P, Rohmer S, Pruski A. Assisted navigation for
a powered wheelchair. Systems Engineering in the Service of Humans:
Proceedings of the IEEE International Conference on Systems, Man and
Cybernetics; 1993 Oct 17-20; Le Touquet, France. Piscataway (NJ): IEEE;
1993. p. 553-58.

3. V. I. Pavlovic, R. Sharma, and T. S. Huang, "Visual Interpretation of
Hand Gestures for Human-Computer Interaction: A Review", IEEE Transactions
on Pattern Analysis and Machine Intelligence, July 1997, Vol. 19,
pp. 677-695.

4. Fahad Wallam and Muhammad Asif, "Dynamic Finger Movement Tracking and
Voice Commands Based Smart Wheelchair", International Journal of Computer
and Electrical Engineering, Vol. 3, No. 4, August 2011.

5. Arai K, Purwanto D, "Electric Wheelchair Control with the Human Eye",
7th International Conference on Optimization: Techniques and Applications,
Kobe, Japan, December 2007.

6. Topic on "Controlling of Device through Voice Recognition Using Matlab",
ISSN No: 2250-3536, Volume 2, Issue 2, March 2012.
http://www.ijater.com/Files/fd54f0ee-1aae-4e88-ab81-b89f32ac57c6_IJATER_03_35.pdf

7. Topic on "Speech Recognition using Digital Signal Processing", ISSN:
2277-9477, Volume 2, Issue 6.

8. Onn, C. M. (2010). "Voice Recognition Home Automation System", Faculty of
Electrical Engineering, Universiti Teknologi Malaysia. Bachelor of
Electrical (Electronic) Engineering: 50.

9. Kadir, M. S. B. A. (2010). "Voice Activated Switching Device", Faculty of
Electrical Engineering, Universiti Teknologi Malaysia. Bachelor of
Electrical (Electronic) Engineering: 64.

10. Chadawan Ittichaichareon, Siwat Suksri and Thaweesak Yingthawornsuk,
"Speech Recognition using MFCC", International Conference on Computer
Graphics, Simulation and Modeling (ICGSM'2012), July 28-29, 2012, Pattaya
(Thailand).

11. Deepak Harjani, Mohita Jethwani, Ms. Mani Roja, "Speaker Recognition
System using MFCC and Vector Quantization Approach", International Journal
for Scientific Research and Development, Vol. 1, Issue 9, 2013, ISSN
(online): 2321-0613.

12. Peter Merkx, Jadrian Miles (2005). "Automatic Vowel Classification in
Speech: An Artificial Neural Network Approach Using Cepstral Feature
Analysis", Department of Mathematics, Duke University, Durham, NC, USA.

13. Wouter Gevaert, Georgi Tsenov, Valeri Mladenov, "Neural Networks used
for Speech Recognition", Journal of Automatic Control, University of
Belgrade, Vol. 20:1-7, 2010.

14. Vimala C., Dr. V. Radha, "A Review on Speech Recognition Challenges and
Approaches", World of Computer Science and Information Technology Journal,
Vol. 2, No. 1, pp. 1-7, 2012.

15. http://www.lmphotonics.com/DCSpeed/series_dc.htm
Appendices
Appendix A
% Build the list of class sub-directories containing the recorded voices
DR = dir;
voice = {};
for i = 3:length(DR)
    if DR(i).isdir
        voice = [voice {DR(i).name}];
    end
end
DR(s).name;
fs = Fs;
fOut = zcrs_tes_o(data, Fs);          % silence and noise removal (authors' helper)
data = fOut;
melCeps = MfccCalculation(data, fs);  % MFCC feature extraction (authors' helper)
m = size(mfcc, 1);
XMean = repmat(mean(mfcc), m, 1);
XVariance = repmat(var(mfcc), m, 1);
mfcc = (mfcc - XMean) ./ XVariance;   % normalize the MFCC features
meanmfcc(1:5,c) = mean(mfcc(1:5,first:last)')';
varmfcc(1:5,c) = var(mfcc(1:5,first:last)')';
end
VV = [meanmfcc varmfcc];
BB = VV(:)';                          % flatten the statistics into one feature row
Ex_features.MFCC = BB;
for j = 2
    y = [y j*g];
    m = y;
end
y = m';
save('FinalVoiceData.mat','X','y','-v7');   % save the training matrices
clear; close all; clc
fprintf('sigmoid gradient\n');
tJ = nnCostFunction(optimize_params, inputLayerSize, hiddenLayerSize, ...
    numberOfLabels, tX, ty, lambda);
fprintf('Cost at TEST set is: %f\n', tJ);
Fs = 16000; ch = 2;                        % sampling rate and number of channels
fprintf('Please press Enter to start ');
pause;
voice = wavrecord(2*Fs, Fs, ch, 'double'); % record 2 seconds of audio
sound(voice, Fs);                          % play back the recording
figure
plot(voice)
[fOut] = zcrs_tes_o(voice, Fs);            % silence and noise removal
m = size(mfcc, 1);
XMean = repmat(mean(mfcc), m, 1);
XVariance = repmat(var(mfcc), m, 1);
mfcc = (mfcc - XMean) ./ XVariance;
numframesintexturewindow = floor(length(mfcc)/4);
for k = 1:4
    first = (k-1)*numframesintexturewindow + 1;
    last = first + numframesintexturewindow - 1;
MM = [meanmfcc varmfcc];
PP = MM(:)';
X = PP;
load('newWeightforspeaker2.mat');          % trained weights for speaker recognition
load('newWeightforspeech2.mat');           % trained weights for speech recognition
pred = predict(Theta1, Theta2, X);         % first stage: check the speaker
if pred == 1
    predd = predict(ThetaS1, ThetaS2, X);  % second stage: recognize the command
else
    fprintf('The voice entered here is not recognized because it is not the user''s voice.');
end
fprintf(arduino, '%d', predd);             % send the recognized command over serial
fclose(arduino);
Appendix B
/* This is the motor control program which is used to control the wheelchair
   using the L293D motor driver */
int m1speed = 9;        // digital pin for speed control of motor one
int m2speed = 11;       // digital pin for speed control of motor two
int m1direction = 10;   // digital pins for direction control of motor one
int m12direction = 7;
int m21direction = 4;   // digital pins for direction control of motor two
int m22direction = 3;
int m22 = 13;
int obstacle = A0;      // obstacle sensor pin
void setup() {
  Serial.begin(9600);
  pinMode(m1direction, OUTPUT);
  pinMode(m12direction, OUTPUT);
  pinMode(m21direction, OUTPUT);
  pinMode(m22direction, OUTPUT);
  pinMode(m1speed, OUTPUT);
  pinMode(m2speed, OUTPUT);
  pinMode(obstacle, INPUT);
  //pinMode(pot, INPUT);
  pinMode(m22, OUTPUT);
}
if (val <= 455)
  goto normal;
else
  goto stopp;
normal: if (Serial.available() > 0) { // is a character available?
  rx_byte = Serial.read();
  Serial.println(rx_byte);
  switch (rx_byte) {
  case '0':
    Serial.println("||- MENU ||-");
    Serial.println("1. goForward.");
    Serial.println("2. goBackward.");
    Serial.println("3. rotateRight.");
    Serial.println("4. rotateLeft.");
    Serial.println("5. stop.");
    Serial.println("||||||{");
  case '1':
digitalWrite(m1direction,HIGH); // forward
digitalWrite(m12direction,LOW);
digitalWrite(m21direction,HIGH); // forward
digitalWrite(m22direction,LOW);
Serial.println("FORWARD");
//delay (2000);
break;
case '2':
digitalWrite(m1direction,LOW); // backward
digitalWrite(m12direction,HIGH);
digitalWrite(m21direction,LOW); // backward
digitalWrite(m22direction,HIGH);
Serial.println("BACK");
//delay (2000);
break;
case '3':
digitalWrite(m1direction,HIGH); // motor one forward
digitalWrite(m12direction,LOW);
digitalWrite(m21direction,LOW); // motor two stopped
digitalWrite(m22direction,LOW);
Serial.println("rotateRight");
//delay (2000);
break;
case '4':
digitalWrite(m1direction,LOW); // motor one stopped
digitalWrite(m12direction,LOW);
digitalWrite(m21direction,LOW); // motor two reversed
digitalWrite(m22direction,HIGH);
Serial.println("rotateLeft");
//delay (2000);
break;
case '6':
potval=255;
break;
case '7':
potval=190;
break;
case '8':
potval=90;
break;
stopp: case '5':
digitalWrite(m22,HIGH);
delay (100);
digitalWrite(m22,LOW);
digitalWrite(m1direction,LOW); // all direction pins low: motors stopped
digitalWrite(m12direction,LOW);
digitalWrite(m21direction,LOW);
digitalWrite(m22direction,LOW);
analogWrite(m1speed,0);
analogWrite(m2speed,0);
Serial.println("||- MENU ||-");
Serial.println("1. goForward.");
Serial.println("2. goBackward.");
Serial.println("3. rotateRight.");
Serial.println("4. rotateLeft.");
Serial.println("5. stop.");
Serial.println("||||||{");
break;
st: default:
Serial.println("Invalid option");
break;
//delay(2000);
} // end switch (rx_byte)
} // end if (Serial.available() > 0)