Proceedings of the International Conference on Man-Machine Systems (ICoMMS)
11 – 13 October 2009, Batu Ferringhi, Penang, MALAYSIA

Abstract- Voice is one of the most convenient ways for humans and robots to communicate. To command a robot task by voice, the spoken commands must be recognized reliably, but the larger the recognition vocabulary, the higher the cost of the recognition system and the longer the recognition time. A large class of problems requires real-time processing of complex structured data. These problems are difficult to solve with state logic and simple comparison techniques, since they require capturing complex structures of voice and relationships in massive quantities of low-precision, ambiguous, noisy data. In this paper, we develop a master processing unit, a notebook, that performs voice recognition and gives commands over a serial data link to a microcontroller for further processing, in order to remotely navigate a mobile platform. A nonholonomic, car-like mobile robot is used as the base. The aim is to build a prototype that can be analyzed to create a mobile platform that can communicate with humans by voice command in the future.

Keywords- Voice Recognition, Remote Navigated Mobile Platform, Nonholonomic, Car, Real Time.

I. INTRODUCTION

Since humans usually communicate with each other by voice, it is very convenient if voice is used to command a mobile robot. Many studies have been proposed. Some of them proposed voice recognition algorithms, while others concentrated on applications of voice recognition rather than on voice recognition itself. Some studies discussed the effectiveness of voice input compared with other input devices in robot systems [1-3].

The program is developed to convert every utterance spoken into the microphone to text and then compare it against the expected words. This is made easy by using the Microsoft Foundation Class speech library. The Application Wizard created this speech application for us to reprogram and edit. The application not only demonstrates the basics of using the Microsoft Foundation Classes but is also a starting point for writing our own application. There are a couple of sub-files for the speech application, plus a dialog class to display the recognized text. The recognized text is then compared with the list of commands, and communication between the main process and our microcontroller happens using serial data transfer.

II. ARCHITECTURE

The Speech SDK 5.1 SAPI can be built in using Visual C++. The system uses Speech SDK 5.1, a development component for speech procedures provided by the Microsoft Corporation. In this section, we provide an overview of Speech SDK 5.1, starting with an introduction to HMM-based recognizers.

Overview of an HMM-based Speech Recognition System

Speech SDK 5.1 is an HMM-based speech recognizer. HMM stands for Hidden Markov Model, a type of statistical model. In HMM-based speech recognizers, each unit of sound (usually called a phoneme) is represented by a statistical model that represents the distribution of all the evidence (data) for that phoneme. This is called the acoustic model for that phoneme. When creating an acoustic model, the speech signals are first transformed into a sequence of vectors that represent certain characteristics of the signal, and the parameters of the acoustic model are then estimated using these vectors (usually called features). This process is called training the acoustic models [4]-[10]. During speech recognition, features are derived from the incoming speech (we will use "speech" to mean the same thing as "audio") in the same way as in the training process. The component of the recognizer that generates these features is called the front end. These live features are scored against the acoustic model. The score obtained indicates how likely it is that a particular set of features (extracted from live audio) belongs to the phoneme of the corresponding acoustic model. The process of speech recognition is to find the best possible sequence of words (or units) that will fit the given input speech. It is a search problem, and in the case of HMM-based recognizers, a graph search problem. The graph represents all possible sequences of phonemes in the entire language of the
task under consideration. The graph is typically composed of the HMMs of sound units concatenated in a guided manner, as specified by the grammar of the task. As an example, let us look at a simple search graph that decodes the words "one" and "two". It is composed of the HMMs of the sound units of the words "one" and "two".

As can be seen from the chart, many of the nodes have self-transitions. This can lead to a very large number of possible paths through the graph, so finding the best possible path can take a very long time. The purpose of the pruner is to reduce the number of possible paths during the search, using heuristics such as pruning away the lowest-scoring paths. At the end of the search, the path with the highest score is the best fit, and a result taking all the words of that path is returned.

Basic Idea of the Speech SDK 5.1 Architecture and Main Components
because it has only very little influence over the robot's kinematics. In the case of differential drive, to avoid slippage and have only pure rolling motion, the robot must rotate around a point that lies on the common axis of the two driving wheels. This point is known as the instantaneous centre of curvature (ICC) or the instantaneous centre of rotation (ICR).

Mobile Robot Differential Drive

Let us try to find formulas by means of which it is possible to compute the actual position of the robot.
V_CR = (v_2 + v_1) / 2          (Equation 4)

dy = V_y(t) · dt                 (Equation 5)

V_x(t) = V(t) · cos[θ(t)]        (Equation 6)
Table 3. Experiment speed and load for half and full step

[Figure: plot of speed (RPM) against load for half- and full-step operation; numeric values not recoverable]
Fig. 10. Front-wheel cornering motion using a rack-and-pinion joint.

From Fig. 10 we can see that the front wheels use a rack-and-pinion joint to control the steering angle when turning. Since the pinion gear has 40 teeth, the cornering angle is limited.
V. CONCLUSION
REFERENCES
[1] N. Yamasaki and Y. Anzai, "Active Interface for Human-Robot Interaction," Proceedings of the 1995 IEEE International Conference on Robotics and Automation, Nagoya, Japan, May 1995, pp. 3103-3109.
[3] D. S. Lees and L. J. Leifer, "A Graphical Programming Language for Robots Operating in Lightly Structured Environments," Proceedings of the 1993 IEEE International Conference on Robotics and Automation, San Diego, California, May 1993, pp. 648-653.
[4] T. Nishimoto et al., "Improving Human Interface in Drawing Tool Using Speech, Mouse and Keyboard," Proceedings of the 4th IEEE International Workshop on Robot and Human Communication, Tokyo, Japan, July 1993, pp. 107-112.
[5] L. R. Rabiner and B. H. Juang, "An Introduction to Hidden Markov Models," IEEE ASSP Magazine, pp. 4-16, Jan. 1986.
[6] T. B. Hughes, H.-S. Kim, J. H. DiBiase, and H. F. Silverman, "Performance of an HMM speech recognizer using a real-time tracking microphone array as input," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 3, pp. 346-349, 1999.
[7] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[8] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, New Jersey, 1993.
[9] L. E. Baum and T. Petrie, "Statistical Inference for Probabilistic Functions of Finite State Markov Chains," Ann. Math. Stat., vol. 37, pp. 1554-1563, 1966.
[10] L. E. Baum, T. Petrie, G. Soules, and N. Weiss, "A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains," Ann. Math. Stat., vol. 41, no. 1, pp. 164-171, 1970.
[11] L. E. Baum, "An Inequality and Associated Maximization Technique in Statistical Estimation for Probabilistic Functions of Markov Processes," Proc. Symp. on Inequalities, vol. 3, pp. 1-7, Academic Press, New York and London, 1972.
[12] A. Chandrakasan, S. Sheng, and R. Brodersen, "Low Power CMOS Digital Design," IEEE Journal of Solid-State Circuits, vol. 27, no. 4, pp. 473-484, April 1992.
[13] ATMEL, ATC18 64K x 32-bit Low-Power Flash, 2002. http://www.atmel.com.
[14] CMU Speech, List of Speech Recognition Hardware Products, 2002. http://www.speech.cs.cmu.edu/comp.speech/Section6/Q6.5.html.
[15] C. Carter and R. Graham, "Experimental Comparison of Manual and Voice Controls for the Operation of In-Vehicle Systems," in Proceedings of the IEA2000/HFES2000 Congress, Santa Monica, CA.
[16] J. H. L. Hansen, "Analysis and Compensation of Speech under Stress and Noise for Environmental Robustness in Speech Recognition," Speech Communications, Special Issue on Speech Under Stress, vol. 20, no. 2, pp. 151-170, November 1996.
[17] A. Baron and P. Green, "Safety and Usability of Speech Interfaces for In-Vehicle Tasks while Driving: A Brief Literature Review," Technical Report UMTRI-2006-5, Feb. 2006.
[18] J. H. L. Hansen, J. Plucienkowski, S. Gallant, R. Gallant, B. Pellom, and W. Ward, "CU-Move: Robust speech processing for in-vehicle speech systems," in Proceedings of ICSLP, pp. 524-527, 2000.