Speech Recognition
By Tambi Ashmoz | 27 Feb 2004
Voice-activated OS
Introduction
This is part of a larger speech recognition project that we developed at ORT Braude College. The aim
of the project is to activate programs on your desktop or panel by voice.
Motivation
We wanted the common tasks that every user performs on a computer (opening and closing programs,
editing text, calculating) to be possible not only with the mouse and keyboard, but also by voice.
Background
Every speech recognition application consists of:
Needless to say, as the grammar grows, so does the probability of misinterpretation. We tried to keep
the grammar as small as possible without losing information. The grammar format is explained later.
Requirements
SAPI 5 (ships with Windows XP)
Microsoft engine for English (if it is not installed, it can be downloaded from Microsoft's site)
The easiest way to check whether you have these is to open Control Panel -> Speech. There you should
see both the "Text to Speech" tab and the "Speech Recognition" tab. If you don't see the "Speech
Recognition" tab, download the engine from the Microsoft site.
Speech Recognition - CodeProject http://www.codeproject.com/KB/audio-video/tambiSR.aspx?display=Print
How to Start
The project's interface is shown below (Fig 1).
To start talking right away, complete these two steps:
1. First (and this is important), adjust the microphone by clicking the right mouse button and
choosing the "Mic training wizard."
2. Second (also important), train the engine to your voice by choosing the "User training wizard."
IMPORTANT: after these changes, you need to make the program start listening again by clicking the
right mouse button and choosing "Start listen." The more you train the engine, the better it
recognizes your voice, although you will notice an improvement after the first training. Once the
program is started, it can be in one of several "states". In each state, it recognizes a specific
list of commands. The commands that the program can identify are listed below.
Enables or disables the mic (the item toggles according to the current choice). After the mic is
disabled, the labels (accuracy and state) turn red to indicate the disabled state.
"Use agent"
The agent is used only for giving feedback, but it can be useful for knowing whether your command
was heard. You can disable it if you prefer, if you don't have an agent file (ACS files can be
downloaded from Microsoft), or if it is not working and you still want to use recognition
(recognition does not depend on the agent). The program also handles the case where the agent file
is not found or cannot be loaded for any other reason.
"Add favorites"
In the "activate" state you can say the command "favorites programs" to open a form listing your
favorite programs, and then run one of them by saying its name. This menu item opens that form so
you can add, delete, or edit the entries as you want.
"Change character"
Lets you change the agent character (ACS files can be downloaded from the Microsoft site).
The accuracy of every recognition is displayed in the "Accuracy" label. Use this menu item to change
the accuracy limit, that is, the minimum accuracy a recognized command must reach before the program
responds to it. Setting a limit keeps the program from reacting to arbitrary voices or sounds. You
can raise the limit each time you train your computer and recognition improves.
If several users share the program, you can give each user a profile and train the computer for each
one. (To add a user profile, open Control Panel -> Speech; in this menu you can only choose among
existing profiles.)
Running the "Mic training wizard" is very important for recognition, as explained above. On every
computer, the first thing to do (once only, or again whenever you switch to a new mic) is to run this
menu item and set up your mic.
Running the "User training wizard" improves recognition (note that the training applies to the
selected user profile).
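The accuracy limit described above amounts to a threshold check before a command is executed. The following is a minimal sketch of that idea, not the project's actual code: the class name AccuracyFilter, its members, and the 0-100 scale are all assumptions made for illustration.

```csharp
using System;

// Hypothetical helper: decides whether a recognized command is trusted
// enough to act on. The 0-100 accuracy scale is an assumption.
public class AccuracyFilter
{
    private int limit;

    public AccuracyFilter(int limit)
    {
        Limit = limit;
    }

    // Minimum accuracy a recognition must reach before the program responds.
    public int Limit
    {
        get { return limit; }
        set
        {
            if (value < 0 || value > 100)
                throw new ArgumentOutOfRangeException("value", "Limit must be between 0 and 100.");
            limit = value;
        }
    }

    // True when the recognition accuracy is at or above the limit.
    public bool ShouldRespond(int accuracy)
    {
        return accuracy >= limit;
    }
}
```

A recognition handler could call ShouldRespond with the value shown in the "Accuracy" label and ignore the phrase when it returns false; raising Limit after further training makes the filter stricter.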
How it Works
The program starts in the "deactivate" state, which means it is asleep. Saying the command
"activate" wakes the program up (the "activate" state), and it starts recognizing other commands
(Fig 2).
For example, say "start" to open the Start menu. Then say "programs" to enter the Programs menu.
From this point, you can navigate by saying "down", "up", "right"... "OK", according to the commands
list. You can also say "commands list" at any point to see a form listing the commands that you can
say.
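The state switching described here can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions: the type names RecognizerState and StateMachine are hypothetical, and only the state-changing phrases quoted in this article ("activate", "menu", "enter numeric state", "enter alphabetic state") are handled. How the real program leaves a state is not covered.

```csharp
using System;

// States named in the article.
public enum RecognizerState { Deactivated, Activated, Menu, Numeric, Alphabetic }

// Hypothetical sketch of the state switching; phrases that are not listed
// below leave the current state unchanged.
public class StateMachine
{
    public RecognizerState Current { get; private set; }

    public StateMachine()
    {
        Current = RecognizerState.Deactivated; // the program starts asleep
    }

    // Applies a recognized phrase and returns the resulting state.
    public RecognizerState Handle(string phrase)
    {
        string command = (phrase ?? "").Trim().ToLowerInvariant();

        if (Current == RecognizerState.Deactivated)
        {
            // While asleep, only "activate" is acted upon.
            if (command == "activate")
                Current = RecognizerState.Activated;
            return Current;
        }

        switch (command)
        {
            case "menu":
                Current = RecognizerState.Menu;
                break;
            case "enter numeric state":
                Current = RecognizerState.Numeric;
                break;
            case "enter alphabetic state":
                Current = RecognizerState.Alphabetic;
                break;
            // "commands list" and all other phrases do not change the state.
        }
        return Current;
    }
}
```

Keeping one active command list per state is what keeps each grammar small: a phrase is only meaningful if the current state knows it.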
One of the important states in the program is the "menu" state: if a program is running (and
focused), you can say "menu" to hook all of its menu items and start using them. For example, if you
are running Notepad, you can create a new file by saying "menu" -> "File" -> "new file". Every time
you hook a menu, the program shows how many menu items it hooked, and these become available as
commands. I had a small problem with the menus of some programs, such as Word and Excel, which I
couldn't hook, but... I'll check it later.
Another nice state is the "Numeric state". For example, say the commands "favorites programs",
"calculator", "enter numeric state", "one", "plus", "two", "equal" and see the result.
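In the numeric state, each spoken word has to be turned into the key that the calculator expects. A minimal sketch of such a mapping follows. It is not taken from the project: the class name NumericWords is hypothetical, and only "one", "two", "plus" and "equal" are quoted in this article, so the remaining table entries are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class NumericWords
{
    // "one", "two", "plus" and "equal" are quoted in the article; the other
    // entries are assumed here for illustration.
    private static readonly Dictionary<string, char> Keys =
        new Dictionary<string, char>(StringComparer.OrdinalIgnoreCase)
        {
            { "zero", '0' }, { "one", '1' }, { "two", '2' }, { "three", '3' },
            { "four", '4' }, { "five", '5' }, { "six", '6' }, { "seven", '7' },
            { "eight", '8' }, { "nine", '9' },
            { "plus", '+' }, { "minus", '-' }, { "equal", '=' }
        };

    // Returns the key for one spoken word, or null when the word is unknown.
    public static char? ToKey(string word)
    {
        char key;
        if (word != null && Keys.TryGetValue(word.Trim(), out key))
            return key;
        return null;
    }

    // Converts a sequence of spoken words into the key sequence to send,
    // skipping words that have no mapping.
    public static string ToKeys(IEnumerable<string> words)
    {
        var keys = new StringBuilder();
        foreach (string word in words)
        {
            char? key = ToKey(word);
            if (key.HasValue)
                keys.Append(key.Value);
        }
        return keys.ToString();
    }
}
```

With this table, the spoken sequence "one", "plus", "two", "equal" becomes the key sequence 1+2=, which is what the calculator example relies on.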
Similarly, you can open a site in the "Alphabetic state". For example, say the commands "favorites
programs", "internet explorer", "enter alphabetic state", "menu", "down", "down", "OK",
"enter alphabetic state", "c", "o", "d", "e", ..., "dot", "c", "o", "m" and see the result.
Getting Help
One of the main problems with voice-activated systems is what happens if you don't know exactly
which commands the computer expects. No problem! If you are unable to proceed, just say "commands
list" and the program will show you the commands available from the current state. States (commands)
available in the program:
Code Explanation
The first thing to do is to add a reference to the file C:\Program Files\Common Files\Microsoft
Shared\Speech\SAPI.dll so we can use the Speech Library by writing:
using SpeechLib;
When we activate the engine, the initialization step takes place. There are mainly 3 objects involved:
1. An SpSharedRecoContext object that starts the recognition process (it must be shared so it can
apply to all processes). It implements the ISpeechRecoContext interface. After this object is
created, we subscribe to the events we are interested in (in our case AudioLevel and Recognition).
2. A static grammar object, which can be loaded from an XML file or built programmatically. It
implements ISpeechRecoGrammar. The list of static recognizable words is shown in Fig 2 and is
attached for download.
3. A dynamic grammar that allows rules to be added. Each rule implements ISpeechGrammarRule and has
two main parts:
The phrase associated with it
The name of the rule
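private void initSAPI()
{
    try
    {
        // shared recognition context (object 1 above); the AudioLevel and
        // Recognition event handlers are attached to it at this point
        objRecoContext = new SpSharedRecoContext();
        // create the grammar interface with ID 0
        grammar = objRecoContext.CreateGrammar(0);
    }
    catch (Exception ex)
    {
        MessageBox.Show("Exception \n" + ex.ToString(), "Error - initSAPI");
    }
}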
After initialization, the engine still will not recognize anything until we load a grammar. There are
two ways to do that: loading a grammar from a file...
Or we can build the grammar programmatically. The function receives an ArrayList in which every item
is a structure holding a rule name and a phrase:
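void SAPIGrammarFromArrayList(ArrayList phraseList)
{
    object propertyValue = "";
    command command1;
    ISpeechGrammarRule rule;
    ISpeechGrammarRuleState state;
    for (int i = 0; i < phraseList.Count; i++)
    {
        command1 = (command)phraseList[i];
        // one top-level rule per phrase; rule IDs start at 100
        rule = grammar.Rules.Add(command1.ruleName,
            SpeechRuleAttributes.SRATopLevel, i + 100);
        state = rule.InitialState;
        propertyValue = "";
        // a single word transition from the initial state to the end of the rule (null)
        state.AddWordTransition(null, command1.phrase, " ",
            SpeechGrammarWordType.SGLexical, "",
            0, ref propertyValue, 1F);
        // commit rules
        grammar.Rules.Commit();
        grammar.CmdSetRuleState(command1.ruleName,
            SpeechRuleState.SGDSActive);
    }
}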
string phrase=e.PhraseInfo.GetText(0,-1,true);
.
.
.
}
Hooking Menus
When a program is focused, saying "Menu" hooks its menu and adds its commands to the dynamic grammar.
We used some unmanaged functions imported from user32.dll. The program also hooks the accelerator
associated with each menu item (the letter preceded by an & sign in the caption). The command is then
simulated with the keybd_event function and executed.
initSAPI();
SAPIGrammarFromFile("XMLDeactivate.xml");
int mnuCnt = GetMenuItemCount(hMnu);
if (mnuCnt != 0)
{
    // add menu to grammar
    int i;
    command command1;
    for (i = 0; i < mnuCnt; i++)
    {
        GetMenuString(hMnu, i, mnuStr, 50, -1);
        string caption = mnuStr.ToString();
        int amp = caption.IndexOf('&');
        // skip empty items and captions without a usable accelerator
        if (caption != "" && amp >= 0 && amp + 1 < caption.Length)
        {
            // save in command1.ruleName only the underlined letter
            command1.ruleName = caption[amp + 1].ToString();
            // the spoken phrase is the caption without the '&' marker
            command1.phrase = caption.Remove(amp, 1);
            phraseList.Add(command1);
        }
    }
    SAPIGrammarFromArrayList(phraseList);
}
}
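Menu captions returned by Windows can contain more than a single & marker: "&&" stands for a literal ampersand, and text after a tab character is the shortcut column (for example "Ctrl+N"). The sketch below shows one way to split a caption into the spoken phrase and the accelerator letter while handling those cases. The names MenuCaption and MenuCaptionParser are hypothetical, and this is an illustration rather than the code used in the project.

```csharp
using System;
using System.Text;

// Result of splitting a menu caption such as "&File".
public struct MenuCaption
{
    public string Phrase;      // caption without the accelerator marker
    public char? Accelerator;  // character that follows the first single '&'
}

public static class MenuCaptionParser
{
    public static MenuCaption Parse(string caption)
    {
        var result = new MenuCaption();
        var phrase = new StringBuilder();
        if (caption == null)
            caption = "";

        for (int i = 0; i < caption.Length; i++)
        {
            char c = caption[i];
            if (c == '\t')
                break; // the rest is the shortcut column, not part of the name
            if (c != '&')
            {
                phrase.Append(c);
                continue;
            }
            if (i + 1 >= caption.Length)
                break; // trailing '&' marks nothing

            char next = caption[i + 1];
            if (next == '&')
            {
                phrase.Append('&'); // "&&" is a literal ampersand
            }
            else
            {
                if (!result.Accelerator.HasValue)
                    result.Accelerator = next;
                phrase.Append(next);
            }
            i++; // skip the character that was just consumed
        }

        result.Phrase = phrase.ToString();
        return result;
    }
}
```

For a caption such as "&File", Phrase would serve as the spoken command and Accelerator as the key to simulate; items whose Accelerator is null have no letter to send.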
Grammar Format
Sample XML grammar... (for the complete set of grammar tags, see the Microsoft documentation)
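To give an idea of the shape of such a file, the sketch below generates a small grammar in the SAPI 5 XML format: a GRAMMAR root element with a LANGID attribute, and one RULE element per command containing a P (phrase) element. It is not the grammar shipped with the project; the class name GrammarXml and the one-rule-per-phrase layout are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class GrammarXml
{
    // Builds a SAPI 5 XML grammar with one active top-level rule per phrase.
    // Keys are rule names, values are the phrases to recognize.
    // LANGID 409 is the hexadecimal language ID for US English.
    public static XDocument Build(IDictionary<string, string> rules)
    {
        var root = new XElement("GRAMMAR", new XAttribute("LANGID", "409"));
        foreach (KeyValuePair<string, string> rule in rules)
        {
            root.Add(new XElement("RULE",
                new XAttribute("NAME", rule.Key),
                new XAttribute("TOPLEVEL", "ACTIVE"),
                new XElement("P", rule.Value)));
        }
        return new XDocument(root);
    }
}
```

Calling Build with a single entry that maps the rule name "activate" to the phrase "activate" yields a GRAMMAR element holding one RULE with NAME="activate" and TOPLEVEL="ACTIVE", whose P element contains the word to be spoken. The result can be written to disk with XDocument.Save.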
Points of Interest
We used MS Agent, but in our case it has a passive role (it gives feedback that the command was
heard).
There is an accuracy option: the user can set a threshold to filter out uncertain recognitions.
In the future, we plan to make more applications "voice friendly."
License
This article has no explicit license attached to it but may contain usage terms in the article text or the
download files themselves. If in doubt please contact the author via the discussion board below.
Tambi Ashmoz
Israel
Member