Intelligence
Group members: -
Prince (BFT/18/513)
Somya (BFT/18/526)
INTRODUCTION
HISTORY
The ‘80s saw speech recognition vocabulary go from a few hundred words
to several thousand words.
In 1978, the Speak & Spell, using a speech chip, was introduced to help
children spell out words. The speech chip within would prove to be an
important tool for the next phase in speech recognition software. In 1987,
the World of Wonders “Julie” doll came out. In an impressive (if not
downright terrifying) display, Julie was able to respond to a speaker and
had the capacity to distinguish between speakers’ voices.
Three short years after Julie, the world was introduced to Dragon,
debuting its first speech recognition system, the “Dragon Dictate”. Around
the same time, AT&T was playing with over-the-phone speech recognition
software to help field their customer service calls. In 1997, Dragon
released “Naturally Speaking,” which allowed for natural speech to be
processed without the need for pauses. What started out as a painfully
simple and often inaccurate system is now easy for customers to use.
In 2011, Apple debuted ‘Siri’. ‘She’ became instantly famous for her
incredible ability to accurately process natural utterances, and for her
ability to respond using conversational (and often shockingly sassy)
language. You’re sure to have seen a few screen-captures of her
pre-programmed humour floating around the internet. Her success,
boosted by zealous Apple fans, brought speech recognition technology to
the forefront of innovation and technology. With the ability to respond
using natural language and to ‘learn’ using cloud-based processing, Siri
catalysed the birth of other likeminded technologies such as Amazon’s
Alexa and Microsoft’s Cortana.
Following are a few of the basic terms and concepts that are fundamental
to speech recognition.
Utterances
Silence
Pronunciations
The speech recognition engine uses all sorts of data, statistical models,
and algorithms to convert spoken input into text. One piece of information
that the speech recognition engine uses to process a word is its
pronunciation, which represents what the speech engine thinks a word
should sound like.
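As a rough illustration of how a pronunciation might be stored, the sketch below keeps a tiny lexicon that maps each word to one or more phoneme sequences. The lexicon contents, the ARPAbet-style symbols, and the function names are assumptions made for this example; they are not taken from any particular speech engine.

```python
# Hypothetical pronunciation lexicon: each word maps to a list of
# phoneme sequences (ARPAbet-style symbols, chosen for illustration).
LEXICON = {
    "read": [["R", "IY", "D"], ["R", "EH", "D"]],
    "speech": [["S", "P", "IY", "CH"]],
    "the": [["DH", "AH"], ["DH", "IY"]],
}


def pronunciations(word):
    """Return every known pronunciation of a word (empty list if unknown)."""
    return LEXICON.get(word.lower(), [])


def words_matching(phonemes):
    """Return the words whose lexicon entries contain this phoneme sequence."""
    return sorted(w for w, variants in LEXICON.items() if phonemes in variants)


print(pronunciations("read"))
print(words_matching(["DH", "IY"]))
```

A word with several entries, such as "read", shows why an engine may need more than one pronunciation per word.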
Grammars
TYPES
Isolated Speech
Connected Speech
Continuous speech
Spontaneous Speech
VOICE VERIFICATION/IDENTIFICATION
Technical Feasibility
Our system is built from three kinds of components: hardware, software,
and human components.
Hardware components
Computer components
RAM 2 GB 4 GB
Human Components
Software Components
Visual Studio 2015: for building our project, creating the Windows Forms
application, and designing the interfaces.
MySQL: for managing the database (creating tables, storing the data).
C# is an open-source language that runs on Windows, Mac, and Linux.
It can be used to develop Windows Store applications, Android apps, and
iOS apps. It is also useful for building backend and middle-tier
frameworks and libraries. It supports language interoperability, which
means that C# can access code written in any .NET-compliant language.
Text pre-processing: analyze the input text for special constructs of the
language. In English, special treatment is required for abbreviations,
acronyms, dates, times, numbers, currency amounts, email addresses
and many other forms. Other languages need special processing for these
forms and most languages have other specialized requirements.
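The kind of special treatment described above can be sketched as a small normalization step. This is only a minimal illustration: the abbreviation table, the 0 to 99 number range, and the function names are all assumptions made for the example, and a real system would cover dates, times, email addresses, and much more.

```python
import re

# Illustrative tables only; a real system would use far larger ones.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def spell_dollars(match):
    """Turn a matched '$N' into words, with a singular or plural unit."""
    n = int(match.group(1))
    unit = "dollar" if n == 1 else "dollars"
    return f"{number_to_words(n)} {unit}"


def normalize(text):
    """Expand abbreviations, small dollar amounts, and small numbers."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    # Currency first, so "$5" becomes "five dollars" rather than "$five".
    text = re.sub(r"\$(\d{1,2})\b", spell_dollars, text)
    # Remaining standalone one- or two-digit numbers.
    text = re.sub(r"\b\d{1,2}\b",
                  lambda m: number_to_words(int(m.group())), text)
    return text


print(normalize("Dr. Smith paid $5 for 42 pens"))
```

Numbers with three or more digits are left untouched by this sketch, which keeps the example short rather than complete.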
PROCESS
The basic principle of voice recognition involves the fact that speech or
words spoken by any human being cause vibrations in air, known as
sound waves. These continuous or analog waves are digitized and
processed and then decoded to appropriate words and then appropriate
sentences.
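The digitizing step can be sketched as sampling followed by quantization. In the example below, the 16 kHz sample rate, the 16-bit depth, and the 440 Hz test tone are assumptions picked for illustration; the text does not specify any of them.

```python
import math


def digitize(signal, duration_s, sample_rate_hz=16000, bit_depth=16):
    """Sample a continuous signal and quantize each sample to an integer.

    `signal` is a function of time (in seconds) that returns an
    amplitude in the range [-1, 1].
    """
    max_level = 2 ** (bit_depth - 1) - 1            # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                      # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip to the valid range
        samples.append(round(amplitude * max_level))
    return samples


def tone(t):
    """A 440 Hz sine wave standing in for an analog sound wave."""
    return math.sin(2 * math.pi * 440 * t)


pcm = digitize(tone, duration_s=0.01)
print(len(pcm), pcm[:5])
```

Each integer in `pcm` is one digitized measurement of the wave; a higher sample rate or bit depth captures the original wave more faithfully at the cost of more data.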
Next the signal is divided into small segments. The program then matches
these segments to known phonemes of a language. A phoneme is the
smallest element of a language -- a representation of the sounds we make
and put together to form meaningful expressions.
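The segmenting and matching steps can be sketched as follows. This is a toy under stated assumptions: the frame sizes, the two features (energy and zero-crossing rate), and the phoneme templates are invented for illustration. Production recognizers typically use richer spectral features and statistical or neural models rather than a nearest-template lookup.

```python
def split_into_frames(samples, frame_len=400, hop=160):
    """Cut a sample list into overlapping frames.

    The defaults correspond to 25 ms frames every 10 ms at 16 kHz.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def features(frame):
    """Toy feature vector: mean energy and zero-crossing rate of a frame."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / (len(frame) - 1))


def closest_phoneme(feat, templates):
    """Return the template label with the smallest squared distance."""
    return min(
        templates,
        key=lambda p: sum((x - y) ** 2 for x, y in zip(feat, templates[p])),
    )


# Hypothetical templates: (energy, zero-crossing rate) per phoneme.
TEMPLATES = {"s": (0.1, 0.9), "aa": (0.9, 0.1)}

print(closest_phoneme((0.2, 0.8), TEMPLATES))
```

A hissing sound such as "s" tends to have low energy and many zero crossings, while a vowel tends to show the opposite, which is the intuition the two made-up templates encode.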
PROS
Access – For writers with physical disabilities that prevent them from using
a keyboard and mouse, being able to issue voice commands and dictate
words into a text document is a significant advantage.
Spelling – you will have access to the same editing tools as a standard
word processing solution. Of course, nothing is 100 percent accurate
(yet), but the software will catch the majority of spelling and grammatical
errors.
Speed – the software can capture your speech at a faster rate than you
might normally type. So, it is now possible to get your thoughts onto
electronic paper faster than waiting for your fingers to catch up.
CONS
Set-up and Training can be a significant investment of time. Despite promises
that you’ll be up and running a few minutes after installation, the reality of
recording your voice commands is more complex. Capturing your tone and
inflection accurately sometimes takes time. The software may even pause on
some sentences as it tries to figure out what you said. It all requires
patience and clear enunciation.
Frequent Pauses can be frustrating. Remember that the goal was to
write faster than you could normally type. Changes in voice tone or speech
clarity can cause glitches, such as unrecognized words or acronyms.
Limited Vocabulary – you should also be ready for delays while the
software stumbles on unusual words. The reason is simple: new
industry-specific terms appear all the time, and the software’s vocabulary
does not always include them.
Apple’s Siri
Apple’s Siri was the first voice assistant created by a mainstream tech
company, debuting back in 2011.
Since then, it has been integrated into all iPhones, iPads, the Apple Watch,
the HomePod, Mac computers, and Apple TV.
Via your phone, Siri is even being used as the key user interface in Apple’s
CarPlay infotainment system for automobiles as well as the wireless
AirPods earbuds.
Although Apple had a big head start with Siri, many users expressed
frustration at its seeming inability to properly understand and interpret
voice commands.
If you ask Siri to send a text message or make a call on your behalf, it can
easily do so. However, when it comes to interacting with third-party apps,
Siri is a little less robust compared to its competitors, working with only six
types of apps: ride-hailing and sharing; messaging and calling; photo
search; payments; fitness; and auto infotainment systems.
Now, an iPhone user can say, “Hey Siri, I’d like a ride to the airport” or
“Hey Siri, order me a car,” and Siri will open whatever ride service app
you have on your phone and book the trip.
Amazon’s Alexa
Amazon is instead wagering that the voice assistant with the most “skills”
(its term for apps on its Echo assistant devices) “will gain a loyal
following, even if it sometimes makes mistakes and takes more effort to use”.
Although some users have pegged Alexa’s word recognition rate as being
a shade behind other voice platforms, the good news is that Alexa adapts
to your voice over time, offsetting any issues it may have with your
particular accent or dialect.
Speaking of skills, Amazon’s Alexa Skills Kit (ASK) is perhaps what has
propelled Alexa forward as a bona fide platform. ASK allows third-party
developers to create apps and tap into the power of Alexa without ever
needing native support.
With over 30,000 skills and growing, Alexa certainly outperforms Siri,
Google Voice and Cortana combined in terms of third-party integration.
With the incentive to “Add Voice to Your Big Idea and Reach More
Customers” (not to mention the ability to build for free in the cloud “no
coding knowledge required”) it’s no wonder that developers are rushing to
put content on the Skills platform.
Another huge selling point for Alexa is its integration with smart home
devices such as cameras, door locks, entertainment systems, lighting and
thermostats.
If you ask Alexa to re-order your rubbish bags, she’ll just go through
Amazon and order them. In fact, you can order millions of products off of
Amazon without ever lifting a finger; a natural and unique ability that Alexa
has over its competitors.
Microsoft’s Cortana
In this race, every inch counts: when Microsoft announced a 5.9% word
error rate in late 2016, it was ahead of Google. However, fast-forwarding
a year puts Google ahead, but only by 0.2 percentage points.
We’ve all watched 2001: A Space Odyssey where the mother of all
sentient computers, HAL 9000, goes on a killing rampage with its
unblinking red eye and smooth-as-butter robotic voice.
For instance, if you aren’t comfortable with Cortana having access to your
email, your Notebook is where you can add or remove access. Another
stand-out feature? Cortana will always ask you before she stores any
information she finds in her Notebook.
And, similarly to Amazon, Microsoft has come out with its own home smart
speaker, Invoke, which performs many of the same functions as its rival
devices. Microsoft has another huge advantage when it comes to
market reach, with Cortana being available on all Windows computers
and mobiles running Windows 10.
Google Assistant
One of the most common responses to voicing a question out loud these
days is, “LMGTFY”. In other words, “let me Google that for you”.
It only makes sense then, that Google Assistant prevails when it comes to
answering (and understanding) any and all questions its users may have.
As of late 2017, Google boasted a word accuracy rate of about 95% for U.S.
English, the highest of all the voice assistants available at the time. This
corresponds to a 4.9% word error rate, making Google the first of the group
to fall below the 5% threshold.
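Word error rate is commonly computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The sketch below shows that calculation with a standard edit-distance table; the function name and the sample sentences are assumptions for illustration, and this is not a description of how any vendor produced its published figures.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert every hypothesis word

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


wer = word_error_rate("turn on the kitchen lights",
                      "turn on the chicken lights")
print(f"WER: {wer:.1%}, word accuracy: {1 - wer:.1%}")
```

One wrong word out of five gives a 20% word error rate in this example, so a 4.9% rate means roughly one error in every twenty words.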
More recently, Google also announced some new, key partnerships with
companies including Lenovo, LG and Sony to launch a line of Google
Assistant-powered “smart displays,” which once again seems to ‘echo’ the
likeness of Amazon’s Echo Show.
In-Car Speech Recognition
Companies like Apple, Google and Nuance are completely reshaping the
driver’s experience in their vehicle. Removing the distraction of
looking down at a mobile phone while driving allows drivers to keep
their eyes on the road.
Instead of texting while driving, you can now tell your car who to call or
what restaurant to navigate to.
Instead of scrolling through Apple Music to find your favorite playlist, you
can just ask Siri to find and play it for you.
If the fuel in your car is running low, your in-car speech system can not
only inform you that you need to refuel, but also point out the nearest fuel
station and ask whether you have a preference for a particular brand.
Or perhaps it can warn you that the petrol station you prefer is too far to
reach with the fuel remaining.
It takes years to properly flesh out the plot, the gameplay, character
development, customizable gear, lottery systems, worlds, and so on. Not
only that, but the game has to be able to change and adapt based on each
player’s actions.
Voice control could also potentially lower the learning curve for beginners,
seeing as less importance will be placed on figuring out controls; players
can “just” begin talking right away.
In other words: it’ll be extremely challenging for game developers who will
now have to account for hundreds (if not thousands) of hours of voice data
collection, speech technology integration, testing and coding in order to
retain their international audience.
However, despite all the goals tech companies are shooting for and the
challenges they have to overcome along the way, there are already a
handful of video games whose developers believed the benefits
outweigh the obstacles.
In fact, voice-activated video games have even begun to extend from the
classic console and PC format to voice-activated mobile games and apps.
From Seaman, starring a sarcastic man-fish and featuring Leonard
Nimoy’s voice in the late 1990s, to Mass Effect 3, released in 2012 with
voice commands through Microsoft’s Kinect, the rise of speech technology
in video games has only just begun.
References: -
https://www.globalme.net/blog/speech-recognition-software-history-future/
https://www.globalme.net/blog/the-present-future-of-speech-recognition/
https://www.tldp.org/HOWTO/Speech-Recognition-HOWTO/introduction.html
http://read.pudn.com/downloads167/doc/769783/Introduction_to_Speech_Recognition.pdf