
What is Automatic Speech Recognition?

Automatic speech recognition (ASR) can be defined as the independent, computer-driven transcription of spoken language into readable text in real time (Stuckless, 1994). In a nutshell, ASR is technology that allows a computer to identify the words that a person speaks into a microphone or telephone and convert them to written text.

Having a machine understand fluently spoken speech has driven speech research for more than 50 years. Although ASR technology is not yet at the point where machines understand all speech, in any acoustic environment, or by any person, it is used on a day-to-day basis in a number of applications and services.

The ultimate goal of ASR research is to allow a computer to recognize in real time, with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics, or accent. Today, if the system is trained to learn an individual speaker's voice, then much larger vocabularies are possible and accuracy can be greater than 90%.

Commercially available ASR systems usually require only a short period of speaker training and may successfully capture continuous speech with a large vocabulary at normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions. 'Optimal conditions' usually assume that users: have speech characteristics which match the training data, can achieve proper speaker adaptation, and work in a clean, low-noise environment (e.g., a quiet space). This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected.
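The accuracy figures quoted above are conventionally derived from word error rate (WER), computed as the word-level edit distance between a reference transcript and the recognizer's output, normalized by the reference length; accuracy is then 1 − WER. The sketch below is illustrative and not taken from this whitepaper; the function name and example sentences are our own.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution
            dp[i][j] = min(dp[i - 1][j] + 1,              # deletion
                           dp[i][j - 1] + 1,              # insertion
                           dp[i - 1][j - 1] + cost)
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") out of six reference words:
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {wer:.2f}, accuracy: {1 - wer:.0%}")  # → WER: 0.17, accuracy: 83%
```

By this measure, the "98% to 99% accuracy" claim corresponds to one or two word errors per hundred reference words.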

History of ASR Technology

The earliest attempts to devise systems for automatic speech recognition by machine were made in the 1950s. Much of the early research leading to the development of speech activation and recognition technology was funded by the National Science Foundation (NSF) and the Defense Department's Defense Advanced Research Projects Agency (DARPA). Much of the initial research, performed with NSA and NSF funding, was conducted in the 1980s. (Source: GlobalSecurity.Org)

Speech recognition technology was designed initially for individuals in the disability community. For example, voice recognition can help people with musculoskeletal disabilities caused by multiple sclerosis, cerebral palsy, or arthritis achieve maximum productivity on computers.

During the early 1990s, tremendous market opportunities emerged for speech recognition computer technology. The early versions of these products were clunky and hard to use. The early language recognition systems had to make compromises: they were "tuned" to be dependent on a particular speaker, had a small vocabulary, or used a very stylized and rigid syntax. However, in the computer industry, nothing stays the same for very long, and by the end of the 1990s there was a whole new crop of commercial speech recognition software packages that were easier to use and more effective than their predecessors.

In recent years, speech recognition technology has advanced to the point where it is used by millions of individuals to automatically create documents from dictation. Medical transcriptionists listen to dictated recordings made by physicians and other healthcare professionals and transcribe them into medical reports, correspondence, and other administrative material. An increasingly popular method utilizes speech recognition technology, which electronically translates sound into text and creates transcripts and drafts of reports. Transcripts and reports are then formatted; edited for mistakes in translation, punctuation, or grammar; and checked for consistency and any possible errors. Transcriptionists working in areas with standardized terminology, such as radiology or pathology, are more likely to encounter speech recognition technology. Use of speech recognition technology will become more widespread as the technology becomes more sophisticated.

Some voice writers produce a transcript in real time, using computer speech recognition technology. Speech-recognition-enabled voice writers pursue not only court reporting careers, but also careers as closed captioners and Internet streaming text providers or caption providers.

How Does ASR Work?

The goal of an ASR system is to accurately and efficiently convert a speech signal into a text transcription of the spoken words, independent of the speaker, environment, or the device used to record the speech (i.e., the microphone).

The process begins when a speaker decides what to say and actually speaks a sentence (a sequence of words, possibly with pauses, uhs, and ums). Speaking produces a speech waveform, which embodies the words of the sentence as well as the extraneous sounds and pauses in the spoken input. The software then attempts to decode the speech into the best estimate of the sentence. First it converts the speech signal into a sequence of vectors, which are measured throughout the duration of the speech signal. Then, using a syntactic decoder, it generates a valid sequence of representations (Rabiner & Juang, 2004).
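The stages described above (waveform, then a sequence of per-frame vectors, then a decoder) can be sketched schematically. The sketch below is a toy illustration under our own assumptions: the "features" (frame energy and zero-crossing count) and the threshold-based "decoder" are stand-ins for the spectral features and statistical decoders that real ASR systems use.

```python
import math

def extract_features(samples, frame_size=160):
    """Slice the waveform into frames and reduce each frame to a
    small feature vector, yielding a sequence of vectors over time."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
        features.append((energy, crossings))
    return features

def decode(features, energy_threshold=0.01):
    """Toy 'decoder': label each frame speech or silence by energy.
    A real decoder instead searches for the most likely word sequence
    given acoustic and language models."""
    return ["speech" if energy > energy_threshold else "silence"
            for energy, _ in features]

# Fake waveform: silence, a burst of oscillation, then silence again.
samples = [0.0] * 160 + [math.sin(0.3 * n) for n in range(160)] + [0.0] * 160
print(decode(extract_features(samples)))  # → ['silence', 'speech', 'silence']
```

The point of the sketch is the shape of the pipeline, not the particular features: the waveform is never matched against text directly, but only through the intermediate sequence of frame-level vectors.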

What is the Benefit of ASR?

There are fundamentally three major reasons why so much research and effort has gone into the problem of trying to teach machines to recognize and understand speech:

Accessibility for the deaf and hard of hearing

Cost reduction through automation

Searchable text capability

What's been happening in ASR?

Aside from the scientists and technicians who are engaged in ASR research and development, most people who think about ASR underestimate its complexity. It is more than automatic speech-to-text: ASR requires fast computers with large data capacity and memory (a necessary condition for complex recognition tasks), and the involvement of speech scientists, linguists, computer scientists, mathematicians, and engineers.

The search is on for ASR systems that incorporate three features: large vocabularies, continuous speech capabilities, and speaker independence. Today, there are numerous systems which incorporate these combinations.

What's ahead?

Encouraged by some innovative models, developments in ASR appear to be accelerating. The outlook is optimistic that future applications of automatic speech recognition will contribute substantially to the quality of life among deaf children and adults, and others who share their lives, as well as public and private sectors of the business community who will benefit from this technology.

References
Stuckless, R. (1983). Real-time transliteration of speech into print for hearing impaired students in regular classes. American
Annals of the Deaf, 128, 619-624.

Stuckless, R. (1994). Developments in real-time speech-to-text communication for people with impaired hearing. In M. Ross (Ed.), Communication access for people with hearing loss (pp. 197-226). Baltimore, MD: York Press.

Rabiner, L. R., & Juang, B. H. (2004). Statistical methods for the recognition and understanding of speech. Rutgers University and the University of California, Santa Barbara; Georgia Institute of Technology, Atlanta.

June 2009, Docsoft, Inc. This whitepaper is for informational purposes only. Docsoft makes no warranties, express or implied, in this summary. The information contained in this document represents the current view of Docsoft, Inc. on the items discussed as of the date of this publication.

Docsoft, Inc. has been promoting the use of Automatic Speech Recognition Technology through the marketing of our products since 2002. Visit us online at www.docsoft.com. If you are looking for more information about the usability of Docsoft:AV and Docsoft:AV Services, please contact us. We can be reached toll free at 1-877-430-3502 or at info@docsoft.com.
