Automatic speech recognition (ASR) can be defined as the independent, computer-driven transcription of spoken language into readable text in real time (Stuckless, 1994). In a nutshell, ASR is technology that allows a computer to identify the words that a person speaks into a microphone or telephone and convert them to written text.
Having a machine understand fluently spoken speech has driven speech research for more than 50 years. Although ASR technology is not yet at the point where machines understand all speech, in any acoustic environment, or by any person, it is used on a day-to-day basis in a number of applications and services.
The ultimate goal of ASR research is to allow a computer to recognize, in real time and with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics, or accent. Today, if the system is trained to learn an individual speaker's voice, then much larger vocabularies are possible and accuracy can be greater than 90%.
Commercially available ASR systems usually require only a short period of speaker training and may successfully capture continuous speech with a large vocabulary at a normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions. `Optimal conditions' usually assume that users have speech characteristics which match the training data, can achieve proper speaker adaptation, and work in a clean noise environment (e.g. a quiet space).
This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected.
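Accuracy figures like the 98-99% claims above are conventionally derived from the word error rate (WER): the number of word substitutions, deletions, and insertions in the recognizer's output, divided by the number of words in the reference transcript (accuracy is roughly 1 minus WER). As a concrete illustration, the sketch below computes WER with a standard word-level edit distance; it is a minimal teaching example, not the scoring tool of any particular vendor.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed with the classic dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, if a six-word sentence is recognized with one wrong word, WER is 1/6 and accuracy about 83%, which shows how quickly a few errors erode headline accuracy numbers.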
History of ASR Technology
The earliest attempts to devise systems for automatic speech recognition by machine were made in the 1950s. Much of the early research leading to the development of speech activation and recognition technology was funded by the National Science Foundation (NSF) and the Defense Department's Defense Advanced Research Projects Agency (DARPA). Much of the initial research, performed with NSA and NSF funding, was conducted in the 1980s. (Source: GlobalSecurity.org)
Speech recognition technology was designed initially for individuals in the disability community. For example, voice recognition can help people with musculoskeletal disabilities caused by multiple sclerosis, cerebral palsy, or arthritis achieve maximum productivity on computers.
During the early 1990s, tremendous market opportunities emerged for speech recognition computer technology. The early versions of these products were clunky and hard to use. The early language recognition systems had to make compromises: they were "tuned" to be dependent on a particular speaker, had a small vocabulary, or used a very stylized and rigid syntax. However, in the computer industry, nothing stays the same for very long, and by the end of the 1990s there was a whole new crop of commercial speech recognition software packages that were easier to use and more effective than their predecessors.
In recent years, speech recognition technology has advanced to the point where it is used by millions of individuals to automatically create documents from dictation. Medical transcriptionists listen to dictated recordings made by physicians and other healthcare professionals and transcribe them into medical reports, correspondence, and other administrative material. An increasingly popular method utilizes speech recognition technology, which electronically translates sound into text and creates transcripts and drafts of reports. Transcripts and reports are then formatted; edited for mistakes in translation, punctuation, or grammar; and checked for consistency and any possible errors. Transcriptionists working in areas with standardized terminology, such as radiology or pathology, are more likely to encounter speech recognition technology. Use of speech recognition technology will become more widespread as the technology becomes more sophisticated.
Some voice writers produce a transcript in real time, using computer speech recognition technology. Speech-recognition-enabled voice writers pursue not only court reporting careers, but also careers as closed captioners and Internet streaming text providers or caption providers.
How Does ASR Work?
The goal of an ASR system is to accurately and efficiently convert a speech signal into a text transcription of the spoken words, independent of the speaker, the environment, or the device used to record the speech (i.e. the microphone).
This process begins when a speaker decides what to say and actually speaks a sentence. (This is a sequence of words, possibly with pauses, uhs, and ums.) Speaking produces a speech waveform, which embodies the words of the sentence as well as the extraneous sounds and pauses in the spoken input. The software then attempts to decode the speech into the best estimate of the sentence. First, it converts the speech signal into a sequence of feature vectors, which are measured throughout the duration of the speech signal. Then, using a syntactic decoder, it generates a valid sequence of representations (Rabiner & Juang, 2004).
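The two stages described above (turning the waveform into a sequence of feature vectors, then decoding those vectors into words) can be sketched in miniature. The following toy example is only illustrative: real systems use richer features (e.g. mel-frequency cepstra) and statistical models rather than the crude two-dimensional features and nearest-template "decoder" assumed here, and all names and numbers are invented for the sketch.

```python
import math

def frame_signal(samples, frame_len=160, hop=80):
    """Stage 1a: split the waveform into short, overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def feature_vector(frame):
    """Stage 1b: summarize each frame as a small feature vector.
    Here: log energy and zero-crossing rate (a crude stand-in for
    the spectral features real recognizers compute)."""
    energy = sum(s * s for s in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / len(frame)
    return (math.log(energy + 1e-9), zcr)

def decode(features, templates):
    """Stage 2: map each feature vector to the closest stored template
    (a drastically simplified stand-in for acoustic + syntactic decoding)."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(templates, key=lambda label: dist(f, templates[label]))
            for f in features]
```

A caller would record templates for each unit it wants to recognize, frame the incoming signal, extract one feature vector per frame, and label each frame with the nearest template; the sequence of labels is the (toy) transcription.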
What is the Benefit of ASR?
There are fundamentally three major reasons why so much research and effort has gone into the problem of trying to teach machines to recognize and understand speech:
Accessibility for the deaf and hard of hearing
Cost reduction through automation
Searchable text capability
What's been happening in ASR?
Aside from the scientists and technicians who are engaged in ASR research and development, most people who think about ASR underestimate its complexity. It is more than automatic speech-to-text: ASR requires fast computers with large data capacity and memory (a necessary condition for complex recognition tasks), and the involvement of speech scientists, linguists, computer scientists, mathematicians, and engineers.
The search is on for ASR systems that incorporate three features: large vocabularies, continuous speech capabilities, and speaker independence. Today, there are numerous systems which incorporate this combination of features.
What's ahead?
Encouraged by some innovative models, developments in ASR appear to be accelerating. The outlook is optimistic that future applications of automatic speech recognition will contribute substantially to the quality of life of deaf children and adults, and others who share their lives, as well as to the public and private sectors of the business community that will benefit from this technology.
References
Rabiner, L. R., & Juang, B. H. (2004). Statistical methods for the recognition and understanding of speech. Rutgers University and the University of California, Santa Barbara; Georgia Institute of Technology, Atlanta.
Stuckless, R. (1983). Real-time transliteration of speech into print for hearing impaired students in regular classes. American Annals of the Deaf, 128, 619-624.
Stuckless, R. (1994). Developments in real-time speech-to-text communication for people with impaired hearing. In M. Ross (Ed.), Communication access for people with hearing loss (pp. 197-226). Baltimore, MD: York Press.
June 2009, Docsoft, Inc. This white paper is for informational purposes only. Docsoft makes no warranties, express or implied, in this summary. The information contained in this document represents the current view of Docsoft Inc. on the items discussed as of the date of this publication.
Docsoft Inc. has been promoting the use of Automatic Speech Recognition Technology through the marketing of our products since 2002. Visit us online at www.docsoft.com. If you are looking for more information about the usability of Docsoft:AV and Docsoft:AV Services, please contact us. We can be reached toll free at 1-877-430-3502 or at info@docsoft.com.