Machine Understanding OF Indian Spoken Languages: B Tech Project

Machine Understanding of Indian Spoken Languages
B Tech Project
MACHINE UNDERSTANDING OF INDIAN SPOKEN LANGUAGES
By Abhishek Agarwal 200101229
Dhirubhai Ambani Institute of Information & Communication Technology Gan hinagar! G"#A$AT
A%ril 200&
'1'
Dhirubhai Ambani Institute of Information & Communication Technology Gandhinagar
CERTIFICATE
This is to certify that the Project Report titled Machine Understanding of Indian Spoken Languages submitted by Abhishek Agarwal ID 200101229 for the partial fulfillment of the re!uirements of B Tech "ICT# degree of the institute embodies the $or% done by him &ff campus under my super'ision( Date: _____________ Signature: _________________ (Dr. Sandeep Sibal) Date: _____________ Signature: _________________ (Prof. Prabhat Ranjan)
'2'
ACKNOWLEDGEMENTS
This %ro(ect in)ol)e the collection an analysis of information from a wi e )ariety of sources an the efforts of many %eo%le beyon me* Thus it woul not ha)e been %ossible to achie)e the results re%orte in this ocument without their hel%! su%%ort an encouragement* I will like to e+%ress my gratitu e to the following %eo%le for their hel% in the work lea ing to this re%ort, Dr. Sandeep Sibal, Dr. Prabhat Ran an ! Dr. "itendra # $era - .ro(ect su%er)isors, for their useful comments on the sub(ect matter an for the knowle ge I gaine by sharing i eas with them* Prof. Minal Bhise- .ro(ect Coor inator, for organi/ing an coor inating the 0Tech .ro(ects1 2001*
'2'
ABSTRACT
3anguage I entification is %rocess of i entifying the language being s%oken from a sam%le of s%eech by an unknown s%eaker* 4ost of the %re)ious work in this fiel is base on the fact that %honeme se5uences ha)e ifferent occurrence %robabilities in ifferent languages! an all the systems esigne till now ha)e trie to e+%loit this fact* 3anguage i entification %rocess in turn consists of two sub'systems* 6irst system con)erts s%eech into some interme iate form calle as %honeme se5uences! which are use to mo el the language by oing their %robabilistic analysis in the secon sub' system* In this %ro(ect both of the sub'systems are targete * 6irst some algorithms are iscusse for esigning language mo els* Then an attem%t is ma e to esign an algorithm for e+tracting %honeme se5uences in form of more abstract classes eri)e by statistical tools like Gaussian 4i+ture 4o els 7G448 an 9i en 4arko) 4o el 79448*
':'
TABLE OF CONTENTS
M#%&I'( U'D(RS)#'DI'*.....................................................................................................+ ,-...................................................................................................................................................+ I'DI#' SP,.(' L#'*U#*(S...................................................................................................+ 0;********************************************************************************************************************************************************1 C)RTI*ICAT)((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((+ AC,-&./)DG)0)-T1(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((2 AB1TRACT((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((3 TAB/) &* C&-T)-T1(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((4 5( I-TR&D6CTI&-(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((7 +( BAC,GR&6-D .&R,((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((8 2*1* DI<TI=CT C9A$ACT>$I<TIC< ?6 3A=G"AG> *******************************************************************************@ 2*2* ?A>$AI>B ?= 3A=G"AG> ID>=TI6I>$<****************************************************************************************@ /./.+. -ront0(nd Processing............................................................................................................1 /././. Phone$e Recogni2er.............................................................................................................1 /./.3. Language Models................................................................................................................+4 2*2* >CA4.3>, .9?=>TIC $>C?G=ITI?=D3A=G"AG> 4?D>3I=G******************************************************10 2( &B9)CTI:)1((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((55 3( DI1C611I&- ; &- /A-G6AG) 0&D)/1((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((55 :*1* 3A=G"AG> 4?D>3 E T$AI=I=G .9A<>***************************************************************************************12 :*2* 3A=G"AG> 4?D>3 E T"=I=G .9A<>******************************************************************************************1: 5./.+. Penalty for 2ero probability phone$es ..............................................................................+5 5././. 6eights for Unigra$, Bigra$, and )rigra$ probabilities.................................................+7 5./.3. Utterance Duration..............................................................................................................+8 :*2* 3A=G"AG> 4?D>3 E T><TI=G .9A<>*****************************************************************************************1@ 4( DI1C611I&- ; &T<)R APPR&AC<)1 (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((58 &*1* $>C?G=ITI?= 0A<>D ?= DI<TI=CT .9?=>4><****************************************************************************1@ 7.+.+. *raphs for Unigra$ and Bigra$ probabilities...................................................................+1 7.+./. ,bser9ations......................................................................................................................../4 7.+.3. %onclusion ........................................................................................................................../4 &*2* "<I=G GA"<<IA= 4ICT"$> 4?D>3< 6?$ 6$?=T >=D .$?C><<I=G*******************************************20 7( DI1C611I&- ; T&&/1 BA1)D &- /A-G6AG) ID((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((+5 F*1* C?=T>=T A>$I6ICATI?= <;<T>4 7CA<8************************************************************************************21 F*2* CA33 $?"TI=G 6?$ C"<T?4>$ CA$> C>=T>$<***************************************************************************22 =( C&-C/61I&-((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((+2 8( R)*)R)-C)1((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((+3
'&'
1. Introduction
The %roblem of 3anguage I entification 7language ID8 is efine as recogni/ing the language being s%oken from a sam%le of s%eech by an unknown s%eaker G2H* The human is by far the best language ID system in o%eration to ay! with accuracy as high as hun re %ercent in case if they know the language an can make a %retty reasonable guess about them in case if they on1t* This %ro(ect has trie to e)elo% this ability in machines* <e)eral im%ortant a%%lications alrea y e+ist for language ID* A language ID system coul be use as a Ifront'en I system to a tele%hone'base com%any! routing the caller to an a%%ro%riate o%erator fluent in the callerIs language* Currently either a manual system or IA$< base system e+ists* 0ut! both of them suffer from two main %roblems! howe)er, s%ee an e+%ense* It is highly e+%ensi)e to em%loy the call routers for AT&T who! between them! must be able to correctly route 1:0 languages or $eliance Infocomm for more than 10 languages* 6or emergency ser)ices! this coul be a fatal elay* ?ther a%%lication inclu es usage of such systems in war times when sol iers are oing rescue o%erations in alien lan s! to communicate with local %erson* Another a%%lication which actually has been im%lemente uring this %ro(ect inclu es its usage in esigning Content Aerification <ystem 7CA<8! which is use for )erification of the s%eech ata store for ifferent languages* As research in automatic s%eech recognition %rogresses! a language ID system woul be necessary for any multi'lingual s%eech recognition system* ?ne such system may be a fast information system! say at an air%ort! catering for multi'national clients* Another may be an automatic translation system* 0oth these systems woul nee to first recogni/e the language that was being s%oken before they coul %rocess it* There are number of ways to achie)e this task of language ID! like base on s%ectral features of s%eech! or base on wor le+icon or i entifying %resence of some istinct characteristics in ifferent languages like s%ecial %honemes* ?nes iscusse here are base on %honetic characteristics* To esign a language ID we shoul ha)e a %ro%er knowle ge about s%eech an its com%onents* <%eech is basically consists of small units of soun calle as %honemes* 6or e+am%le if you s%eak wor 0AT! then Jb! JK! Jt are three %honemes which together forms this soun * =ow for language i entification using %honetic characteristics! first a s%eech is con)erte into %honeme se5uences! which can be one using )arious metho s* In this %ro(ect the %honeme recogni/er use ! is base on 9i en 4arko) 4o els 79448* ?nce s%eech is con)erte into %honeme se5uences then a %robabilistic analysis is one which in turn is i)i e on three %hases- training! tuning & testing! an for that the cor%us is also i)i e accor ingly* In secon %hase of %ro(ect I ha)e actually attem%te to su%erse e the con)entional 944 way of con)erting s%eech into %honeme se5uences by using more abstract classes eri)e using statistical tools like Gaussian 4i+ture 4etho 7G448 an 944 an then %assing it through the same language mo els! which were esigne in first %hase*
'F'
'L'
Machine Understanding of Indian Spoken Languages All e+%eriments were con ucte on two s%eech ata set- M$a io broa cast ata1 an MCustomer care s%eech ata1! in two languages 79in i & >nglish8* >ach ata set was in turn i)i e into three is(oint set! so that a %ro%er testing of language mo els can be one an so that there is no biasing in the results* Though results from the later ata set are of more significance because of its a%%lication but %erformance was better in case of former* A brief iscussion on the research one in this fiel along with some e+am%le is gi)en in section 2* In following section : & &! work one by me along with the algorithm %ro%ose is iscusse * After that in section F! I ha)e iscusse about the tools which ha)e been esigne by me base on the algorithm %ro%ose ! followe by conclusion an references in subse5uent sections*
. B!c"#round Wor"
This section re)iews the current metho s use for language ID an research in the area of feature )ectors an language mo els* iscusses %re)ious
2.1. Distinct Characteristics of Language

N>ach language has a finite set of %honemes* As we learn our first language! we also learn to i entify them* Bhen listening to a foreign language! with %honemes not foun in our first language! the %resence of such soun s is rea ily a%%arent to us* >+am%les are the OclicksO foun in some sub'<aharan African languages* As the )ocal a%%aratus use in the %ro uction of languages is uni)ersal! there is much o)erla% of the %honeme sets! an the total number of %honemes is finite* 0ut there can be ifferences in the way the same %honeme is inter%rete in two ifferent languages* 6or e+am%le! in >nglish! DlD an DrD 7as in OleafO an OreefO8 are two ifferent %honemes! whereas in #a%anese they are notP*G12H ?n the contrary! the fre5uency of occurrence of %hones an the %honotactic rules in languages can iffer significantly* .honotactic rules go)ern the way ifferent %honemes are combine * 6or e+am%le! %honeme clusters DsrD an Ds%D are 5uite common in Tamil an German res%ecti)ely 7the latter coul be re%resente as Dsh%D in >nglish8! but are rare in >nglish* This is what we ha)e trie to e+%loit an use to esign an algorithm for language i entification! in this %ro(ect*
2.2. Overview on Language Identifiers

3anguage IDs works as a single entity in many a%%lications! but it is! in itself a set of three black bo+es- front0end processing syste$, phone$e recogni2er, and language $odels* <%eech Data is gi)en as an in%ut to these set of bo+es an then it flows into the system as shown in the figure* Im%lementation of e)ery system is hi en from others- only interfaces are stan ar i/e as we o in case of ?<I 3ayers of networking* 0y stan ar i/ation! we mean the format of ata! which will be %asse from one system to another! is fi+e *
'@'
. .1. Front$End Proc%&&in#

4ain %ur%ose of front'en %rocessing is the feature )ector e+traction* 4any ifferent algorithms e+ist for s%eech recognition an language i entification* A common nee between them is some form of %arameteri/e re%resentation 7feature )ectors8 of the s%eech in%ut* These feature )ector streams may then be use to train or interrogate the language mo els which will follow the feature e+traction mo ule in a ty%ical language i entification system GFH* It is ob)ious that there e+ist an infinite number of ways to enco e the s%eech! e%en ing u%on which %articular numerical measures are eeme useful* ?)er the many years of s%eech recognition research! there has been a con)ergence towar s a few 7s%ectrally base 8 features that %erform well* ?f these! 3inear .re iction 73.8 an Ce%stral measures are most wi ely use G&H* The final test for any such front'en is its effect on the accuracy of the o)erall language ID system* In this res%ect the system base on Ce%stral com%ares fa)orably with any other we ha)e come across in our in)estigation*
. . . P'on%(% R%co#ni)%r
The basic aim behin this system is to generate the %honeme se5uences from the )ector se5uences* There are &F %honemes! an their ifferent combinations can re%resent all %ossible s%eeches in )arious languages* Be use 9i en 4arko) 4o els 7944s8 for this %ur%ose*
944 mo els are %rimarily %robabilistic state machines! in which each state re%resents a %honeme* =ow there are two kin s of %robabilities attache with each state* 6irst! is the %robability with which we can say which will be the ne+t state 70 ++! as shown in fig*8 an secon ! is the %robability with which we can say what will be the out%ut soun when this state is reache ! which are re%resente by A +7?y8 in the '9'
Machine Understanding of Indian Spoken Languages figure* 0asically 944s are use for three %roblems! out of which one which we will be using it for is to fin out the most %robable state se5uence gi)en a se5uence of out%ut soun G1H* <o basically what it oes is that when %rocesse s%eech )ectors are %asse through this system it gi)es se5uence of %honemes* "sually you ha)e %honeme recogni/er s%ecific to a language! e%en ing on the training ata use * Be will iscuss about this in later sections*
. .*.L!n#u!#% Mod%+&
These are the most im%ortant as%ect of a language ID* There basic aim is to %re ict the language gi)en a %honeme se5uence! an for this %ur%ose some kin of %robabilistic analysis is one which is im%lementation e%en ent* This is %recisely my area of work* There are )arious ways of im%lementing this* ?ne of the %ro%erties that most metho s ha)e in common is that they are ma e u% of two %hases, training an recognition* The latter may only be %erforme after the former! which in)ol)es %resenting the system with s%eech from target languages 7i*e* those that we are trying to recogni/e8* Different systems then mo el languages accor ing to %articular language' e%en ent features* During recognition! these features are com%are to those of utterances being teste ! in or er to eci e which language is the correct one* The sim%lest form of training uses only a sam%le s%eech wa)e! an the true i entity of the language being s%oken* 4ore com%le+ a%%roaches use %honetic transcri%tions 7a se5uence of symbols re%resenting the soun s in each utterance8! or orthogra%hic transcri%tions 7the te+t of the wor s s%oken8! along with a %ronunciation ictionary! which woul ma% each wor to its re%resentation* <uch metho s are ob)iously more costly an time'consuming! not least because fluent s%eakers for each target language are re5uire * Be ha)e use the former a%%roach for training* 3anguage mo els will be iscusse in great e%th in ne+t section*
2.3. Example !honetic "ecognition#Language $odeling

In last section we ha)e seen that it is the likelihoo of occurrence of a %honeme in a language which ifferentiates one language from another* .honetic $ecognition! 3anguage 4o eling 7.$348 is base on this %rinci%le* This system uses acoustic %re'%rocessing for feature )ector e+traction as iscusse in section 2*2*1* Then a language s%ecific %honeme recogni/er is %lace to con)ert s%eech into %honeme se5uences an at the en lies the language mo els which calculates the n'Gram %robabilities* 6igure gi)es a gra%hical )iew of the system
Disa )antage of this system is that it uses a single language e%en ent %honeme recogni/er! which can make its results bias to the language in which recogni/er is traine because the %hones %resent in target languages o not always occur in the language use uring training* Be may wish to incor%orate soun s from more than one language into a .$34'like system* An alternati)e to it can be to use multi%le .$34 systems in %arallel! with recogni/ers traine in ifferent languages! as shown in figure*
' 10 '
Bhile enhancing %erformance! this a%%roach has a cou%le of isa )antages! namely the nee for labele training s%eech in more than one language! an the increase %rocessing time*
*. O,-%cti.%&
The ma(or ob(ecti)e or goals which were set before starting this %ro(ect are as follows, An algorithm for more accurate language ID! in fiel of language mo els* Design some alternati)e to con)entional front'en %rocessing! which shoul not be language e%en ent* A tool base on abo)e algorithms for call routing in Customer Care Center* This tool will also enable a ministrator to manage the s%eech cor%us* A tool base on abo)e algorithm for MContent Aerification <ystem1 for )erifying the ata files %resent in ata ser)ers*
/. Di&cu&&ion 0 on +!n#u!#% (od%+&

In the last section a brief intro uction was gi)en about all the three ifferent as%ect of 3anguage ID* Till now most of research one in fiel of language ID was focuse on the first two stages of it like that of 4 Qissman G2H or 3 <chwar t G:H* 3ittle work is one in the fiel of language mo els which makes the last %hase* I ha)e first stu ie all the e+isting metho s for this %ur%ose an then esigne some new algorithms for the same %ur%ose* In%ut gi)en to these mo els is the %honeme se5uences obtaine from the recogni/er an the out%ut e+%ecte from them is language in which the in%ut s%eech is* "sually language mo els are esigne in two %hases, Training an recognition %hase* In the esign which I ha)e %ro%ose I ha)e intro uce one more %hase of Tuning! in which system %arameters are o%timi/e * <o now %hases are, raining !hase uning !hase esting !hase =ow before iscussing each %hase at im%lementation le)el lets talk about whole %rocess of language recognition at an abstract le)el* In the first %hase of language mo els large amount of training ata is %asse through the mo el along with the language information an no recognition takes %lace in this %hase* =ow base on training ata )arious %robabilities are calculate like %robability of occurrence of gi)en %honeme in gi)en language* =ow once all these %robabilities are calculate then ne+t %hase of recognition starts* In this %hase also s%eech ata is %asse through ' 11 '
Machine Understanding of Indian Spoken Languages mo els but now language information is not gi)en to the system! in fact it1s the system which %re icts the language of the s%eech ata by using the %robabilities calculate in training %hase* 6or e+am%le su%%ose s%eech ata for >nglish language was %asse then in training %hase what mo el will calculate is the %robability like .7%hD>8 which what is the %robability of %honeme M%h1 occurring in >nglish s%eech ata* =ow uring recognition %hase su%%ose some %honeme se5uence %h1 %h2 %h2 occurre in the test ata then the %robability that this se5uence is in >nglish will be !" ph1 ph2 ph# $ %ng& ' !" ph1 $ %ng& ( !" ph2 $ %ng& ( !" ph# $ %ng& This is a )ery sim%le way of %utting things! actually formulae is not this straight an that is what has been stu ie in this %ro(ect* =ow let1s see all these %hases in etail*
%.1. Language $odel & 'raining !hase

Training %hase is the most im%ortant %hase among all the three! an eci es how goo or ba the whole system is going to %erform* 4ain aim of this %hase is to e+tract ma+imum %ossible information about a language from its training ata* There are number of ways to o it* ?ne of them is by fin ing the %robability with which a gi)en %honeme from a set occurs in that language* There are two issues to eal with in this %robabilistic a%%roach which are as follows, i* 0ethod of finding the probabilities, There are number of ways in which %robabilities relate to a %honeme can be foun * ?ne of them is P:ph;<=! which re%resent the %robability of %honeme %h occurring in language C* This )alue is foun by counting the number of times a gi)en %honeme occurs in training ata an then i)i ing it by the total number %honemes in the whole ata an is calculate for e)ery %honeme* ii* Type of probabilities, =ow there are )arious ways of ca%turing the language s%ecific information from the training ata* Bhile selecting the a%%ro%riate metho there are some %arameters you shoul consi er* 6irst of them is the kin of a%%lication! language mo els are esigne for* An secon is the com%utational resource a)ailable* 3ike in our case where we are trying to esign a language ID! we know that it1s the or er in which %honemes occur! makes one language ifferent from other an then there are some %honemes which are s%ecific to some languages an ne)er occur in others*
<o consi ering these factors a training mo el was esigne in which when ata is %asse three ty%es of %robabilities are calculate * 1* 6nigram, These are the %robability of occurrence of single %honeme in language* These basically try to ca%ture the istinct %honemes which are s%ecial to %articular language like %honemes en ing with JhJ are more %robable in 9in i than in >nglish* Uni>Prob:ph;<= R 'o. of ti$e phone$e ?ph@ occur in the training data ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' %ount of total no of phone$es in the training data for ?<@ 2* Bigram & Trigram, This is the %robability of a %honeme being followe by a gi)en %honeme or %air of %honemes in a gi)en language* This basically tries to ca%ture the se5uential information relate to %honemes ' 12 '
Machine Understanding of Indian Spoken Languages which is also s%ecific to a language* This metho is calle as n'Gram a%%roach! an we can go till any )alue of n but as you increase )alue of n com%le+ity increases e+%onentially* Therefore we ha)e calculate till nR2* Bi>Prob:ph/ ; <, ph+= R 'o. of ti$e phone$e ph+ is folloAed by phone$e ph/ in the data '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' 'o of ti$es phone$es ph+ occurs in the training data of language < )ri>Prob:ph3 ; <, ph+, ph/= R 'o. of ti$e phone$e ph+ is folloAed by ph/ and then ph3 in the data 00000000000000000000000000000'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' 'o. of ti$e phone$e ph+ is folloAed by ph/ in the training data of language < <o in a sim%le language in case of trigrams what you o is that instea of consi ering %honeme as single unit you consi er se5uence of three %honemes as a single unit to calculate the %robability* 6ollowing iagram gi)es a %ictorial )iew of the whole %rocess,
<o this was all about training %hase of language mo els* The out%uts of this %hase are three %robability files for each language as escribe abo)e* =ow before iscussing about other %hases lets iscuss briefly how these are going to use these %robability ' 12 '
Machine Understanding of Indian Spoken Languages files* 3et1s su%%ose Na b c e f Nis a %honeme se5uence! where a! b! c! ! e an f are ifferent %honemes! came for recognition uring testing %hase! then ifferent %robabilities relate to it will be, Unigra)*!rob ' .igra)*!rob ' rigra)*!rob ' "#& !"a$+&( !"b$+& (!"c$+&(!"d$+&(!"e$+&(!"f$+&,,,,,--"1& !"b$+/ a&( !"c$+/ b& (!"d$+/ c&(!"e$+/ d&(!"f$+/ e&,,,---"2& !"c$+/ a/ b&( !"d$+/ b/ c& (!"e$+/ c/ d&(!"f$+/ e/ f&,,,,-
<o this way we calculate three %robabilities relate to each %honeme se5uence* <o the first %roblem which we shoul eal is the %roblem of /ero'%robability %enalty! which eals with cases when a gi)en %honeme actually ne)er occurre in training ata of a language* Then secon %roblem which we shoul eal now! is how to inter%ret these %robabilities an combine them into single )alue* 6or this we nee to fin weights for ifferent %robabilities* 3et1s call al%ha! beta an gamma as the re5uire weights for unigram! bigram an trigram %robability! then the final %robability of some %honeme se5uence P belonging to a s%ecific language < will be, Prob:P;<= B alphaCUnigra$>Prob:P;<= DbetaCBigra$>Prob:P;<= Dga$$aC)rigra$>Prob:P;<= <o now our first task is to fin out )alues of al%ha! beta an gamma* Then the thir %roblem which we shoul eal is that to fin o%timal )alue of uration for which %honeme se5uence shoul be recor e * <ee in this %roblem the i eal solution will be to take %honeme se5uence of the whole a)ailable ata* 0ut we shoul consi er two gra%hs! first is the gra%h between the com%le+ity an uration which is an e)er increasing gra%h* <econ is the gra%h between accuracy with which language is %re icte for a gi)en %honeme se5uence with its uration* This gra%h is e+%ecte to be asym%totic in nature* All these issues are ealt in following section*
%.2. Language $odel & 'uning !hase

As iscusse in last section! in training %hase all system %arameters were calculate * 0ut in a system there are some hy%er'%arameters for which! neither is there any em%irical way of calculating their )alues nor can they be calculate on the same ata set on which system has been traine * 6or this reason an interme iate ste% of tuning the system has been %ro%ose by us* In this ste%! o%timal )alues of all the system1s hy%er'%arameters are foun out! an all the e+%eriments which are one in this %hase are %erforme on tuning ata which is totally is(oint from the one which is use for training an testing* 3et1s look at three %roblems iscusse in last section one by one,
/. .1. P%n!+t1 2or )%ro 3ro,!,i+it1 3'on%(%&

Consi er a case when a %honeme se5uence .7 a b c 8 occur in the testing ata* Then its unigram %robability accor ing to 718 will be, "nigramS.rob7.TC8 R !"a$+&( !"b$+& (!"c$+&(!"d$+& =ow it might be %ossible that one of the %honeme in the test se5uence actually ne)er a%%eare in the whole training ata of a gi)en language <! an thus the %robability of ' 1: '
Machine Understanding of Indian Spoken Languages that %honeme is /ero for <* Then what will ha%%enU =ow i eally we shoul not o anything in this regar an whene)er there is such %honeme in a se5uence! we shoul make the %robability of whole se5uence as /ero! but the catch is that our %honeme recogni/er is also not %erfectly accurate so we shoul take a case a wrong con)ersion of s%eech to %honeme se5uence* <o what we shoul actually o is to assign a )ery small but non'/ero )alue to all %honemes ha)ing /ero %robability* This a%%roach has two a )antages* 6irst that now es%ite ha)ing a /ero'%robability %honeme! a gi)en %honeme se5uence is still eligible for recognition! an secon because of a small )alue of %robability it will try to bias the results in negati)e si e! which i eally shoul ha)e been the case* Gi)en below is gra%h for ifferent )alues of %enalty on +'a+is an accuracy with which system works in y'a+is* .enalty )alues are re%resente in terms of + R e+%onent 7%8 where p is actual %enalty in terms of %robability an )alues in + a+is are its logarithmic counter%art* =ow you can clearly see a %eak in the gra%h at +R '10*
/. . .W%i#'t& 2or Uni#r!(4 Bi#r!(4 !nd Tri#r!( 3ro,!,i+iti%&

As iscusse in last section there is a re5uirement for fin ing an o%timal way of combining the three ifferent %robability )alues which will be calculate ! so that combine )alue re%resents the information in best %ossible way* 6or e+am%le! the sim%lest way of combining three )alues is by gi)ing them e5ual weights! but this is most %robably a wrong way of oing things because here we are no where consi ering which a%%roach contains more information* 6ollowing is brief iscussion on ifferent ways of combining the %robabilities along with their com%arati)e stu y* As such there is no %erfect way of combining- we can only fin the most o%timal of them* Different mo els esigne , Model +. Model /. -lat co$bination :alpha B+E3,beta B+E3, ga$$a B+E3= *enerating Aeights on the basis of perfor$ance of different $odels.
In this results were foun for all the three ifferent a%%roach an then the weights were assigne accor ingly* This way we will gi)e higher weights to a%%roach which is %erforming better* >m%irical formula for the %arameter is as follows,
' 1& '

alpha>eng B Prob: Uni>(ng F Uni>&in= 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 Prob: Uni>(ng F Uni>&in= D Prob: Uni>(ng F Uni>&in= D Prob: )ri>(ng F )ri>&in=
an similarly for other two %arameters* Model 3. Sa$e as / but noA probabilities Aere nor$ali2ed.
The omain of %robabilities for unigram! bigram an trigram were not same therefore irect combination of their )alues will not re%resent the right %icture! therefore now all )alues were normali/e so that they came in same range* >)en for normali/ation! ifferent ways were teste out of which! one in which e)ery )alue is i)i e by the ma+imum )alue in its set! %ro uce the best results* After normali/ing the )alues! secon a%%roach is im%lemente * 4o el :* *enerated using *aussian MiGture Model and $aGi$i2ing the likelihood of single language* In this a%%roach ifferent %arameters were generate using Gaussian 4i+ture 4o el a%%roach in which o%timal )alues of %arameters are foun by ma+imi/ing any gi)en )alue! which is calculate by combination ifferent )alues* In this a%%roach )alue which was ma+imi/e was sum of combine %robability obtaine by using current )alue of %arameters! which is re%resente by the following )alue, SUM:alpha>engCP:SeH ;Uni=Dbeta>engCP:SeH ;Bi=Dga$$a>engC P: SeH ;)ri = = where .7<e5(T"ni8 re%resents the unigram %robability for se5uence (* .7<e5(T0i8 re%resents the bigram %robability for se5uence (* .7<e5(TTri8 re%resents the trigram %robability for se5uence (* Model 7. *enerated using *aussian MiGture Model and $aGi$i2ing the difference of likelihoods fro$ different languages. This a%%roach is same as the last one (ust the )alue which is ma+imi/e is now the ifference between the combine %robabilities for ifferent languages which is re%resente by the following )alue, SUM: alpha>eng C:P: SeH ; Uni(=E P: SeH ; Uni&== D beta>eng C:P: SeH ; Bi(= E P: SeH ; Bi&== D ga$$a>eng C :P: SeH ; )ri( =EP: SeH ; )ri& == = 6ollowing figure shows the gra%hical )iew of all the mo els along with the )alue of ifferent %arameters,
' 1F '
/. .*.
Utt%r!nc% Dur!tion
This %arameter eci es how long shoul be the %honeme se5uence in terms of time uration! for efficient a%%lications* I eally more the uration better the result will be! but com%utational resources %ut limit on it* As we will increase the uration! com%le+ity will also increase therefore we nee to fin a o%timal )alue for this %arameter* 6rom following gra%h you can clearly see that there in no significant increase in accuracy after some )alue of utterance uration 7&0 secs8* In this gra%h +' a+is is the utterance uration an y'a+is is accuracy with which results are %re icte *
' 1L '
%.3. Language $odel & 'esting !hase

This %hase is the last %hase of this %rocess an in this all the mo els are e)aluate against a fresh ata set* 6ollowing figure shows the com%arati)e stu y of ifferent mo els which were esigne in the tuning %hase along with their %erformances*
The test ata is first %asse from three mo els an three %robabilities are calculate se%arately for each n'Gram an %honeme se5uence* Then these )alues are normali/e by i)i ing them by the ma+imum )alue of %robability which was obser)e uring tuning %hase* Then these )alues are combine using fi)e ifferent mo els* =ow from the abo)e figure we can make out its Model 3 which is %erforming best! but because of the o)erhea inclu e in it makes Model /1s %erformance as best* Therefore we conclu e the mo el 21s way of combining ifferent n'Gram %robabilities as o%timal*
5. Di&cu&&ion 0 Ot'%r !33ro!c'%&

In last section we iscusse about language mo els! which are one of the many ways of i entifying language from %honeme se5uences* ?ne other a%%roach for the same %ur%ose can be by fin ing out istinct %honemes in ifferent language* 3et1s iscuss other alternati)es in etail,
(.1. "ecognition )ased on distinct phonemes

$ecognition base on istinct %honemes aims at fin ing those %honemes which ha)e some uni5ue characteristics relate to occurrence in ifferent languages* There might be some %honemes which are e+ce%tionally high %robable in some language while its occurrence is rare in other language! an then a recogni/er can be built from the list of such %honemes* An attem%t was ma e to fin out such %honemes in >nglish & ' 1@ '
Machine Understanding of Indian Spoken Languages 9in i* 6or this the %robability files which were generate after training %hase in last section were analy/e by %lotting some gra%hs using 4atlab*
5.1.1.
Gr!3'& 2or Uni#r!( !nd Bi#r!( 3ro,!,i+iti%&
Gra%h gi)en below is of unigram %robabilities for both >nglish an 9in i* The %robability )alues are ma%%e in to a color base on the ma%%ing shown in the right han ! with un erlying %rinci%le that higher the %robability )alue! brighter the color will be for that %honeme* In this gra%h %honemes are re%resente in +'a+is while languages are shown in y'a+is*
<econ gra%h is of bigram %robability )alues for >nglish* In this gra%h also color co ing is same as use in the %re)ious case* In this case the %honeme coming first is shown in y'a+is an %honeme coming secon is shown in +'a+is*
' 19 '
5.1. .O,&%r.!tion&
6rom the gra%hs gi)en abo)e! some im%ortant obser)ations can be ma e, There are some %honemes which ha)e %robability )alues significantly higher than other %honemes! but those %honemes ha)e higher )alues for both the language* In case of unigram %robabilities! %honemes which are ha)ing brighter )alues are actually theoretically also most %robable because they re%resent the )owels Ja an Ji* In case of bigrams again the same %honemes are showing the brighter colors* =o other %honemes show any e+ce%tional beha)ior*
5.1.*. Conc+u&ion
As seen in the last section! there is no %honeme ha)ing significant ifference in %robability of occurrence in two languages* Therefore no efficient system can be esigne for language ID! base on this %rinci%le*
(.2. *sing +aussian $ixture $odels for ,ront end processing

Till now in all the algorithms an mo els! we ne)er looke at the front'en %rocessing as%ect of all this %rocess* 944 base %honeme recogni/ers are use for this %ur%ose as iscusse in section 2*2*2* The shortcomings of such recogni/ers are as follows, 6irst %roblem with such recogni/ers is that they are )ery much language s%ecific* <o in the mo els where a single %honeme recogni/er is use for e)ery language! results get biase to that language for which recogni/ers were traine * <econ %roblem with them is that they nee huge amount of s%eech ata for training other then what we use for language mo els* .roblem is that there can be some language which we want to recogni/e through our system but enough ata is not a)ailable for training <o an attem%t was ma e to esign a language in e%en ent front en %rocessor* 6or this a G44 base algorithm was esigne ! in which conce%t of classes is use instea of %honemes* The feature )ectors which were generate after first %hase of language ID were segmente into ifferent Ibroa %honetic categoryI or classes! base on some %ro%erties G1:H* An then the whole s%eech ata was con)erte into se5uence of classes! an then %lays the same role! which %honeme se5uences %laye in %re)ious a%%roach* =ow because of its %robabilistic a%%roach! it makes it language in e%en ent an thus better than 944 base %honeme recogni/ers although it has not been %ro%erly teste * 0ut the system esigne on this %rinci%le has a lower %erformance 7as shown in the table8 in com%arison to %honeme base mo els*
' 20 '
Machine Understanding of Indian Spoken Languages <erial no, 1 A%%roach "se Data<et "se V Customer Care Call $ecor s $esults7W8 $emarks 70est >ngD 0est 9in8 90*::D@9*11 Pros, It i not use 7100D2008 any 3anguage s%ecific recogni/er Cons, .erformance is not com%arable* "tterance Duration for o%timal recognition ratio is )ery high*
In this s%eech ata was con)erte into a ce%stral )ectors like form using software <C&P> which generate *mfcc files?an then the files were %asse through Gaussian 0i@ture 0odels! first to train them an then to con)ert them into se5uence of more abstract classes which are analogous to %honemes use in the earlier metho * Then the language mo els were a%%lie on it* In this s%eech ata was con)erte into a ce%stral )ector like form using software <C&P> which generate *mfcc files?an then the files were %asse through 9i en 4arko) 4o els! first to train them an i entify significant classes from the ata an then to con)ert the )ector se5uences into se5uence of classes analogous to %honemes use in the earlier metho * Then the language mo els were a%%lie on it*
Customer Care Call $ecor s
L1*:LDL1*92 Pros, It i not use 7200D2008 any 3anguage s%ecific recogni/er Cons, .erformance is not com%arable* "tterance Duration for o%timal recognition ratio is )ery high*
6. Di&cu&&ion 0 Too+& ,!&%d on +!n#u!#% ID

0ase on the algorithms iscusse in last section! some tools were esigne * These tools! after %ro%er testing! will be use for s%eech base a%%lications for $eliance Infocomm* <ome of them are as follows,
-.1. Content .erification /0stem 1C./2

<%eech a%%lications are getting more an more %o%ular with each %assing ay* All these a%%lications in)ol)e lots of s%eech ata store on some ser)er an use whene)er a customer re5uests for it* 6or e+am%le su%%ose there is an a%%lication that %lays news an a customer re5uest for it in 9in i! so this a%%lication will %lay the file store in ser)er corres%on ing to that an e)erything will go smooth* 0ut the %roblem
' 21 '
Machine Understanding of Indian Spoken Languages arise is that how to make sure that the file which is store on ser)er is actually in 9in i only an not in any other language! because at the en it will be some human who will be u%loa ing these files an human ha)e this ten ency to o something calle as mistake* <o we nee some mechani/e way of testing the content store at ser)er* Bith this aim CA< came into e+istence* Though this CA< has many as%ects an things to )erify! but language being the most im%ortant of them* <o we ha)e esigne a tool for this %ur%ose* 6igure gi)en below gi)es a gra%hical )iew of system* Currently it is in its testing %hase* <T>. 1 C?=T>=T <>$A>$ CA<
Downloa s the content files e)ery hour* .re icts the language using %honeme se5uences an then )erifies it* Con)erts s%eech file into %honeme se5uences
<T>. 2 .9?=>4> $>C?G=IQ>$
<T>. 2
3anguage ID
-igureI floA diagra$ of %JS ?nce its testing %hase is finishe Reliance Infoco$$ is %lanning to use for their internal usage*
-.2. Call routing for customer care centers

To ay most of the big com%anies ha)e their customer from )arious linguistic backgroun s* Also e)ery com%any wants to %ro)i e a customer care centers for their customers to sol)e their %roblems as fast as %ossible* =ow %roblem is to route the customer calls to call agent who is fluent in customer1s language* Currently it1s one using an IA$< system! but this result in loss of time* <o an attem%t is ma e to esign a system which can route the calls on the basis of language in which customer starts s%eaking* <o whene)er someone calls first of all it goes to language ID ser)er! which first recogni/es the language customer is using an then routes the call to an a%%ro%riate call agent* This system is currently in its e)elo%ment %hase*
' 22 '
7. Conc+u&ion
The %ur%ose of e)elo%ing language ID is to make the %rocess of language recognition mechani/e an hence enable many s%eech base a%%lications to use it as a black bo+* An the absence of absolutely correct way of recogni/ing languages by machine makes this fiel of language ID! a challenging research area* Although the recent %rogress in language ID has been %romising! currently e+isting systems are still not )ery reliable in istinguishing between a set of 10 or 11 languages* All re%orte systems still %erform much better when e+%ose to about &0s of s%eech! com%are to 10s* In this re%ort a self esigne no)el metho for language mo els was also iscusse which im%ro)e the recognition accuracy by significant %ercentage* ?n com%aring the systems that uses Ibroa %honetic categoryI rather than actual %hones as base units for mo eling! it has been foun that the latter %erform significantly better* 3ooking at the current growth of mobile usage! researchers will ha)e to come u% with a reliable solution for language I )ery soon* This %ro(ect was a ste% towar s this irection* Though it i not sol)e the com%lete %roblem! but was able to come out with an im%ro)e algorithm for one of its im%ortant as%ect along with one tool! MContent Aerification <ystem1 for s%eech ata )erification* $eliance Infocomm will be using this tool for )erification ata store for s%eech a%%lications*
' 22 '
8. R%2%r%nc%&
G1H <te)e ;oung! NThe 9TX 0ookP! Cambri ge "ni)ersity Technical <er)ices 3t ! December 199&* G2H 4*A* Qissman! O3anguage I entification using .honeme $ecognition an .honotactic 3anguage 4o elingO! in .rocee ings ICA<<. I9&! 199&* G2H ;*X* 4uthusamy! >* 0arnar ! an $*A* Cole! O$e)iewing Automatic 3anguage I entificationO! in I>>> <ignal .rocessing 4aga/ine! ?ctober 199:* G:H 3 <chwar t! NAutomatic language i entification using mi+e or er 944s an "ntranscribe cor%oraP! IC<3. 2000* G&H #*$* Deller! #*G* .roakis! #*9*3 9ansen! ODiscrete'Time .rocessing of <%eech <ignalsO! 4ac4illan! =ew ;ork! 1992* GFH B*A* Ainsworth! O<%eech $ecognition by 4achineO! .eter .eregrinus! 3on on! 19@@* GLH <*0* Da)is an .* 4ermelstein! OCom%arison of .arametric $e%resentations for 4onosyllabic Bor $ecognition in Continuously <%oken <entencesO! in I>>> Transactions! Aol* A<<.'2@! =o* :! August 19@0* G@H 4*A* Qissman! OAutomatic 3anguage I entification using Gaussian 4i+ture an 9i en 4arko) 4o els! in .rocee ings ICA<<. I92! )ol 2! %%*299':02! A%ril 1992* G9H 4*A* Qissman! OCom%arison of 6our A%%roaches to Automatic 3anguage I entification of Tele%hone <%eechO! in I>>> Transactions on <%eech an Au io .rocessing! Aol* :! =o* 1! #anuary 199F* G10H ;*X* 4uthusamy! O<egmental A%%roach to Automatic 3anguage I entificationO! .h*D* thesis! ?regon Gra uate Institute of <cience & Technology! 1992* G11H 4 A Xohler! NA%%roaches to 3anguage I entification using Gaussian 4i+ture 4o els an <hifte Delta Ce%stral 6eaturesP! IC<3. 2002* G12H George Constantini es an "ras $a%a(ic! N3anguage I entification in multilingual systemsP! <ur%riseS9F! )ol :! gac1*3ink, htt%,DDwww* oc*ic*ac*ukDYn Dsur%riseS9FD(ournalD)ol:Dgac1Dre%ort*html G12H 9* 9ermansky! =*4organ! A* 0ayya! an .* Xohn! O$A<TA'.3. <%eech Analysis Techni5ueO! in .roc* ICA<<. I92! Aol* 1! 4arch 1992! %%* 121'12:* G1:H # A(mera an C Booters! NA $obust <%eaker clustering algorithmP! Tech* $e%* 2@! IDIA.! 2002* 3ink, www*icsi*berkeley*e uDft%DglobalD %ubDs%eechD%a%ersDasru02'a(mewoot*% f G1&H T*#* 9a/en an A*B* Que! OAutomatic 3anguage I entification using a <egment' 0ase A%%roachO! in .rocee ings 2r >uro%ean Conference on Acoustics! <%eech! an <ignal .rocessing I@9! Glasgow! <cotlan ! 4ay I@9*
' 2: '

Machine Understanding OF Indian Spoken Languages: B Tech Project

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Machine Understanding OF Indian Spoken Languages: B Tech Project

Uploaded by

Copyright:

Available Formats

Machine Understanding of Indian Spoken Languages