Professional Documents
Culture Documents
Machine Understanding OF Indian Spoken Languages: B Tech Project
Machine Understanding OF Indian Spoken Languages: B Tech Project
B Tech Project
Dhirubhai Ambani Institute of Information & Communication Technology Gan hinagar! G"#A$AT
A%ril 200&
'1'
CERTIFICATE
This is to certify that the Project Report titled Machine Understanding of Indian Spoken Languages submitted by Abhishek Agarwal ID 200101229 for the partial fulfillment of the re!uirements of B Tech "ICT# degree of the institute embodies the $or% done by him &ff campus under my super'ision( Date: _____________ Signature: _________________ (Dr. Sandeep Sibal) Date: _____________ Signature: _________________ (Prof. Prabhat Ranjan)
'2'
ACKNOWLEDGEMENTS
This %ro(ect in)ol)e the collection an analysis of information from a wi e )ariety of sources an the efforts of many %eo%le beyon me* Thus it woul not ha)e been %ossible to achie)e the results re%orte in this ocument without their hel%! su%%ort an encouragement* I will like to e+%ress my gratitu e to the following %eo%le for their hel% in the work lea ing to this re%ort, Dr. Sandeep Sibal, Dr. Prabhat Ran an ! Dr. "itendra # $era - .ro(ect su%er)isors, for their useful comments on the sub(ect matter an for the knowle ge I gaine by sharing i eas with them* Prof. Minal Bhise- .ro(ect Coor inator, for organi/ing an coor inating the 0Tech .ro(ects1 2001*
'2'
ABSTRACT
3anguage I entification is %rocess of i entifying the language being s%oken from a sam%le of s%eech by an unknown s%eaker* 4ost of the %re)ious work in this fiel is base on the fact that %honeme se5uences ha)e ifferent occurrence %robabilities in ifferent languages! an all the systems esigne till now ha)e trie to e+%loit this fact* 3anguage i entification %rocess in turn consists of two sub'systems* 6irst system con)erts s%eech into some interme iate form calle as %honeme se5uences! which are use to mo el the language by oing their %robabilistic analysis in the secon sub' system* In this %ro(ect both of the sub'systems are targete * 6irst some algorithms are iscusse for esigning language mo els* Then an attem%t is ma e to esign an algorithm for e+tracting %honeme se5uences in form of more abstract classes eri)e by statistical tools like Gaussian 4i+ture 4o els 7G448 an 9i en 4arko) 4o el 79448*
':'
TABLE OF CONTENTS
M#%&I'( U'D(RS)#'DI'*.....................................................................................................+ ,-...................................................................................................................................................+ I'DI#' SP,.(' L#'*U#*(S...................................................................................................+ 0;********************************************************************************************************************************************************1 C)RTI*ICAT)((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((+ AC,-&./)DG)0)-T1(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((2 AB1TRACT((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((3 TAB/) &* C&-T)-T1(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((4 5( I-TR&D6CTI&-(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((7 +( BAC,GR&6-D .&R,((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((8 2*1* DI<TI=CT C9A$ACT>$I<TIC< ?6 3A=G"AG> *******************************************************************************@ 2*2* ?A>$AI>B ?= 3A=G"AG> ID>=TI6I>$<****************************************************************************************@ /./.+. -ront0(nd Processing............................................................................................................1 /././. Phone$e Recogni2er.............................................................................................................1 /./.3. Language Models................................................................................................................+4 2*2* >CA4.3>, .9?=>TIC $>C?G=ITI?=D3A=G"AG> 4?D>3I=G******************************************************10 2( &B9)CTI:)1((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((55 3( DI1C611I&- ; &- /A-G6AG) 0&D)/1((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((55 :*1* 3A=G"AG> 4?D>3 E T$AI=I=G .9A<>***************************************************************************************12 :*2* 3A=G"AG> 4?D>3 E T"=I=G .9A<>******************************************************************************************1: 5./.+. Penalty for 2ero probability phone$es ..............................................................................+5 5././. 6eights for Unigra$, Bigra$, and )rigra$ probabilities.................................................+7 5./.3. Utterance Duration..............................................................................................................+8 :*2* 3A=G"AG> 4?D>3 E T><TI=G .9A<>*****************************************************************************************1@ 4( DI1C611I&- ; &T<)R APPR&AC<)1 (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((58 &*1* $>C?G=ITI?= 0A<>D ?= DI<TI=CT .9?=>4><****************************************************************************1@ 7.+.+. *raphs for Unigra$ and Bigra$ probabilities...................................................................+1 7.+./. ,bser9ations......................................................................................................................../4 7.+.3. %onclusion ........................................................................................................................../4 &*2* "<I=G GA"<<IA= 4ICT"$> 4?D>3< 6?$ 6$?=T >=D .$?C><<I=G*******************************************20 7( DI1C611I&- ; T&&/1 BA1)D &- /A-G6AG) ID((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((+5 F*1* C?=T>=T A>$I6ICATI?= <;<T>4 7CA<8************************************************************************************21 F*2* CA33 $?"TI=G 6?$ C"<T?4>$ CA$> C>=T>$<***************************************************************************22 =( C&-C/61I&-((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((+2 8( R)*)R)-C)1((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((+3
'&'
1. Introduction
The %roblem of 3anguage I entification 7language ID8 is efine as recogni/ing the language being s%oken from a sam%le of s%eech by an unknown s%eaker G2H* The human is by far the best language ID system in o%eration to ay! with accuracy as high as hun re %ercent in case if they know the language an can make a %retty reasonable guess about them in case if they on1t* This %ro(ect has trie to e)elo% this ability in machines* <e)eral im%ortant a%%lications alrea y e+ist for language ID* A language ID system coul be use as a Ifront'en I system to a tele%hone'base com%any! routing the caller to an a%%ro%riate o%erator fluent in the callerIs language* Currently either a manual system or IA$< base system e+ists* 0ut! both of them suffer from two main %roblems! howe)er, s%ee an e+%ense* It is highly e+%ensi)e to em%loy the call routers for AT&T who! between them! must be able to correctly route 1:0 languages or $eliance Infocomm for more than 10 languages* 6or emergency ser)ices! this coul be a fatal elay* ?ther a%%lication inclu es usage of such systems in war times when sol iers are oing rescue o%erations in alien lan s! to communicate with local %erson* Another a%%lication which actually has been im%lemente uring this %ro(ect inclu es its usage in esigning Content Aerification <ystem 7CA<8! which is use for )erification of the s%eech ata store for ifferent languages* As research in automatic s%eech recognition %rogresses! a language ID system woul be necessary for any multi'lingual s%eech recognition system* ?ne such system may be a fast information system! say at an air%ort! catering for multi'national clients* Another may be an automatic translation system* 0oth these systems woul nee to first recogni/e the language that was being s%oken before they coul %rocess it* There are number of ways to achie)e this task of language ID! like base on s%ectral features of s%eech! or base on wor le+icon or i entifying %resence of some istinct characteristics in ifferent languages like s%ecial %honemes* ?nes iscusse here are base on %honetic characteristics* To esign a language ID we shoul ha)e a %ro%er knowle ge about s%eech an its com%onents* <%eech is basically consists of small units of soun calle as %honemes* 6or e+am%le if you s%eak wor 0AT! then Jb! JK! Jt are three %honemes which together forms this soun * =ow for language i entification using %honetic characteristics! first a s%eech is con)erte into %honeme se5uences! which can be one using )arious metho s* In this %ro(ect the %honeme recogni/er use ! is base on 9i en 4arko) 4o els 79448* ?nce s%eech is con)erte into %honeme se5uences then a %robabilistic analysis is one which in turn is i)i e on three %hases- training! tuning & testing! an for that the cor%us is also i)i e accor ingly* In secon %hase of %ro(ect I ha)e actually attem%te to su%erse e the con)entional 944 way of con)erting s%eech into %honeme se5uences by using more abstract classes eri)e using statistical tools like Gaussian 4i+ture 4etho 7G448 an 944 an then %assing it through the same language mo els! which were esigne in first %hase*
'F'
'L'
Machine Understanding of Indian Spoken Languages All e+%eriments were con ucte on two s%eech ata set- M$a io broa cast ata1 an MCustomer care s%eech ata1! in two languages 79in i & >nglish8* >ach ata set was in turn i)i e into three is(oint set! so that a %ro%er testing of language mo els can be one an so that there is no biasing in the results* Though results from the later ata set are of more significance because of its a%%lication but %erformance was better in case of former* A brief iscussion on the research one in this fiel along with some e+am%le is gi)en in section 2* In following section : & &! work one by me along with the algorithm %ro%ose is iscusse * After that in section F! I ha)e iscusse about the tools which ha)e been esigne by me base on the algorithm %ro%ose ! followe by conclusion an references in subse5uent sections*
. B!c"#round Wor"
This section re)iews the current metho s use for language ID an research in the area of feature )ectors an language mo els* iscusses %re)ious
'@'
. . . P'on%(% R%co#ni)%r
The basic aim behin this system is to generate the %honeme se5uences from the )ector se5uences* There are &F %honemes! an their ifferent combinations can re%resent all %ossible s%eeches in )arious languages* Be use 9i en 4arko) 4o els 7944s8 for this %ur%ose*
944 mo els are %rimarily %robabilistic state machines! in which each state re%resents a %honeme* =ow there are two kin s of %robabilities attache with each state* 6irst! is the %robability with which we can say which will be the ne+t state 70 ++! as shown in fig*8 an secon ! is the %robability with which we can say what will be the out%ut soun when this state is reache ! which are re%resente by A +7?y8 in the '9'
Machine Understanding of Indian Spoken Languages figure* 0asically 944s are use for three %roblems! out of which one which we will be using it for is to fin out the most %robable state se5uence gi)en a se5uence of out%ut soun G1H* <o basically what it oes is that when %rocesse s%eech )ectors are %asse through this system it gi)es se5uence of %honemes* "sually you ha)e %honeme recogni/er s%ecific to a language! e%en ing on the training ata use * Be will iscuss about this in later sections*
. .*.L!n#u!#% Mod%+&
These are the most im%ortant as%ect of a language ID* There basic aim is to %re ict the language gi)en a %honeme se5uence! an for this %ur%ose some kin of %robabilistic analysis is one which is im%lementation e%en ent* This is %recisely my area of work* There are )arious ways of im%lementing this* ?ne of the %ro%erties that most metho s ha)e in common is that they are ma e u% of two %hases, training an recognition* The latter may only be %erforme after the former! which in)ol)es %resenting the system with s%eech from target languages 7i*e* those that we are trying to recogni/e8* Different systems then mo el languages accor ing to %articular language' e%en ent features* During recognition! these features are com%are to those of utterances being teste ! in or er to eci e which language is the correct one* The sim%lest form of training uses only a sam%le s%eech wa)e! an the true i entity of the language being s%oken* 4ore com%le+ a%%roaches use %honetic transcri%tions 7a se5uence of symbols re%resenting the soun s in each utterance8! or orthogra%hic transcri%tions 7the te+t of the wor s s%oken8! along with a %ronunciation ictionary! which woul ma% each wor to its re%resentation* <uch metho s are ob)iously more costly an time'consuming! not least because fluent s%eakers for each target language are re5uire * Be ha)e use the former a%%roach for training* 3anguage mo els will be iscusse in great e%th in ne+t section*
Disa )antage of this system is that it uses a single language e%en ent %honeme recogni/er! which can make its results bias to the language in which recogni/er is traine because the %hones %resent in target languages o not always occur in the language use uring training* Be may wish to incor%orate soun s from more than one language into a .$34'like system* An alternati)e to it can be to use multi%le .$34 systems in %arallel! with recogni/ers traine in ifferent languages! as shown in figure*
' 10 '
Bhile enhancing %erformance! this a%%roach has a cou%le of isa )antages! namely the nee for labele training s%eech in more than one language! an the increase %rocessing time*
*. O,-%cti.%&
The ma(or ob(ecti)e or goals which were set before starting this %ro(ect are as follows, An algorithm for more accurate language ID! in fiel of language mo els* Design some alternati)e to con)entional front'en %rocessing! which shoul not be language e%en ent* A tool base on abo)e algorithms for call routing in Customer Care Center* This tool will also enable a ministrator to manage the s%eech cor%us* A tool base on abo)e algorithm for MContent Aerification <ystem1 for )erifying the ata files %resent in ata ser)ers*
Machine Understanding of Indian Spoken Languages mo els but now language information is not gi)en to the system! in fact it1s the system which %re icts the language of the s%eech ata by using the %robabilities calculate in training %hase* 6or e+am%le su%%ose s%eech ata for >nglish language was %asse then in training %hase what mo el will calculate is the %robability like .7%hD>8 which what is the %robability of %honeme M%h1 occurring in >nglish s%eech ata* =ow uring recognition %hase su%%ose some %honeme se5uence %h1 %h2 %h2 occurre in the test ata then the %robability that this se5uence is in >nglish will be !" ph1 ph2 ph# $ %ng& ' !" ph1 $ %ng& ( !" ph2 $ %ng& ( !" ph# $ %ng& This is a )ery sim%le way of %utting things! actually formulae is not this straight an that is what has been stu ie in this %ro(ect* =ow let1s see all these %hases in etail*
<o consi ering these factors a training mo el was esigne in which when ata is %asse three ty%es of %robabilities are calculate * 1* 6nigram, These are the %robability of occurrence of single %honeme in language* These basically try to ca%ture the istinct %honemes which are s%ecial to %articular language like %honemes en ing with JhJ are more %robable in 9in i than in >nglish* Uni>Prob:ph;<= R 'o. of ti$e phone$e ?ph@ occur in the training data ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' %ount of total no of phone$es in the training data for ?<@ 2* Bigram & Trigram, This is the %robability of a %honeme being followe by a gi)en %honeme or %air of %honemes in a gi)en language* This basically tries to ca%ture the se5uential information relate to %honemes ' 12 '
Machine Understanding of Indian Spoken Languages which is also s%ecific to a language* This metho is calle as n'Gram a%%roach! an we can go till any )alue of n but as you increase )alue of n com%le+ity increases e+%onentially* Therefore we ha)e calculate till nR2* Bi>Prob:ph/ ; <, ph+= R 'o. of ti$e phone$e ph+ is folloAed by phone$e ph/ in the data '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' 'o of ti$es phone$es ph+ occurs in the training data of language < )ri>Prob:ph3 ; <, ph+, ph/= R 'o. of ti$e phone$e ph+ is folloAed by ph/ and then ph3 in the data 00000000000000000000000000000'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' 'o. of ti$e phone$e ph+ is folloAed by ph/ in the training data of language < <o in a sim%le language in case of trigrams what you o is that instea of consi ering %honeme as single unit you consi er se5uence of three %honemes as a single unit to calculate the %robability* 6ollowing iagram gi)es a %ictorial )iew of the whole %rocess,
<o this was all about training %hase of language mo els* The out%uts of this %hase are three %robability files for each language as escribe abo)e* =ow before iscussing about other %hases lets iscuss briefly how these are going to use these %robability ' 12 '
Machine Understanding of Indian Spoken Languages files* 3et1s su%%ose Na b c e f Nis a %honeme se5uence! where a! b! c! ! e an f are ifferent %honemes! came for recognition uring testing %hase! then ifferent %robabilities relate to it will be, Unigra)*!rob ' .igra)*!rob ' rigra)*!rob ' "#& !"a$+&( !"b$+& (!"c$+&(!"d$+&(!"e$+&(!"f$+&,,,,,--"1& !"b$+/ a&( !"c$+/ b& (!"d$+/ c&(!"e$+/ d&(!"f$+/ e&,,,---"2& !"c$+/ a/ b&( !"d$+/ b/ c& (!"e$+/ c/ d&(!"f$+/ e/ f&,,,,-
<o this way we calculate three %robabilities relate to each %honeme se5uence* <o the first %roblem which we shoul eal is the %roblem of /ero'%robability %enalty! which eals with cases when a gi)en %honeme actually ne)er occurre in training ata of a language* Then secon %roblem which we shoul eal now! is how to inter%ret these %robabilities an combine them into single )alue* 6or this we nee to fin weights for ifferent %robabilities* 3et1s call al%ha! beta an gamma as the re5uire weights for unigram! bigram an trigram %robability! then the final %robability of some %honeme se5uence P belonging to a s%ecific language < will be, Prob:P;<= B alphaCUnigra$>Prob:P;<= DbetaCBigra$>Prob:P;<= Dga$$aC)rigra$>Prob:P;<= <o now our first task is to fin out )alues of al%ha! beta an gamma* Then the thir %roblem which we shoul eal is that to fin o%timal )alue of uration for which %honeme se5uence shoul be recor e * <ee in this %roblem the i eal solution will be to take %honeme se5uence of the whole a)ailable ata* 0ut we shoul consi er two gra%hs! first is the gra%h between the com%le+ity an uration which is an e)er increasing gra%h* <econ is the gra%h between accuracy with which language is %re icte for a gi)en %honeme se5uence with its uration* This gra%h is e+%ecte to be asym%totic in nature* All these issues are ealt in following section*
Machine Understanding of Indian Spoken Languages that %honeme is /ero for <* Then what will ha%%enU =ow i eally we shoul not o anything in this regar an whene)er there is such %honeme in a se5uence! we shoul make the %robability of whole se5uence as /ero! but the catch is that our %honeme recogni/er is also not %erfectly accurate so we shoul take a case a wrong con)ersion of s%eech to %honeme se5uence* <o what we shoul actually o is to assign a )ery small but non'/ero )alue to all %honemes ha)ing /ero %robability* This a%%roach has two a )antages* 6irst that now es%ite ha)ing a /ero'%robability %honeme! a gi)en %honeme se5uence is still eligible for recognition! an secon because of a small )alue of %robability it will try to bias the results in negati)e si e! which i eally shoul ha)e been the case* Gi)en below is gra%h for ifferent )alues of %enalty on +'a+is an accuracy with which system works in y'a+is* .enalty )alues are re%resente in terms of + R e+%onent 7%8 where p is actual %enalty in terms of %robability an )alues in + a+is are its logarithmic counter%art* =ow you can clearly see a %eak in the gra%h at +R '10*
In this results were foun for all the three ifferent a%%roach an then the weights were assigne accor ingly* This way we will gi)e higher weights to a%%roach which is %erforming better* >m%irical formula for the %arameter is as follows,
an similarly for other two %arameters* Model 3. Sa$e as / but noA probabilities Aere nor$ali2ed.
The omain of %robabilities for unigram! bigram an trigram were not same therefore irect combination of their )alues will not re%resent the right %icture! therefore now all )alues were normali/e so that they came in same range* >)en for normali/ation! ifferent ways were teste out of which! one in which e)ery )alue is i)i e by the ma+imum )alue in its set! %ro uce the best results* After normali/ing the )alues! secon a%%roach is im%lemente * 4o el :* *enerated using *aussian MiGture Model and $aGi$i2ing the likelihood of single language* In this a%%roach ifferent %arameters were generate using Gaussian 4i+ture 4o el a%%roach in which o%timal )alues of %arameters are foun by ma+imi/ing any gi)en )alue! which is calculate by combination ifferent )alues* In this a%%roach )alue which was ma+imi/e was sum of combine %robability obtaine by using current )alue of %arameters! which is re%resente by the following )alue, SUM:alpha>engCP:SeH ;Uni=Dbeta>engCP:SeH ;Bi=Dga$$a>engC P: SeH ;)ri = = where .7<e5(T"ni8 re%resents the unigram %robability for se5uence (* .7<e5(T0i8 re%resents the bigram %robability for se5uence (* .7<e5(TTri8 re%resents the trigram %robability for se5uence (* Model 7. *enerated using *aussian MiGture Model and $aGi$i2ing the difference of likelihoods fro$ different languages. This a%%roach is same as the last one (ust the )alue which is ma+imi/e is now the ifference between the combine %robabilities for ifferent languages which is re%resente by the following )alue, SUM: alpha>eng C:P: SeH ; Uni(=E P: SeH ; Uni&== D beta>eng C:P: SeH ; Bi(= E P: SeH ; Bi&== D ga$$a>eng C :P: SeH ; )ri( =EP: SeH ; )ri& == = 6ollowing figure shows the gra%hical )iew of all the mo els along with the )alue of ifferent %arameters,
' 1F '
/. .*.
Utt%r!nc% Dur!tion
This %arameter eci es how long shoul be the %honeme se5uence in terms of time uration! for efficient a%%lications* I eally more the uration better the result will be! but com%utational resources %ut limit on it* As we will increase the uration! com%le+ity will also increase therefore we nee to fin a o%timal )alue for this %arameter* 6rom following gra%h you can clearly see that there in no significant increase in accuracy after some )alue of utterance uration 7&0 secs8* In this gra%h +' a+is is the utterance uration an y'a+is is accuracy with which results are %re icte *
' 1L '
The test ata is first %asse from three mo els an three %robabilities are calculate se%arately for each n'Gram an %honeme se5uence* Then these )alues are normali/e by i)i ing them by the ma+imum )alue of %robability which was obser)e uring tuning %hase* Then these )alues are combine using fi)e ifferent mo els* =ow from the abo)e figure we can make out its Model 3 which is %erforming best! but because of the o)erhea inclu e in it makes Model /1s %erformance as best* Therefore we conclu e the mo el 21s way of combining ifferent n'Gram %robabilities as o%timal*
Machine Understanding of Indian Spoken Languages 9in i* 6or this the %robability files which were generate after training %hase in last section were analy/e by %lotting some gra%hs using 4atlab*
5.1.1.
Gra%h gi)en below is of unigram %robabilities for both >nglish an 9in i* The %robability )alues are ma%%e in to a color base on the ma%%ing shown in the right han ! with un erlying %rinci%le that higher the %robability )alue! brighter the color will be for that %honeme* In this gra%h %honemes are re%resente in +'a+is while languages are shown in y'a+is*
<econ gra%h is of bigram %robability )alues for >nglish* In this gra%h also color co ing is same as use in the %re)ious case* In this case the %honeme coming first is shown in y'a+is an %honeme coming secon is shown in +'a+is*
' 19 '
5.1. .O,&%r.!tion&
6rom the gra%hs gi)en abo)e! some im%ortant obser)ations can be ma e, There are some %honemes which ha)e %robability )alues significantly higher than other %honemes! but those %honemes ha)e higher )alues for both the language* In case of unigram %robabilities! %honemes which are ha)ing brighter )alues are actually theoretically also most %robable because they re%resent the )owels Ja an Ji* In case of bigrams again the same %honemes are showing the brighter colors* =o other %honemes show any e+ce%tional beha)ior*
5.1.*. Conc+u&ion
As seen in the last section! there is no %honeme ha)ing significant ifference in %robability of occurrence in two languages* Therefore no efficient system can be esigne for language ID! base on this %rinci%le*
' 20 '
Machine Understanding of Indian Spoken Languages <erial no, 1 A%%roach "se Data<et "se V Customer Care Call $ecor s $esults7W8 $emarks 70est >ngD 0est 9in8 90*::D@9*11 Pros, It i not use 7100D2008 any 3anguage s%ecific recogni/er Cons, .erformance is not com%arable* "tterance Duration for o%timal recognition ratio is )ery high*
In this s%eech ata was con)erte into a ce%stral )ectors like form using software <C&P> which generate *mfcc files?an then the files were %asse through Gaussian 0i@ture 0odels! first to train them an then to con)ert them into se5uence of more abstract classes which are analogous to %honemes use in the earlier metho * Then the language mo els were a%%lie on it* In this s%eech ata was con)erte into a ce%stral )ector like form using software <C&P> which generate *mfcc files?an then the files were %asse through 9i en 4arko) 4o els! first to train them an i entify significant classes from the ata an then to con)ert the )ector se5uences into se5uence of classes analogous to %honemes use in the earlier metho * Then the language mo els were a%%lie on it*
L1*:LDL1*92 Pros, It i not use 7200D2008 any 3anguage s%ecific recogni/er Cons, .erformance is not com%arable* "tterance Duration for o%timal recognition ratio is )ery high*
' 21 '
Machine Understanding of Indian Spoken Languages arise is that how to make sure that the file which is store on ser)er is actually in 9in i only an not in any other language! because at the en it will be some human who will be u%loa ing these files an human ha)e this ten ency to o something calle as mistake* <o we nee some mechani/e way of testing the content store at ser)er* Bith this aim CA< came into e+istence* Though this CA< has many as%ects an things to )erify! but language being the most im%ortant of them* <o we ha)e esigne a tool for this %ur%ose* 6igure gi)en below gi)es a gra%hical )iew of system* Currently it is in its testing %hase* <T>. 1 C?=T>=T <>$A>$ CA<
Downloa s the content files e)ery hour* .re icts the language using %honeme se5uences an then )erifies it* Con)erts s%eech file into %honeme se5uences
<T>. 2
3anguage ID
-igureI floA diagra$ of %JS ?nce its testing %hase is finishe Reliance Infoco$$ is %lanning to use for their internal usage*
' 22 '
7. Conc+u&ion
The %ur%ose of e)elo%ing language ID is to make the %rocess of language recognition mechani/e an hence enable many s%eech base a%%lications to use it as a black bo+* An the absence of absolutely correct way of recogni/ing languages by machine makes this fiel of language ID! a challenging research area* Although the recent %rogress in language ID has been %romising! currently e+isting systems are still not )ery reliable in istinguishing between a set of 10 or 11 languages* All re%orte systems still %erform much better when e+%ose to about &0s of s%eech! com%are to 10s* In this re%ort a self esigne no)el metho for language mo els was also iscusse which im%ro)e the recognition accuracy by significant %ercentage* ?n com%aring the systems that uses Ibroa %honetic categoryI rather than actual %hones as base units for mo eling! it has been foun that the latter %erform significantly better* 3ooking at the current growth of mobile usage! researchers will ha)e to come u% with a reliable solution for language I )ery soon* This %ro(ect was a ste% towar s this irection* Though it i not sol)e the com%lete %roblem! but was able to come out with an im%ro)e algorithm for one of its im%ortant as%ect along with one tool! MContent Aerification <ystem1 for s%eech ata )erification* $eliance Infocomm will be using this tool for )erification ata store for s%eech a%%lications*
' 22 '
8. R%2%r%nc%&
G1H <te)e ;oung! NThe 9TX 0ookP! Cambri ge "ni)ersity Technical <er)ices 3t ! December 199&* G2H 4*A* Qissman! O3anguage I entification using .honeme $ecognition an .honotactic 3anguage 4o elingO! in .rocee ings ICA<<. I9&! 199&* G2H ;*X* 4uthusamy! >* 0arnar ! an $*A* Cole! O$e)iewing Automatic 3anguage I entificationO! in I>>> <ignal .rocessing 4aga/ine! ?ctober 199:* G:H 3 <chwar t! NAutomatic language i entification using mi+e or er 944s an "ntranscribe cor%oraP! IC<3. 2000* G&H #*$* Deller! #*G* .roakis! #*9*3 9ansen! ODiscrete'Time .rocessing of <%eech <ignalsO! 4ac4illan! =ew ;ork! 1992* GFH B*A* Ainsworth! O<%eech $ecognition by 4achineO! .eter .eregrinus! 3on on! 19@@* GLH <*0* Da)is an .* 4ermelstein! OCom%arison of .arametric $e%resentations for 4onosyllabic Bor $ecognition in Continuously <%oken <entencesO! in I>>> Transactions! Aol* A<<.'2@! =o* :! August 19@0* G@H 4*A* Qissman! OAutomatic 3anguage I entification using Gaussian 4i+ture an 9i en 4arko) 4o els! in .rocee ings ICA<<. I92! )ol 2! %%*299':02! A%ril 1992* G9H 4*A* Qissman! OCom%arison of 6our A%%roaches to Automatic 3anguage I entification of Tele%hone <%eechO! in I>>> Transactions on <%eech an Au io .rocessing! Aol* :! =o* 1! #anuary 199F* G10H ;*X* 4uthusamy! O<egmental A%%roach to Automatic 3anguage I entificationO! .h*D* thesis! ?regon Gra uate Institute of <cience & Technology! 1992* G11H 4 A Xohler! NA%%roaches to 3anguage I entification using Gaussian 4i+ture 4o els an <hifte Delta Ce%stral 6eaturesP! IC<3. 2002* G12H George Constantini es an "ras $a%a(ic! N3anguage I entification in multilingual systemsP! <ur%riseS9F! )ol :! gac1*3ink, htt%,DDwww* oc*ic*ac*ukDYn Dsur%riseS9FD(ournalD)ol:Dgac1Dre%ort*html G12H 9* 9ermansky! =*4organ! A* 0ayya! an .* Xohn! O$A<TA'.3. <%eech Analysis Techni5ueO! in .roc* ICA<<. I92! Aol* 1! 4arch 1992! %%* 121'12:* G1:H # A(mera an C Booters! NA $obust <%eaker clustering algorithmP! Tech* $e%* 2@! IDIA.! 2002* 3ink, www*icsi*berkeley*e uDft%DglobalD %ubDs%eechD%a%ersDasru02'a(mewoot*% f G1&H T*#* 9a/en an A*B* Que! OAutomatic 3anguage I entification using a <egment' 0ase A%%roachO! in .rocee ings 2r >uro%ean Conference on Acoustics! <%eech! an <ignal .rocessing I@9! Glasgow! <cotlan ! 4ay I@9*
' 2: '