Adaptive Statistical Language Modeling: A Maximum Entropy Approach
Ronald Rosenfeld
April 19, 1994
CMU-CS-94-138
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Thesis Committee:
Raj Reddy, co-chair
Xuedong Huang, co-chair, Carnegie Mellon and Microsoft Corporation
Jaime Carbonell
Alexander Rudnicky
Salim Roukos, IBM Corporation

© 1994 Ronald Rosenfeld
This research was supported by the Department of the Navy, Naval Research Laboratory under Grant No. N00014-93-1-2005. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. government.
 
Keywords:
language modeling, adaptive language modeling, statistical language modeling, maximum entropy, speech recognition.
 
Abstract
Language modeling is the attempt to characterize, capture and exploit regularities in natural language. In statistical language modeling, large amounts of text are used to automatically determine the model's parameters. Language modeling is useful in automatic speech recognition, machine translation, and any other application that processes natural language with incomplete knowledge.

In this thesis, I view language as an information source which emits a stream of symbols from a finite alphabet (the vocabulary). The goal of language modeling is then to identify and exploit sources of information in the language stream, so as to minimize its perceived entropy.

Most existing statistical language models exploit the immediate past only. To extract information from further back in the document's history, I use trigger pairs as the basic information-bearing elements. This allows the model to adapt its expectations to the topic of discourse.

Next, statistical evidence from many sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, I apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution. Given consistent statistical evidence, a unique ME solution is guaranteed to exist, and an iterative algorithm exists which is guaranteed to converge to it. The ME framework is extremely general: any phenomenon that can be described in terms of statistics of the text can be readily incorporated.

An adaptive language model based on the ME approach was trained on the Wall Street Journal corpus, and showed 32%–39% perplexity reduction over the baseline. When interfaced to SPHINX-II, Carnegie Mellon's speech recognizer, it reduced its error rate by 10%–14%.

The significance of this thesis lies in improving language modeling, reducing speech recognition error rate, and in being the first large-scale test of the approach. It illustrates the feasibility of incorporating many diverse knowledge sources in a single, unified statistical framework.
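The Maximum Entropy combination of constraints described in the abstract can be illustrated with a small sketch. It assumes Generalized Iterative Scaling (GIS) as the iterative algorithm (the abstract does not name one), a hypothetical three-word vocabulary, and one hypothetical trigger pair (loan, bank). Unigram features tie the model's overall word frequencies to the data, and the trigger feature ties how often "bank" follows an earlier "loan" to the data. A slack feature pads every feature vector to the same total C, as GIS requires. The data, feature names, and functions are illustrative assumptions, not the thesis's implementation.

```python
import math
from collections import Counter

# Hypothetical toy data: (was the trigger word "loan" seen earlier in the document?, next word).
VOCAB = ["bank", "river", "the"]
EVENTS = ([(True, "bank")] * 3 + [(True, "the")] * 3 + [(True, "river")] * 1
          + [(False, "bank")] * 1 + [(False, "the")] * 6 + [(False, "river")] * 2)

# One constraint per unigram plus one for the hypothetical trigger pair (loan, bank).
FEATURES = ["uni:" + w for w in VOCAB] + ["trig:loan->bank"]
C = 2  # most features active on any (history, word): one unigram + the trigger


def feature_values(trig, w):
    """Binary features of (history, w), plus a slack feature so they always sum to C."""
    vals = {name: 0.0 for name in FEATURES}
    vals["uni:" + w] = 1.0
    if trig and w == "bank":
        vals["trig:loan->bank"] = 1.0
    vals["slack"] = C - sum(vals.values())
    return vals


def cond_prob(lam, trig):
    """Log-linear model p(w | h) = exp(sum_k lam_k * f_k(h, w)) / Z(h)."""
    scores = {w: math.exp(sum(lam[k] * v for k, v in feature_values(trig, w).items()))
              for w in VOCAB}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}


def train_gis(events, iters=1000):
    """Run a fixed number of GIS updates: lam_k += log(empirical_k / expected_k) / C."""
    n = len(events)
    history_counts = Counter(trig for trig, _ in events)
    empirical = Counter()
    for trig, w in events:
        for k, v in feature_values(trig, w).items():
            empirical[k] += v / n
    lam = {k: 0.0 for k in FEATURES + ["slack"]}
    for _ in range(iters):
        # Model expectations use the empirical history distribution and the current p(w | h).
        expected = Counter()
        for trig, count in history_counts.items():
            p = cond_prob(lam, trig)
            for w in VOCAB:
                for k, v in feature_values(trig, w).items():
                    expected[k] += (count / n) * p[w] * v
        for k in lam:
            lam[k] += math.log(empirical[k] / expected[k]) / C
    return lam


lam = train_gis(EVENTS)
print(round(cond_prob(lam, True)["bank"], 3), round(cond_prob(lam, False)["bank"], 3))
```

On this toy data the empirical relative frequencies already satisfy every constraint and belong to the log-linear family, so the fitted p(bank | trigger) should approach 3/7 and p(bank | no trigger) should approach 1/9. The model boosts "bank" only in histories containing the trigger, while the unigram constraints keep overall word frequencies consistent with the data.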
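The abstract states the goal as minimizing perceived entropy and reports results as perplexity reductions. These are two views of one quantity: perplexity is 2 raised to the model's per-word cross-entropy (in bits) on test text. A minimal sketch follows, assuming the model's probabilities p(w_i | h_i) for each test word are already available as a list. The function names are hypothetical.

```python
import math


def perplexity(word_probs):
    """Perplexity 2**H, where H is the average of -log2 p(w_i | h_i) over the test words."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    cross_entropy = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** cross_entropy


def entropy_drop_bits(relative_reduction):
    """Per-word cross-entropy drop (bits) implied by a relative perplexity reduction."""
    return -math.log2(1.0 - relative_reduction)


# A model that gives every test word probability 1/4 is as uncertain as a fair 4-way choice.
print(perplexity([0.25] * 8))             # 4.0
print(round(entropy_drop_bits(0.32), 3))  # 0.556
```

By this relation, the 32% and 39% ends of the reported range correspond to roughly 0.56 and 0.71 bits per word of cross-entropy reduction.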