 
Communications of COLIPS 8(1): 85-100, http://www.comp.nus.edu.sg/~colips/commcolips/, paper P98006
 
MI-Trigger-based Language Modeling
 
GuoDong Zhou and KimTeng Lua
Department of Information Systems and Computer Science
National University of Singapore
Lower Kent Ridge Road
Singapore 119260
{zhougd, luakt}@iscs.nus.edu.sg

Submitted, revised and accepted on 20 November, 1998
Abstract

This paper proposes a new MI-Trigger-based modeling approach to capture the preferred relationships between words over a short or long distance. It is implemented by the concept of trigger pair, which is selected by average mutual information and measured by mutual information. Both the distance-independent (DI) and distance-dependent (DD) MI-Trigger-based models are constructed within a window of a size from 1 to 10. It is found that the DD MI-Trigger models have better performance than the DI MI-Trigger models for the same window size and that it is better to model the preferred relationships in a distance-dependent way. It is also found that the number of trigger pairs in an MI-Trigger model can be kept to a reasonable size without losing too much of its modeling power. Finally, it is concluded that the preferred relationships between words are useful to language disambiguation and can be modeled efficiently by the MI-Trigger-based modeling approach.

Keywords: Word Association, MI-Trigger Modeling Approach, Preferred Relationship, Long-distance Context Dependency, Average Mutual Information, Mutual Information.
1 Introduction
 
In natural language there always exist many preferred relationships between words. Lexicographers use the concepts of collocation, co-occurrence and lexis to describe them. One classic example is "strong" and "powerful". (Halliday, 1966) noticed that these two words were used in different language environments, such as "strong tea" and "powerful computer", although they had similar syntax and semantics. Psychologists also have a similar concept: word association. Two highly associated word pairs are "bread/butter" and "doctor/nurse". Psychological experiments in (Meyer D. et al., 1975) indicated that the human reaction to a highly associated word pair was stronger and faster than that to a poorly associated word pair.

The strength of word association can be measured by mutual information. By computing the mutual information between the two words in a word pair, we can get much useful preference information from the corpus, such as the semantic preference between noun and noun (e.g. "doctor/nurse"), the particular preference between adjective and noun (e.g. "strong/tea"), and solid structure (e.g. "the more/the more"). This information is useful for automatic sentence disambiguation.

The research in (Hindle D. et al., 1993) showed the role of correlated statistical information in resolving sentence ambiguity. Consider a simple English sentence:

"She (wanted | placed | put) the dress on the rack."

For different verbs, the linking directions of the preposition phrase ("on the rack") are different. It can modify the noun (for "wanted") or function as the objective complement of the verb (for "placed" and "put"). This is the well-known preposition phrase (PP) attachment problem in the analysis of English. Their research showed that a reasonable analysis result can be chosen by comparing the mutual information of the verb-preposition pair ("want/on") and the object-preposition pair ("dress/on").

(Magerman D. et al., 1990) used a computational model of mutual information to automatically segment short phrases. (Rosenfeld R., 1994) used the concept of trigger pair as the basic information-bearing element to extract information from further back in the document's history. Similar research includes (Brent M., 1993) and (Kobayashi T. et al., 1994).

In Chinese, a word is made up of one or more characters. Hence, there also exist preferred relationships between Chinese characters. (Sproat et al., 1990) employed a statistical method to group neighboring Chinese characters in a sentence into two-character words by making use of a measure of character association based on mutual information. Here, we will focus instead on the preferred relationships between words.

The preferred relationships between words can expand from a short distance to a long distance. While N-gram models are simple in language modeling and have been successfully used in speech recognition and other tasks, they have obvious deficiencies. For instance, N-gram models can only capture the short-distance dependency within an N-word window, where currently the largest practical N for natural language is three, and many kinds of dependencies in natural language occur beyond a three-word window. While we can use conventional N-gram models to capture the short-distance dependency, the long-distance dependency should also be exploited properly.

The purpose of this paper is to study the preferred relationships between words over a short or long distance and to propose a new modeling approach to capture such phenomena in the Chinese language.
 
The organization of this paper is as follows: Section 2 defines the concept of trigger pair. The criteria for selecting a trigger pair are described in Section 3, while Section 4 describes a method to measure the strength of a trigger pair. Section 5 describes the trigger-based language modeling approach. Section 6 gives an example of its applications: PINYIN-to-Character conversion. Finally, a summary of this paper is given in Section 7.
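
As a rough illustration of how mutual information can score word association, the following minimal sketch computes a pointwise mutual information score, log2(P(a, b) / (P(a) P(b))), from a list of observed word pairs. The function name `pointwise_mi`, the relative-frequency estimates, and the toy observations are assumptions made for this illustration only; they are not the measures defined later in this paper and are not drawn from any corpus.

```python
import math
from collections import Counter


def pointwise_mi(pairs):
    """Return {(a, b): log2(P(a, b) / (P(a) * P(b)))} for each observed pair.

    Probabilities are plain relative frequencies over the list of pairs
    (an assumption of this sketch; no smoothing is applied).
    """
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)
    total = len(pairs)
    return {
        (a, b): math.log2(n * total / (left_counts[a] * right_counts[b]))
        for (a, b), n in pair_counts.items()
    }


# Toy adjective/noun observations, invented purely for illustration.
toy_pairs = [
    ("strong", "tea"), ("strong", "tea"), ("strong", "tea"),
    ("powerful", "computer"), ("powerful", "computer"),
    ("strong", "computer"),
    ("powerful", "tea"),
]
scores = pointwise_mi(toy_pairs)
```

On this toy input, the preferred pairs ("strong", "tea") and ("powerful", "computer") receive positive scores, while the dispreferred pairs ("strong", "computer") and ("powerful", "tea") receive negative scores.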
2 Concept of Trigger Pair
 
Based on the above description, we have decided to use the trigger pair as the basic concept for extracting the word association information of an associated word pair. If a word $A$ is highly associated with another word $B$, then $(A \rightarrow B)$ is considered a "trigger pair", with $A$ being the trigger and $B$ the triggered word. When $A$ occurs in the document, it triggers $B$, causing its probability estimate to change. $A$ and $B$ can also be extended to word sequences. For simplicity, we will concentrate on the trigger relationships between single words, although the ideas can be extended to longer word sequences.

How do we build a trigger-based language model? There remain two problems to be solved:

1) how to select a trigger pair?
2) how to measure a trigger pair?

We will discuss them separately in the next two sections.
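
To make the notion of a trigger pair concrete, the sketch below counts candidate pairs in which a triggered word follows a trigger within a fixed window. It keeps both a distance-dependent count (keyed by the pair and the distance) and a distance-independent count (summed over distances), mirroring the DD/DI distinction named in the abstract. The function name `count_trigger_candidates`, the forward-only window, and the toy sentence are assumptions of this sketch; the paper's exact counting scheme may differ.

```python
from collections import Counter


def count_trigger_candidates(tokens, window):
    """Count ordered pairs where b occurs d words after a, 1 <= d <= window.

    Returns (di_counts, dd_counts):
      dd_counts[(a, b, d)] is the distance-dependent count,
      di_counts[(a, b)] sums dd_counts over all distances d.
    """
    dd_counts = Counter()
    for i, a in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d >= len(tokens):
                break  # the window runs past the end of the text
            dd_counts[(a, tokens[i + d], d)] += 1

    di_counts = Counter()
    for (a, b, _d), n in dd_counts.items():
        di_counts[(a, b)] += n
    return di_counts, dd_counts


# Toy sentence, invented for illustration only.
tokens = "the doctor asked the nurse to call the doctor".split()
di, dd = count_trigger_candidates(tokens, window=3)
```

Here "doctor" precedes "nurse" once at distance 3, so the candidate pair (doctor, nurse) has a count of 1 in both views, while (nurse, doctor) is never observed within the window. Counts of this kind would then feed whatever selection and measurement criteria are applied to the candidates.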
3 Selecting Trigger Pair
 
Even if we restrict our attention to trigger pairs $(A, B)$ where $A$ and $B$ are both single words, the number of such pairs is too large. Let $V$ be the size of the vocabulary. Note that, unlike the case of the bigram model, where the number of different consecutive word pairs in an available corpus is always much less than $V^2$, the number of word pairs where both words occur in the same document is a significant fraction of $V^2$. Therefore, selecting a reasonable number of the most powerful trigger pairs is important to a trigger-based language model.