Communications of COLIPS 8(1): 85-100, http://www.comp.nus.edu.sg/~colips/commcolips/, paper P98006
MI-Trigger-based Language Modeling
GuoDong Zhou and KimTeng Lua
Department of Information Systems and Computer Science
National University of Singapore
Lower Kent Ridge Road
Singapore 119260
{zhougd, luakt}@iscs.nus.edu.sg
Submitted, revised and accepted on 20 November, 1998
Abstract
This paper proposes a new MI-Trigger-based modeling approach to capture the preferred relationships between words over a short or long distance. It is implemented by the concept of trigger pair, which is selected by average mutual information and measured by mutual information. Both the distance-independent (DI) and distance-dependent (DD) MI-Trigger-based models are constructed within a window of a size from 1 to 10. It is found that the DD MI-Trigger models have better performance than the DI MI-Trigger models for the same window size and it is better to model the preferred relationships in a distance-dependent way. It is also found that the number of the trigger pairs in an MI-Trigger model can be kept to a reasonable size without losing too much of its modeling power. Finally, it is concluded that the preferred relationships between words are useful to language disambiguation and can be modeled efficiently by the MI-Trigger-based modeling approach.
Keywords
Word Association, MI-Trigger Modeling Approach, Preferred Relationship, Long-distance Context Dependency, Average Mutual Information, Mutual Information.
1 Introduction
In natural language there always exist many preferred relationships between words. Lexicographers always use the concepts of collocation, co-occurrence and lexis to describe them. One classic example is "strong" and "powerful". Halliday (1966) noticed that these two words were used in different language environments such as "strong tea" and "powerful computer" although they had similar syntax and semantics. Psychologists also