Mary Elaine Califf and Raymond J. Mooney (1997). Relational Learning of Pattern-Match Rules for Information Extraction. In T.M. Ellison (ed.), CoNLL97: Computational Natural Language Learning, ACL, pp. 9-15.
© 1997 Association for Computational Linguistics
may be either one of a set of specified values or strings taken directly from the document. For example, Figure 1 shows part of a job posting, and the corresponding slots of the filled computer-science job template.

    Posting from Newsgroup

    Telecommunications. SOLARIS Systems
    Administrator. 38-44K. Immediate need

    Leading telecommunications firm in need
    of an energetic individual to fill the
    following position in the Atlanta
    office:

    SOLARIS SYSTEMS ADMINISTRATOR
    Salary: 38-44K with full benefits
    Location: Atlanta Georgia, no
    relocation assistance provided

    Filled Template

    computer_science_job
    title: SOLARIS Systems Administrator
    salary: 38-44K
    state: Georgia
    city: Atlanta
    platform: SOLARIS
    area: telecommunications

Figure 1: Sample Message and Filled Template

IE can be useful in a variety of domains. The various MUCs have focused on domains such as Latin American terrorism, joint ventures, microelectronics, and company management changes. Others have used IE to track medical patient records (Soderland et al., 1995) or company mergers (Huffman, 1996). A general task considered in this paper is extracting information from postings to USENET newsgroups, such as job announcements. Our overall goal is to extract a database from all the messages in a newsgroup and then use learned query parsers (Zelle and Mooney, 1996) to answer natural language questions such as "What jobs are available in Austin for C++ programmers with only one year of experience?". Numerous other Internet applications are possible, such as extracting information from product web pages for a shopping agent (Doorenbos, Etzioni, and Weld, 1997).

2.2 Relational Learning

Most empirical natural-language research has employed statistical techniques that base decisions on very limited contexts, or symbolic techniques such as decision trees that require the developer to specify a manageable, finite set of features for use in making decisions. Inductive logic programming (ILP) and other relational learning methods (Birnbaum and Collins, 1991) allow induction over structured examples that can include first-order logical predicates and functions and unbounded data structures such as lists, strings, and trees. Detailed experimental comparisons of ILP and feature-based induction have demonstrated the advantages of relational representations in two language-related tasks: text categorization (Cohen, 1995) and generating the past tense of an English verb (Mooney and Califf, 1995). While RAPIER is not strictly an ILP system, its relational learning algorithm was inspired by ideas from the following ILP systems.

GOLEM (Muggleton and Feng, 1992) is a bottom-up (specific-to-general) ILP algorithm based on the construction of relative least-general generalizations, or rlggs (Plotkin, 1970). The idea of a least-general generalization (LGG) is, given two items (in ILP, two clauses), to find the least general item that covers the original pair; this is usually a fairly simple computation. Rlggs are the LGGs relative to a set of background relations. Because of the difficulties introduced by non-finite rlggs, background predicates must be defined extensionally. The algorithm operates by randomly selecting several pairs of positive examples and computing the determinate rlggs of each pair. Determinacy constrains the clause to have, for each example, no more than one possible valid substitution for each variable in the body of the clause. The resulting clause with the greatest coverage of positive examples is selected, and that clause is further generalized by computing the rlggs of the selected clause with new randomly chosen positive examples. The generalization process stops when the coverage of the best clause no longer increases.

The CHILLIN (Zelle and Mooney, 1994) system combines top-down (general-to-specific) and bottom-up ILP techniques. The algorithm starts with a most specific definition (the set of positive examples) and introduces generalizations which make the definition more compact. Generalizations are created by selecting pairs of clauses in the definition and computing LGGs. If the resulting clause covers negative examples, it is specialized by adding antecedent literals in a top-down fashion. The search for new literals is carried out in a hill-climbing fashion, using an information gain metric to evaluate literals; this is similar to the search employed by FOIL (Quinlan, 1990). In cases where a correct clause cannot be learned with the existing background relations, CHILLIN attempts to construct new predicates which will distinguish the covered negative examples from the covered positives. At each step, a number of possible generalizations are considered; the one producing the greatest compaction of the theory is im-
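The LGG operation at the heart of both GOLEM and CHILLIN can be sketched for simple first-order terms. The sketch below is illustrative only, not code from any of these systems: it assumes a hypothetical representation in which a compound term is a tuple whose first element is the functor, and variables are strings beginning with "?".

```python
def lgg(t1, t2, subst=None):
    """Plotkin-style least-general generalization (anti-unification)
    of two terms, e.g. ("parent", "ann", "bob")."""
    if subst is None:
        subst = {}
    if t1 == t2:
        return t1
    # Compound terms with the same functor and arity generalize argument-wise.
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        return (t1[0],) + tuple(lgg(a, b, subst) for a, b in zip(t1[1:], t2[1:]))
    # Otherwise replace the mismatched pair with a variable, reusing the
    # same variable for repeated occurrences of the same pair of subterms.
    if (t1, t2) not in subst:
        subst[(t1, t2)] = "?X%d" % len(subst)
    return subst[(t1, t2)]

# lgg(("parent", "ann", "bob"), ("parent", "tom", "bob"))
#   -> ("parent", "?X0", "bob")
```

The shared substitution table is what makes the result *least* general: the same pair of mismatched subterms is always mapped to the same variable, so regularities such as a repeated argument are preserved rather than generalized away. Generalizing whole clauses (and rlggs relative to background relations) adds bookkeeping on top of this core step.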
[Figure 4: Performance on job postings — learning curves plotted against the number of training examples]

postings that could be used to create a database of available jobs. The computer-related job posting template contains 17 slots, including information about the employer, the location, the salary, and job requirements. Several of the slots, such as the languages and platforms used, can take multiple values. The current results do not employ semantic categories, only words and the results of Brill's POS tagger.

The results presented here use a data set of 100 documents paired with filled templates. We did a ten-fold cross-validation, and also ran tests with smaller subsets of the training examples for each test set in order to produce learning curves. We use three measures: precision, the percentage of slot fillers produced which are correct; recall, the percentage of slot fillers in the correct templates which are produced by the system; and an F-measure, which is the average of the recall and the precision.

Figure 4 shows the learning curves generated. At 90 training examples, the average precision was 83.7% and the average recall was 53.1%. These numbers look quite promising when compared to the measured performance of other information extraction systems on various domains. This performance is comparable to that of CRYSTAL on a medical domain task (Soderland et al., 1996), and better than that of AUTOSLOG and AUTOSLOG-TS on part of the MUC4 terrorism task (Riloff, 1996). It also compares favorably with the typical system performance on the MUC tasks (ARPA, 1992; ARPA, 1993). All of these comparisons are only general, since the tasks are different, but they do indicate that RAPIER is doing relatively well. The relatively high precision is an especially positive result, because it is highly likely that recall will continue to improve as the number of training examples increases.

5 Related Work

Previous researchers have generally applied machine learning only to parts of the IE task, and their systems have typically required more human interaction than just providing texts with filled templates. RESOLVE uses decision trees to handle coreference decisions for an IE system and requires annotated coreference examples (McCarthy and Lehnert, 1995). CRYSTAL uses a form of clustering to create a dictionary of extraction patterns by generalizing patterns identified in the text by an expert (Soderland et al., 1995; Soderland et al., 1996). AUTOSLOG creates a dictionary of extraction patterns by specializing a set of general syntactic patterns (Riloff, 1993; Riloff, 1996). It assumes that an expert will later examine the patterns it produces. PALKA learns extraction patterns relying on a concept hierarchy to guide generalization and specialization (Kim and Moldovan, 1995). AUTOSLOG, CRYSTAL, and PALKA all rely on prior sentence analysis to identify syntactic elements and their relationships, and their output requires further processing to produce the final filled templates. LIEP also learns IE patterns (Huffman, 1996). LIEP's primary limitations are that it also requires a sentence analyzer to identify noun groups, verbs, subjects, etc.; it makes no real use of semantic information; it assumes that all the information it needs is between two entities it identifies as "interesting"; and it has been applied to only one domain in which the texts are quite short (1-3 sentences).

6 Future Research

Currently, RAPIER assumes slot values are strings taken directly from the document; however, MUC templates also include slots whose values are taken from a pre-specified set. We plan to extend the system to learn rules for such slots. Also, the current system attempts to extract the same set of slots from every document. RAPIER must be extended to learn patterns that first categorize the text to determine which set of slots, if any, should be extracted from a