Document Information
Description
In spite of the rise of Information Extraction and the development of many applications in the
last twenty years, this task encounters problems when it is carried out on atypical texts such as
oral communication notes.
Oral communication notes are texts which result from an oral communication (meeting,
talk, etc.) and aim to synthesize its informative content. The constraints under which they
are drafted (speed and a limited amount of writing) lead to linguistic characteristics
to which the traditional methods of Natural Language Processing and Information Extraction are
poorly suited. Although these notes are rich in information, they are not exploited by systems
which extract information from texts.
In this thesis, we propose an extraction method adapted to oral communication notes. This
method, called MEGET, is based on an ontology which depends on the information to be
extracted (“extraction ontology”). This ontology is obtained by the unification of an “ontology of
needs”, which describes the information to be found, with an “ontology of terms”, which
conceptualizes the terms of the corpus which are related to the required information. The
ontology of terms is elaborated from terminology extracted from texts and enriched by terms
found in specialized documents. The extraction ontology is formalized by a set of rules which are
provided as a knowledge base for the extraction system SYGET. This system (1) labels
each instance of every element of the extraction ontology and (2) extracts the
information. This approach is validated on several corpora.
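
The pipeline described above (unify two ontologies, turn the result into rules, label instances, then extract) can be illustrated with a toy sketch. This is not the MEGET or SYGET implementation, whose details are not given here. The concept names, terms, sample note and function names are all hypothetical, and simple regular-expression matching stands in for whatever rule formalism the thesis actually uses.

```python
import re
from collections import defaultdict

# Hypothetical "ontology of needs": the concepts whose instances are wanted.
ONTOLOGY_OF_NEEDS = {"Date", "Participant"}

# Hypothetical "ontology of terms": corpus terms mapped to a concept.
ONTOLOGY_OF_TERMS = {
    "monday": "Date",
    "tuesday": "Date",
    "manager": "Participant",
    "engineer": "Participant",
    "budget": "Topic",  # not a needed concept, so dropped on unification
}


def unify(needs, terms):
    """Keep only terms whose concept is needed, grouped by concept."""
    ontology = defaultdict(set)
    for term, concept in terms.items():
        if concept in needs:
            ontology[concept].add(term)
    return dict(ontology)


def compile_rules(ontology):
    """Build one case-insensitive whole-word pattern per concept."""
    rules = {}
    for concept, terms in ontology.items():
        alternatives = "|".join(re.escape(t) for t in sorted(terms))
        rules[concept] = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return rules


def label(text, rules):
    """Step 1: label every instance as (start offset, concept, surface form)."""
    labels = []
    for concept, pattern in rules.items():
        for match in pattern.finditer(text):
            labels.append((match.start(), concept, match.group(0)))
    return sorted(labels)


def extract(labels):
    """Step 2: collect the labelled instances per concept, in text order."""
    result = defaultdict(list)
    for _, concept, surface in labels:
        result[concept].append(surface.lower())
    return dict(result)


# A made-up note in the terse style of meeting notes.
note = "Mtg Monday: manager + engineer, budget ok, next Tuesday"
rules = compile_rules(unify(ONTOLOGY_OF_NEEDS, ONTOLOGY_OF_TERMS))
info = extract(label(note, rules))
```

In this sketch, terms attached to a concept that the ontology of needs does not mention (here "budget") never become rules, which mirrors the idea that the extraction ontology depends on the information to be extracted.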
252 Pages