Computational Linguistics

Nov 17, 2012
Extraction of factual data from texts is the task of automatically
generating elements of a factographic database, such as fields or
parameters, from on-line texts. Streams of current news from the
Internet or from an information agency often serve as the source of
information for such systems. The parameters of interest can be the
demand for a specific type of product in various regions, the prices
of specific types of products, events involving a particular person
or company, opinions about a specific issue or a political party, etc.
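To make the notion of filling database fields from news text concrete, here is a minimal sketch in Python. It is only an illustration under strong assumptions: the record type `PriceFact`, the pattern `PRICE_PATTERN`, and the function `extract_price_facts` are hypothetical names introduced here, and a single hand-written regular expression stands in for the linguistic analysis that a real system would need.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class PriceFact:
    """One row of a hypothetical factographic database."""
    product: str
    price: float
    currency: str


# Assumed surface pattern: "price of <product> rose/fell to <amount> <currency>".
# Real news text varies far more than this single pattern allows.
PRICE_PATTERN = re.compile(
    r"price of (?P<product>[a-z ]+?) (?:rose|fell|climbed|dropped) to "
    r"(?P<amount>\d+(?:\.\d+)?) (?P<currency>dollars|euros|pesos)",
    re.IGNORECASE,
)


def extract_price_facts(text: str) -> List[PriceFact]:
    """Return one PriceFact per match of the assumed pattern in the text."""
    facts = []
    for match in PRICE_PATTERN.finditer(text):
        facts.append(
            PriceFact(
                product=match.group("product").strip().lower(),
                price=float(match.group("amount")),
                currency=match.group("currency").lower(),
            )
        )
    return facts


if __name__ == "__main__":
    news = (
        "The price of crude oil rose to 91.5 dollars on Monday. "
        "Analysts said the price of wheat fell to 310 pesos."
    )
    for fact in extract_price_facts(news):
        print(fact)
```

Such a pattern captures only one fixed way of reporting a price. Any paraphrase of the same fact would be missed, which hints at why the general task is difficult.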
Decision makers in business and politics are usually too busy to read
and comprehend all the relevant news in the time available to them, so
they often have to hire many news summarizers and readers or even
turn to a specialized information agency. This is very expensive,
and even then the important relationships between the facts may be
lost, since each news summarizer typically has very limited knowledge
of the subject matter. A fully effective automatic system could not
only extract the relevant facts much faster, but also combine them,
classify them, and investigate their interrelationships.
There are several laboratory systems of this type for business
applications, e.g., a system that helps to explore news on the Dow
Jones index, investments, and company merger and acquisition projects.



Due to the great difficulty of this task, only very large commercial
corporations can currently afford research on the factual data
extraction problem, or simply buy the results of such research.
This kind of problem is also interesting from the scientific and
technical point of view. It remains very topical, and a solution has
yet to be found. We are not aware of any such research targeting the
Spanish language so far.

