University of Strasbourg
All content following this page was uploaded by Francois Rousselot on 16 May 2014.
Abstract. Legislation is increasingly accessed via the Internet, and systems for consulting
legal texts are becoming correspondingly more powerful. This paper presents DARES, a generic
system that can be adapted to any domain to meet document production needs. It is based on an
annotation engine, which produces the XML input documents the system requires, and on an XML
fragment recombination system. The latter uses a toolbox of fragment manipulation functions to
generate new documents. To validate the system, we applied it to the legal domain through the
consolidation problem.
Key words: annotation, user model, XML, document fragment recombination, expert system
1. Introduction
use the term ‘‘annotation’’ as a synonym for document tagging, i.e., the addition of
logical and semantic tags.
The second part (right of Figure 1) is the file module, which works exclusively
on the XML documents resulting from data preparation. This part covers
access to and classification of the texts returned by a keyword request, and
the extraction of parts of those documents to create new documents that
best fit users' needs.
The part labeled ‘‘File's creation’’ and the tagging part of ‘‘Extraction of
K’’ in Figure 1 are the subject of our system.
– Modifier is the new fragment which modifies the target. In case of deletion
of the target fragment, no modifier is present.
– Target is the fragment to be modified.
– Action is the description of the modification; this modification may be an
insertion/adding of a modifier into the target fragment, a deletion of a sub
88 FADY FARAH AND FRANÇOIS ROUSSELOT
Finally, to fill the action field, we use another regular expression built by
the expert. This expression captures the different ways in which each kind of
action can be worded. An example is given in the section detailing the
application to European law texts.
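As an illustration of how the action field could be filled, the following minimal sketch maps the wording of a sentence to an action kind. The pattern, the `Amendment` class and the `detect_action` helper are assumptions made for this example, not the expressions written by the expert for DARES; only the Modifier/Target/Action triple and the four action wordings (is replaced, is inserted, is deleted, is added) come from this paper.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: an expert-written expression would cover far more wordings.
ACTION_PATTERN = re.compile(
    r"\b(?:is|are|shall be)\s+(replaced|inserted|deleted|added)\b",
    re.IGNORECASE,
)


@dataclass
class Amendment:
    """One update relation, following the Modifier/Target/Action triple."""
    target: str              # fragment to be modified
    action: str              # kind of modification
    modifier: Optional[str]  # new fragment; None in case of deletion


def detect_action(sentence: str) -> Optional[str]:
    """Return the action kind expressed in the sentence, or None if there is none."""
    match = ACTION_PATTERN.search(sentence)
    return match.group(1).lower() if match else None
```

A sentence that expresses no update relation yields `None`, so it can simply be left without an action tag.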
This annotation tool produces XML documents enriched with relevant
semantic information. Annotation is language dependent, so each
language requires its own set of annotation rules.
As stated above, our application domain is European law. To validate the
adaptability of our system, we applied it to a fairly complex legal process
known as consolidation.
Figure 3. Principal act to use to generate amendments (extracted from Arnold-Moore and
Clemes (2000)).
Figure 4. Principal act marked up by an expert to specify amendments; this leads to the
generation of an amendments description document (extracted from Arnold-Moore and
Clemes (2000)).
DOCUMENTS ANNOTATION AND RECOMBINING SYSTEM 91
Figure 5. Use of the amendments description document to consolidate the principal act
(extracted from Arnold-Moore and Clemes (2000)).
EnAct, like our system, also handles the ‘‘on request’’ production of
consolidated documents (Figure 5). In our approach, producing such a
consolidated document requires analyzing the amendments to detect the
modifications to be made to the initial law text. This step is not necessary
in the EnAct approach, where the amendments are generated by the system.
Adapting our system to a new domain is a six-step process:
Step 1: writing the DTD of the domain documents and setting the
structural relations implied by the tag semantics. The XML elements
(fragments) identified in the DTD are the targets of the transformation
operations.
Step 2: logical annotation to ensure that the DTD is met. It consists of
tagging the logical parts.
Step 3 (see note 1): semantic annotation to identify information such as
relations between document parts and expressions of modification in the case
of amendments. Here, the system tags texts or parts that indicate a relation
between one fragment and another. Those relations are implied by the tag
content and cannot be obtained while defining the DTD. The only relation
handled so far is the update relation (is replaced, is inserted, is deleted, is added).
Step 4: tagging of metadata such as date, place and author, if necessary.
Step 5: adapting the predefined rules for profile acquisition and evolution,
and adding new ones if necessary.
Step 6: writing cascaded rules for document production using predefined
complex tasks and the best parameters for those tasks. The left-hand side of
the first rule, which launches the subsequent cascaded rules, is the goal. This
goal is associated with a job when the rule is defined. Depending on his job,
the user has access to a particular set of goals.
The system is then ready to receive user requests. Users choose a goal and,
if the goal requires it, enter keywords. Users considered to be system experts
will be authorized to define new complex tasks and new rules, but this
functionality is not available yet. The remaining part of this paper presents
a study of the application input documents, followed by a section describing
logical and semantic annotation, the production of consolidated documents
(rules and tasks used) and results.
To satisfy steps 1 to 4 of the domain adaptation process for law documents,
we have to study the logical structure of the manipulated documents
and formalize the expressions of update relations, or amendments, in those
documents. That is the topic of this section. Our study is based on hundreds
of law documents extracted from Official Journals via the Eur-Lex system,
and on documents describing writing techniques.
The Community Official Journal texts, which constitute our working base,
comprise many types of texts grouped into two main categories: legislation,
and information and notices. In this paper, we focus on regulations, directives,
decisions and recommendations regardless of their category. The nature of a
document (regulation, directive, etc.) is generally indicated in its title, as
shown in Figure 6.
Ex:
Initial act title:‘‘Council Regulation (EC) No A of ... on improving the
efficiency of agricultural structures’’
The annotation step assumes that a DTD has been designed and provided to
the system.
– Tagging the title: if the first text element encountered in the document
contains all the constitutive elements of the title (see previous section for a
description of those elements) and the preamble start markup is found then
tag the title.
– Preamble tagging: if the title has been tagged and the start preamble mark
has been found and the enacting terms start markup is found then tag the
preamble.
– Enacting terms tagging: After the preamble tagging, if the enacting terms
start markup has been found or a sub structure heading (article, point...)
has been found, and if the enacting terms ending wording is found, then
tag the terms.
– Annex tagging: if the annex markup is found and if another annex start
markup is found or if the document ends, tag the annex.
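The tagging rules above can be sketched as a small state machine over the text blocks of a document. This is a simplified forward pass: unlike the rules in the list, it does not wait for the closing markup of a part before tagging it. The marker patterns and the `tag_logical_parts` function are assumptions chosen for this example; the actual markers would come from the study of the documents and from the DTD.

```python
import re

# Hypothetical start markers, assumed for illustration only.
PREAMBLE_START = re.compile(r"^THE (COUNCIL|COMMISSION)\b")
ENACTING_START = re.compile(r"^(HAS|HAVE) ADOPTED THIS\b")
ANNEX_START = re.compile(r"^ANNEX\b")


def tag_logical_parts(blocks):
    """Assign each text block to title, preamble, enacting_terms or annex."""
    tagged = []
    state = "title"  # the first text element is expected to be the title
    for block in blocks:
        if ANNEX_START.match(block):
            state = "annex"            # an annex marker always opens an annex
        elif state == "title" and PREAMBLE_START.match(block):
            state = "preamble"         # preamble can only follow the title
        elif state == "preamble" and ENACTING_START.match(block):
            state = "enacting_terms"   # enacting terms can only follow the preamble
        tagged.append((state, block))
    return tagged
```

Because each transition is only allowed from the preceding part, the sketch preserves the ordering constraint expressed in the rules (title, then preamble, then enacting terms, then annexes).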
Figure 8. Logical and semantic tagging of the articles of the amending document
‘‘COMMISSION DECISION of 27 March 2002 amending Council Decision 2000/766/EC
and Commission Decision 2001/9/EC with regard to transmissible spongiform
encephalopathies and the feeding of animal proteins’’.
Rule (2) uses the keywords to extract fragments, which here are
whole documents. Initial acts are extracted first, then amending acts.
Note the call to a task from our transformation toolbox (extract(...)).
The extraction task is responsible for retrieving fragments and takes a
criterion as parameter. This criterion is a combination of criteria
expressed in prefix notation. The tasks are written in red. The result of
the call to extract(...) introduces a new fact called Initial-docs.
(3) (Extract-amending-fragments) and (Initial-docs <docs>)
→
(Amendment-docs (for-each <docs> extract (and (unit ‘‘Document’’)
(relation ‘‘modify’’ <docs>))))
→
(for-each <docs> format(<format>,<docs>))
The last rule takes the resulting fragments, which are actually
documents, and saves them in a specified format (html or pdf). A
default style sheet is used for this purpose. In the future, this style
sheet will be generated for each user according to his profile preferences.
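To make the prefix-notation criterion passed to extract(...) more concrete, here is a minimal sketch of how such a criterion could be evaluated. Fragments are modeled as plain dictionaries and criteria as nested tuples; this representation, the `matches` helper and the `or` operator are assumptions for illustration. Only `and`, `unit` and `relation` appear in the rule shown in this paper, and the real toolbox works on XML fragments.

```python
def matches(criterion, fragment):
    """Evaluate a prefix-notation criterion against one fragment (a dict)."""
    op, *args = criterion
    if op == "and":
        return all(matches(c, fragment) for c in args)
    if op == "or":  # assumed operator, not shown in the paper
        return any(matches(c, fragment) for c in args)
    if op == "unit":
        return fragment.get("unit") == args[0]
    if op == "relation":
        kind, target = args
        return (kind, target) in fragment.get("relations", [])
    raise ValueError(f"unknown operator: {op}")


def extract(criterion, fragments):
    """Return the fragments that satisfy the criterion, in their original order."""
    return [f for f in fragments if matches(criterion, f)]
```

With this representation, the criterion of rule (3) for one initial document `A` would be written `("and", ("unit", "Document"), ("relation", "modify", "A"))`, and the for-each construct would correspond to one call per initial document.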
Figure 10 illustrates a consolidated document in html format resulting
from those rules. At the top of the figure, an index of the revision gives
access to the modified parts through links. The title of this amended
document is highlighted in blue. A zoom on Article 2 shows the replacements
made by the merging task. Old texts are highlighted in yellow and new ones
in green.
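The highlighting of a replacement can be sketched as follows. This is only an assumption about how a merging step might render a single replacement in html, with the old text in yellow and the new text in green as in Figure 10. The `merge_replacement` function and the inline styles are hypothetical, and the actual merging task operates on XML fragments rather than plain strings.

```python
from html import escape


def merge_replacement(text, old, new):
    """Replace the first occurrence of `old` by `new`, keeping both visible in html."""
    if old not in text:
        raise ValueError("target text not found")
    before, after = text.split(old, 1)
    return (
        escape(before)
        + f'<span style="background: yellow">{escape(old)}</span>'      # old text
        + f'<span style="background: lightgreen">{escape(new)}</span>'  # new text
        + escape(after)
    )
```

Keeping the old text next to the new one, rather than discarding it, is what lets a reader see the revision at a glance.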
Notes
1 Order of steps 3 and 4 can be inverted.
2 http://europa.eu.int/comm/dgs/translation/workingwithus/freelance/guide/index_en.htmar.
References