

DARES: Documents annotation and recombining system - Application to the European law.

Article in Artificial Intelligence and Law · June 2007


DOI: 10.1007/s10506-007-9031-7 · Source: DBLP



Artificial Intelligence and Law (2007) 15:83–102 © Springer 2007
DOI 10.1007/s10506-007-9031-7

DARES: Documents annotation and recombining system - Application to the European law

FADY FARAH and FRANÇOIS ROUSSELOT


Laboratoire du Génie de la Conception (LGeCo), INSA Strasbourg, 24, Boulevard de la Victoire,
67084, Strasbourg Cedex, France
E-mail: fady.farah@insa-strasbourg.fr

Abstract. Legislation is increasingly accessed via the Internet, and systems for consulting law
texts are becoming correspondingly more powerful. This paper presents DARES, a generic system
which can be adapted to any domain to handle document production needs. It is based on an
annotation engine, which produces the XML input documents required by the system, and on an
XML fragment recombination system. The latter uses a toolbox of fragment manipulation functions
to generate new documents. To validate this system, we applied it to the domain of law through
the consolidation problem.

Key words: annotation, user model, XML, documents fragments recombination, expert system

1. Introduction

As a mark of democracy, many countries have provided online access to law during the last
decade. Various legislative access systems have been designed, offering thematic as well as
keyword or chronological access. In 1998, over fifty countries (including USA, Australia,
Norway, Germany, United Kingdom, France, Spain, Turkey, India, Mexico, South Africa) had
significant collections of full texts available on the Internet.
The success of such systems has changed the document production process. The production of
law documents is being normalized to fit archiving, automation and consultation needs, as shown
by the work presented at the sixth conference on Law via the Internet held in November 2004 in
Paris (Sixth conference for Law via the Internet, November 2004). New Web standards like XML
and XSLT, by abstracting the document (separating content structure from presentation),
contribute to the personalization of production and thus to the separation between initial
production and "on request" automatic production.

Many systems have exploited those possibilities to address the consolidation problem in the
domain of law. In many countries, the largest share of law texts consists of amendments to
existing texts. Although consolidated legislation does not have the same legal status in every
country (informative versus constitutive), systems that handle the evolution of law
automatically by producing consolidated documents are very useful.
For instance, EnAct (Arnold-Moore and Clemes 2000) is a Tasmanian legislation access system
which handles such consolidations by helping drafters to create amendments to laws (initial
production) in a simple way and to apply those amendments, when requested, to the law concerned
("on request" production). In Belgium, the Agora-Lex project (Logghe et al. 2000) developed a
model and prototype for managing historical versions of legislation. In Europe, Eur-Lex
(Berteloot 2004) is a database of Community law texts allowing multiple kinds of access and
presentation of documents. It also offers tools for navigating manually consolidated documents
and supports twenty European languages.
Our work is part of a Réseau National des Technologies Logicielles (RNTL) project named
PAPLOO, a collaboration between Ever Team, a software publisher that maintains the Eur-Lex
database and its access services, and two computer science laboratories, INRIA Lorraine France
(LORIA) and LGeCo. The purpose is to design and implement a generic system for storing,
indexing, accessing and transforming documents to create new documents that fit users' needs
(Rousselot et al. 2005). In this paper, we give an overview of the project, present our
Documents Annotation and Recombining System (DARES) and explain its application to European
law via text consolidation.
The first section presents the project and gives some details about our work. The second is a
study of European law texts from the Eur-Lex database. It is followed by a description of the
application and deployment of DARES for consolidation.

2. PAPLOO project overview

2.1. PAPLOO GENERAL ARCHITECTURE

This section gives a brief description of the complete system in order to clarify the context
of our work. The complete system is not described further in this document.
The system first addresses data preparation (left side of Figure 1), which covers the
conversion of paper and image documents (via OCR/ICR) or digital documents into XML, the
annotation of XML documents, and indexing for future multi-criteria access. Here, annotation
includes XML document tagging and the addition of metadata. In the remainder of this paper, we
use the term "annotation" as a synonym for document tagging, i.e., the addition of logical and
semantic tags.
The second part (right of Figure 1) is the file module, which works exclusively on the XML
documents resulting from data preparation. It covers access to and classification of the texts
returned by keyword requests, and the extraction of parts of those documents to create new
documents that best fit users' needs.
The part labeled "File's creation" and the tagging part of "Extraction of K" in Figure 1 are
the subject of our system.

Figure 1. PAPLOO system overview.

2.2. DARES ARCHITECTURE

DARES focuses on the (logical and semantic) annotation and manipulation of XML document parts
(or fragments) to produce new documents according to user requests. It uses the XML format to
make document parts easy to manipulate. The results are XML documents associated with XSL
stylesheets to produce (X)HTML, PDF or text documents. The architecture is shown in Figure 2.
The system is based on two independent modules:

2.2.1. Annotation module

The purpose of this module (Figure 2 (1)) is to provide tools that, in a first step, transform
XML or TXT documents into XML documents valid against a DTD and, in a second step, identify
relations useful for recombination. These two steps introduce two levels of annotation. The
first step is logical annotation: identifying and tagging the fragments of interest. The second
step is semantic annotation: identifying relations between fragments.

Figure 2. Annotation (1) and Recombination (2) overview.

2.2.1.1. Logical annotation. In addition to the documents, the logical annotation part of the
module is fed with annotation rules hand-crafted by an expert. The annotation is thus not
automatic but supervised. One reason is that we cannot make any domain-specific hypothesis for
annotation; another is that experts need to control and validate the annotation, and will
refine their tagging rules just as a user refines his requests as he gains experience with a
search engine. Once rules are defined, a document base can be tagged efficiently.
The rules rely on combinations of regular expressions associated with tagging actions. The use
of regular expressions rests on the idea that structures and relations can be identified by
speech markers. By combining markers, we should then be able to delimit zones corresponding to
structures and to find indicators of relations.
The annotation module provides a language for combining regular expressions, which an expert
uses to design annotation rules. The grammar of the combination language is given in Extended
Backus-Naur Form (EBNF).

COMBINATION := REGEXP CONSTRAINT* | ('[' | ']') REGEXP REGEXP ('[' | ']')
CONSTRAINT := ('and') ('not')? REGEXP CONTEXT
CONTEXT := 'in' INTERVAL 'of' UNIT
INTERVAL := '[' n m ']' where n, m ∈ Z and n ≤ m
UNIT := 'word' | 'line' | 'sentence' | 'paragraph' | 'text' | REGEXP
REGEXP := a regular expression between ( and )

This grammar describes an entity to tag provided that constraints on its context are met, i.e.,
that other expression(s) are found in a specified context. It also allows a zone to be tagged
by giving a pair of regular expressions, where the first specifies the beginning of the zone
and the second its end. Logical annotation can be performed with this tool, as the following
examples show.

Ex: Combinations for European Commission documents

(1) The annotation expression [(Article[ ]+[1-9][0-9]*)(Article | Done at)[
will find zones starting with the keyword "Article" followed by one or more whitespace
characters and by a number. The zone ends at the keyword "Article" introducing the following
article, or at the utterance "Done at" indicating the end of the document. The outward-facing
closing bracket indicates that the delimiting expression is not included in the zone.
(2) The annotation expression (Article) and ([1-9][0-9]*) in [1 2] of word will find zones
starting with the keyword "Article" and having a number within the two following words, as
specified by the interval [1 2].
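
The two examples above can be approximated with ordinary regular expressions. The following is a minimal sketch, not the DARES implementation: the function names `tag_zones` and `constrained_matches`, the handling of the closing bracket through an `include_end` flag, and the restriction to the `word` unit with positive intervals are all assumptions made for illustration.

```python
import re


def tag_zones(text, start_re, end_re, include_end=False):
    """Return (begin, end) character offsets of the zones found in text.

    A zone opens at a match of start_re and closes at the next match of
    end_re, or at the end of the text if there is none. With
    include_end=False the closing match stays outside the zone.
    """
    start_pat, end_pat = re.compile(start_re), re.compile(end_re)
    zones, pos = [], 0
    while True:
        opening = start_pat.search(text, pos)
        if opening is None:
            break
        closing = end_pat.search(text, opening.end())
        if closing is None:
            zones.append((opening.start(), len(text)))
            break
        stop = closing.end() if include_end else closing.start()
        zones.append((opening.start(), stop))
        pos = max(stop, opening.start() + 1)  # always move forward
    return zones


def constrained_matches(text, main_re, context_re, interval):
    """Matches of main_re with a word matching context_re among the
    n-th to m-th following words (only n >= 1 is handled here)."""
    n, m = interval
    context = re.compile(context_re)
    hits = []
    for match in re.finditer(main_re, text):
        words = text[match.end():].split(None, m)[:m]
        window = [w.strip(".,;:()") for w in words[n - 1:]]
        if any(context.fullmatch(w) for w in window):
            hits.append(match)
    return hits


# Invented sample text, used only to exercise the sketch.
sample = (
    "Article 1\nScope of the act.\n"
    "Article 2\nEntry into force.\n"
    "Done at Brussels."
)
for begin, end in tag_zones(sample, r"Article[ ]+[1-9][0-9]*", r"Article|Done at"):
    print(repr(sample[begin:end]))
```

Note that a closing expression as simple as "Article" would also fire on an in-text reference such as "as referred to in Article 5"; a real rule would presumably anchor the marker to the start of a line.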

2.2.1.2. Semantic annotation. This part of the annotation module aims at extracting and tagging
generic relations between fragments. The problem here is to determine a set of
domain-independent relations that can appear between fragments, and then to define methods to
tag those relations in XML documents. To deal with this vast problem, we have chosen to start
with one type of relation which appears in documents of many domains: the update relation,
known in the law domain as amending.
A generic update relation can be formalized as a triple (modifier, target, action) where:

– Modifier is the new fragment which modifies the target. In case of deletion of the target
fragment, no modifier is present.
– Target is the fragment to modify.
– Action is the description of the modification; this modification may be the insertion or
addition of a modifier into the target fragment, the deletion of a sub-fragment or of the
entire target fragment, or the replacement of the target fragment or one of its sub-fragments
by a modifier.
To annotate the relations, the system builds relation triples. To fill the modifier and target
fields, it uses regular expressions representing the addresses of parts or fragments in the XML
document. They are automatically synthesized by the system from the DTD given by an expert.
Those expressions cover the possible ways of referring to structures:
"((Ordinal Structure) | (Structure Number)) ((',')? ((Ordinal Structure) |
(Structure Number))*)"
where:
– Ordinal is another regular expression locating (literal) ordinals and keywords like
'previous', 'following', 'next', 'last';
– Structure is a regular expression which locates all the tag names available in the DTD;
– Number is a regular expression representing (literal) numbers.

Ex: 'Article 1, Paragraph 2', 'Third paragraph' ...


Because it is hierarchical, a reference to (or address of) a structure can be scattered, or
burst, across the document. As its constituent elements generally appear in a natural order
(higher-level structure first), the system can compile them into a complete address which it
may resolve later.
Ex: build the address "Article 2, second Paragraph" from the burst reference
"In Document (EC) 1553/2005, Article 2 is modified as follows:
- The second Paragraph is deleted..."
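
As an illustration of how a burst reference might be compiled, here is a minimal sketch under stated assumptions. The `STRUCTURES` and `ORDINALS` lists are hypothetical stand-ins for what the system derives from the DTD, numbers are limited to digits, and `find_references` and `compile_address` are names invented here; the paper does not describe the actual algorithm.

```python
import re

# Assumed structure names, as if read from the DTD, highest level first.
STRUCTURES = ["Article", "Paragraph", "Point"]
# Assumed (partial) list of literal ordinals and positional keywords.
ORDINALS = ["first", "second", "third", "previous", "following", "next", "last"]

_structure = "|".join(STRUCTURES)
_ordinal = "|".join(ORDINALS)
REFERENCE = re.compile(
    rf"\b(?:(?P<ordinal>{_ordinal})\s+(?P<s1>{_structure})"
    rf"|(?P<s2>{_structure})\s+(?P<number>[0-9]+))\b",
    re.IGNORECASE,
)


def find_references(text):
    """Return (structure, wording) pairs in reading order."""
    refs = []
    for m in REFERENCE.finditer(text):
        structure = (m.group("s1") or m.group("s2")).capitalize()
        refs.append((structure, m.group(0)))
    return refs


def compile_address(refs):
    """Merge partial references into one address, highest level first.

    A new reference replaces any earlier one at the same or a lower level,
    so a later bullet overrides the previous bullet's paragraph.
    """
    address = []
    for structure, wording in refs:
        level = STRUCTURES.index(structure)
        address = [(s, w) for s, w in address if STRUCTURES.index(s) < level]
        address.append((structure, wording))
    return ", ".join(w for _, w in address)


burst = (
    "In Document (EC) 1553/2005, Article 2 is modified as follows:\n"
    "- The second Paragraph is deleted..."
)
print(compile_address(find_references(burst)))
```

The sketch relies on the natural ordering noted above: because higher-level structures are mentioned first, each lower-level mention can simply be appended to the address built so far.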

Finally, to fill the action field, we use another regular expression built by the expert. This
expression captures the different ways of expressing each kind of action. We will see an
example in the section detailing the application to European law texts.
This annotation tool yields XML documents enriched with relevant semantic information. The
annotation is language dependent, so each language requires its own set of annotation rules.

2.2.2. Recombination module

This module is the basis for XML document transformations. It consists of two parts.
The transformation toolbox (bottom right in Figure 2 (2)) contains a complete set of primitive,
generic operations on fragments. These operations, called basic tasks, come from a
decomposition of the document creation process inspired by the work of Hayes and Flower in the
cognitive sciences field: information extraction and filtering, editing and composition,
formatting and archiving. We have built a typology of basic document transformation tasks and
designed a simple but powerful language to combine them into complex republication tasks. This
language handles calls to basic or complex tasks and control structures such as iterations and
conditions in a simple way, and makes it possible to describe a document republication that
corresponds to the user's needs and domain.
The central element of the recombination part is a rule-based system (center of Figure 2 (2)).
It is responsible for choosing a predefined complex task and parameterizing it to produce
documents that best fit a particular user's need. Some works, like Cruzel's approach
(Laine-Cruzel and Guinet 2000), use an association matrix for this purpose. The problem with
those approaches is that they consider a predefined set of complex tasks which does not evolve,
and only static associations. In our approach, association rules between the user
representation, complex tasks and parameters can be specified dynamically for each domain.
Another advantage is the separation between the definition of complex tasks and the rules that
use those tasks with specific parameters. This approach guarantees the adaptability of the
system to a particular domain. Its weakness is that the set of rules modeling the domain and
the users' needs in terms of document generation has to be defined by a domain expert.
To better answer a user's demand, the system must choose adequate document production tasks,
with parameters in accordance with the user's job, level of domain expertise, topics of
interest, and so on. In other words, a user point of view is necessary. Thus, a user model
system (bottom of Figure 2) handles the different user habits or static needs expressed by the
user model. It contains information on the user such as job, domain experience, presentation
preferences, topics of interest and system expertise level. Given this model and a user request
entered through the graphical interface, the system can choose the best set of rules and
parameters to satisfy the request. This part is not yet fully functional. We only use the job
field, which leads to a list of possible tasks (through an association rule), and the system
expertise level, which decides whether or not the user is authorized to feed the rule-based
system with new production rules.
The user can express his interest through a request, as in classic search engines, and chooses
one high-level task among the possible tasks corresponding to his profile. The activated rules
are then launched and result in the creation of document(s). This is illustrated later in the
paper.
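
The two user model fields currently in use can be pictured with a small sketch. Everything below is hypothetical: the `UserModel` class, the job name "lawyer", the numeric expertise scale and the threshold are assumptions chosen for illustration, and only the goal name Make-consolidation-on-topic comes from the paper.

```python
from dataclasses import dataclass


@dataclass
class UserModel:
    job: str
    system_expertise: int  # assumed scale: 0 novice, 1 regular, 2 system expert


# Assumed association rules: job -> goals the user may launch.
GOALS_BY_JOB = {
    "lawyer": ["Make-consolidation-on-topic"],
}
EXPERT_LEVEL = 2  # assumed threshold for adding production rules


def possible_goals(user):
    """Goals offered to the user, selected by the job field."""
    return list(GOALS_BY_JOB.get(user.job, []))


def may_add_rules(user):
    """Whether the user may feed the rule-based system with new rules."""
    return user.system_expertise >= EXPERT_LEVEL


print(possible_goals(UserModel(job="lawyer", system_expertise=0)))
```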

2.3. ADAPTATION TO EUROPEAN LAW FOR CONSOLIDATED LEGISLATION

As stated above, our application domain is European law. To validate the adaptability of our
system, we have used it to handle a fairly complex legal process known as consolidation.
Existing systems with consolidation features are generally specific to the law domain, and
sometimes even specific to consolidation. What we propose here is a generic system that can be
adapted to any domain by designing document production rules (using a toolbox of transformation
operations). This system acts on "on request" production. It thus assumes that there is an
existing base of XML documents (initial production already done) with a known DTD, possibly
annotated by our annotation module.
There is a major difference between systems like EnAct, which generate amendments automatically
(initial production) from an initial document annotated by experts (the annotations
representing the modifications to be made), and the system presented here, where the amendments
are written entirely by experts (no initial production). Figures 3 and 4 present the initial
document production of the EnAct system.

Figure 3. Principal act used to generate amendments (extracted from Arnold-Moore and
Clemes (2000)).

Figure 4. Principal act marked up by an expert to specify amendments; this leads to the
generation of an amendments description document (extracted from Arnold-Moore and
Clemes (2000)).

Figure 5. Use of the amendments description document to consolidate the principal act
(extracted from Arnold-Moore and Clemes (2000)).

EnAct, like our system, also handles the "on request" production of consolidated documents
(Figure 5). In our approach, producing such a consolidated document requires analyzing the
amendments to detect the modifications to be made to the initial law text. This step is not
necessary in the EnAct approach, where the amendments were generated by the system.
Adapting our system to a domain is a six-step process:

Step 1: writing the DTD of the domain documents and setting the structure relations implied by
the tag semantics. The XML elements (fragments) identified in the DTD are the targets of the
transformation operations.
Step 2: logical annotation to ensure that the DTD is met. It consists in tagging the logical
parts.
Step 3: semantic annotation to identify information such as relations between document parts,
or expressions of modifications in the case of amendments. Here, the system tags texts or parts
indicating a relation between one fragment and another. Those relations are implied by the tag
content and cannot be obtained while defining the DTD. The only relation handled so far is the
update relation (is replaced, is inserted, is deleted, is added).
Step 4: tagging metadata such as date, place and author, if necessary.
Step 5: adapting the predefined rules for profile acquisition and evolution, and adding some if
necessary.
Step 6: writing cascaded rules for document production using predefined complex tasks and the
best parameters for those tasks. The left-hand side of the first rule, which launches the
subsequent cascaded rules, is the goal. This goal is associated with a job when the rule is
defined. According to his job, the user will have access to a particular set of goals.

The system is then ready to receive user requests. Users choose a goal and, if the goal
requires it, enter keywords. Users considered to be system experts will be authorized to define
new complex tasks and new rules, but this latter functionality is not available yet. The
remainder of this paper presents a study of the application's input documents, followed by a
section describing logical and semantic annotation, the production of consolidated documents
(rules and tasks used) and results.

3. Basic study of the structure of European law documents

To carry out steps 1 to 4 of the domain adaptation process for law documents, we have to study
the logical structure of the manipulated documents and to formalize the expressions of update
relations, or amendments, in those documents. That is the topic of this section. Our study is
based on hundreds of law documents extracted from Official Journals via the Eur-Lex system, and
on documents describing writing techniques.

3.1. DOCUMENTS IN OFFICIAL JOURNAL

The Community Official Journal texts, which constitute our working base, comprise many types of
texts grouped in two main categories: legislation, and information and notices. In this paper,
we focus on regulations, directives, decisions and recommendations regardless of their
category. The nature of a document (regulation...) is generally indicated in its title, as
shown in Figure 6.

3.2. STRUCTURE OF THE DOCUMENTS AND REGULARITY

The grammar of Community Official Journal documents is not directly accessible through Eur-Lex.
But we observe a tendency towards normalization (standardization) in document production that
helped us in designing the document grammar.

Figure 6. An example of regulation.

For instance, on Eur-Lex, we can find recommendations and legislative techniques for drafters
("Joint practical guide: Guide of the European XXXX"). The structure proposed below is a
synthesis of those recommendations combined with rules inferred from the regularity observed in
the documents we have studied. It should not be considered a reference or final grammar, as it
is empirical and perhaps ambiguous, but it is sufficient for our current needs.
Community acts are generally drafted according to a standard structure:

– Title: identifies the act. It is composed of the type of act, the abbreviation of the
Community concerned, the number of the act, the name of the institution or institutions which
adopted the act, the date of adoption, and a succinct indication of the subject matter.
– Preamble: is placed between the title and the enacting terms and contains citations and
recitals. The preamble starts with an institution name like "The European Parliament" or "The
Commission of". It ends with an expression introducing the enacting terms, usually "Have
adopted this:".
  • Citations: at the beginning of the preamble, they indicate the legal basis of the act, the
  proposals, recommendations, initiatives, drafts... that must be obtained, and certain
  opinions and other non-mandatory procedural steps. Citations are generally introduced by the
  dedicated expression 'Having regard to' or 'Acting in accordance with'.
  • Recitals: are the parts of the act containing the statement of reasons for the act; they
  are placed between the citations and the enacting terms. Recitals are introduced by the word
  'Whereas:' and continue with numbered points comprising one or more complete sentences. Each
  point starts with a number between parentheses like (1), except for a sole recital.
– Enacting terms: are the legislative parts of the act. They are composed of articles and
points, which may be grouped in titles, chapters and sections. An unambiguous heading or symbol
identifies each structure. Figure 7 associates each possible sub-structure with its symbol.
Each of these structures ends when another starts. The last structure of the enacting terms
ends when the enacting terms part itself ends. Enacting terms start directly after the preamble
and end with the utterance "Done in ...", which indicates the place and date of edition of the
act.
– Annex: where necessary, begins with the heading "annex" and extends to the end of the
document. Where several annexes are necessary, each annex has a heading like the one cited
above and is numbered.
Figure 7. Symbols introducing structures extracted from legislative techniques.

Given the regularity of the texts and the indicators (utterances introducing parts) of the
structures, logical tagging is possible using our annotation tool.

3.3. AMENDMENTS OR EXPRESSIONS OF REVISIONS

In this part we focus on the expression of amendments. Amendment documents, like any other
documents, are normalized. The legislative techniques of the European publication office give
precise recommendations for writing amendments.
Amendments are not independent documents but exist with the only aim
of modifying an act. Thus, they do not contain new substantive provisions
which are autonomous in relation to the act being amended and they cannot
be amended.
In general, an amending act is of the same type as the initial act. In particular, it is
unusual to amend a regulation by means of a directive. However, certain provisions of primary
legislation leave the choice of the type of act to the institutions, by granting them power to
adopt 'measures' or by expressly mentioning several possible types of act.
The title of the amending act must mention the number of the act being amended and either
indicate the title of the act or specify what is to be amended. The amended acts are introduced
in the title by the expression "amending", with the connector "and" if there is more than one
modified act.

Ex:
Initial act title: "Council Regulation (EC) No A of ... on improving the efficiency of
agricultural structures"
Amending act title: "Council Regulation (EC) No B of ... amending Regulation (EC) No A on
improving the efficiency of agricultural structures."
Where several provisions of the same act are to be amended, all the amendments are combined in
a single article, comprising an introductory sentence and points following the numerical order
of the articles to be amended. If several acts are amended by a single amending act, the
amendments to each act should be set out together in a separate article.
Modifications of acts are generally texts to insert in the act to be amended. They preferably
target complete units of text (an article or a subdivision of an article) rather than
individual sentences or words, in the interests of clarity and in view of the problems of
translation into all the official languages. Amendments are made via a formal or textual
amendment. There are four types of formal modification, which correspond to the kinds of action
in our relation triples: deletion, insertion, replacement and adding.
The utterance "is amended as follows" introduces a list of formal modifications.

Ex: article of an amending act with the four types of modification

Article 1
Regulation (EC) No .../... is amended as follows:
1) Article 1 is deleted
2) Article 23 is amended as follows:
   a) In paragraph 1, point (f) is deleted;
   b) Paragraph 2 is replaced by the following: "2. Member States may prescribe that..."
   c) The following paragraph 2a is inserted: "2a. In the case of providers of statistical
   information ..."
   d) The following paragraph 4 is added: "4. The Commission shall ensure publication in the
   Official Journal ..."
Parts like articles, paragraphs or points are generally not renumbered, because of the
potential problems with references in other acts. Likewise, gaps left by the deletion of
articles or other numbered parts of the text are not subsequently filled by other provisions,
except when the content is identical to the text deleted.
Based on our observations of amending documents, we defined an empirical grammar of the update
relation for law texts. It is presented in Table I.
For more information on amendments, we recommend consulting the Council's Manual of Precedents,
the Commission's Manual on Legislative Drafting and LegisWrite.

Table I. Amendments expression grammar in EBNF

Type of update   Grammar rule

Deletion         Deletion := Part ('shall' 'be' | 'is') 'deleted'
Adding           Adding := Part ('shall' 'be' | 'is') 'added'
Insertion        Insertion :=
                 ('in' Part (',')? Part ('shall' 'be' | 'is') 'inserted' (('before' | 'after') Part)?) |
                 (Part ('shall' 'be' | 'is') 'inserted' ('in' Part) (('before' | 'after') Part)?)
Replacement      Replacement := Part ('shall' 'be' | 'is') 'replaced' 'by' Part
Part             Corresponds to the structures identification. A regular expression is
                 generated by the system according to the DTD given.
                 This expression has been described in the semantic annotation section.
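
To make the grammar of Table I concrete, here is a minimal sketch that recognizes the action and target of an amending sentence. It is a simplification, not the system's own expressions: `REF` and `PART` are hypothetical stand-ins for the Part expression generated from the DTD, the optional 'in', 'before' and 'after' clauses and the Part following 'replaced by' are not parsed, and a leading "In paragraph 1," is simply absorbed into the target address.

```python
import re

# Stand-in for the Part expression that the system derives from the DTD.
REF = r"(?:Article|Paragraph|Point)\s+(?:\([a-z0-9]+\)|[0-9]+[a-z]?)"
PART = rf"{REF}(?:\s*,\s*{REF})*"
VERB = r"(?:shall\s+be|is)"

AMENDMENT = re.compile(
    rf"(?P<target>{PART})\s+{VERB}\s+"
    r"(?P<action>deleted|added|inserted|replaced\s+by)",
    re.IGNORECASE,
)

# Keyword found in the text -> kind of action in the relation triple.
ACTION_NAMES = {
    "deleted": "deletion",
    "added": "adding",
    "inserted": "insertion",
    "replaced": "replacement",
}


def parse_amendment(sentence):
    """Return (action, target) for an amending sentence, or None."""
    m = AMENDMENT.search(sentence)
    if m is None:
        return None
    keyword = m.group("action").split()[0].lower()
    return ACTION_NAMES[keyword], m.group("target")


for line in (
    "In paragraph 1, point (f) is deleted;",
    "Paragraph 2 is replaced by the following:",
    "The following paragraph 2a is inserted:",
    "The following paragraph 4 is added:",
):
    print(parse_amendment(line))
```

One observation from writing the sketch: the second Insertion alternative in Table I requires an 'in' Part clause, while an example such as "The following paragraph 2a is inserted" has none, so the sketch treats that clause as optional.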

4. Automation of consolidated legislation

4.1. ANNOTATION OF LAW TEXTS

The annotation step assumes that a DTD has been designed and provided to
the system.

4.1.1. Logical annotation

Logical annotation is performed when the documents are unstructured (such as plain text
documents) or when the granularity of their structure is not sufficient. Our annotation tool
can be used because the application texts are normalized. Here are the informal rules used for
logically tagging the documents in English (rules are also available for French):

– Tagging the title: if the first text element encountered in the document
contains all the constitutive elements of the title (see previous section for a
description of those elements) and the preamble start markup is found then
tag the title.
– Preamble tagging: if the title has been tagged and the start preamble mark
has been found and the enacting terms start markup is found then tag the
preamble.
– Enacting terms tagging: After the preamble tagging, if the enacting terms
start markup has been found or a sub structure heading (article, point...)
has been found, and if the enacting terms ending wording is found, then
tag the terms.
– Annex tagging: if the annex markup is found and if another annex start
markup is found or if the document ends, tag the annex.
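
The informal rules above can be sketched as a single pass over a plain-text act. This is an illustrative approximation only: the tag names (`Document`, `Title`, `Preamble`, `EnactingTerms`, `Annex`), the marker expressions and the function `tag_logical_parts` are assumptions, the title is not checked for its constitutive elements, and sub-structures such as articles are not tagged.

```python
import re
from xml.sax.saxutils import escape

# Marker expressions assumed from the structure described in Section 3.2.
PREAMBLE_START = re.compile(r"^(?:The European Parliament|The Commission of)", re.I | re.M)
ENACTING_START = re.compile(r"^(?:Has|Have) adopted this[^\n]*\n?", re.I | re.M)
ENACTING_END = re.compile(r"^Done (?:at|in)\b", re.I | re.M)
ANNEX_START = re.compile(r"^Annex\b", re.I | re.M)


def tag_logical_parts(text):
    """Wrap title, preamble, enacting terms and annexes in XML tags.

    Returns None when one of the mandatory markers is missing.
    """
    preamble = PREAMBLE_START.search(text)
    if preamble is None:
        return None
    enacting = ENACTING_START.search(text, preamble.end())
    if enacting is None:
        return None
    done = ENACTING_END.search(text, enacting.end())
    if done is None:
        return None
    # The enacting terms include the whole "Done at ..." line.
    line_end = text.find("\n", done.start())
    terms_end = len(text) if line_end == -1 else line_end + 1
    parts = [
        ("Title", text[:preamble.start()]),
        ("Preamble", text[preamble.start():enacting.end()]),
        ("EnactingTerms", text[enacting.end():terms_end]),
    ]
    # Each annex runs until the next annex heading or the end of the text.
    bounds = [m.start() for m in ANNEX_START.finditer(text, terms_end)]
    bounds.append(len(text))
    for begin, end in zip(bounds, bounds[1:]):
        parts.append(("Annex", text[begin:end]))
    body = "".join(f"<{tag}>{escape(chunk.strip())}</{tag}>" for tag, chunk in parts)
    return f"<Document>{body}</Document>"


# Invented miniature act, used only to exercise the sketch.
sample_act = (
    "Commission Decision of ... amending ...\n"
    "THE COMMISSION OF THE EUROPEAN COMMUNITIES,\n"
    "Having regard to the Treaty,\n"
    "Whereas: ...\n"
    "HAS ADOPTED THIS DECISION:\n"
    "Article 1\nText of the article.\n"
    "Done at Brussels, 27 March 2002.\n"
    "ANNEX\nList.\n"
)
print(tag_logical_parts(sample_act))
```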

4.1.2. Semantic annotation

Given the empirical amendment grammar presented in the previous part and the capabilities of
our generic semantic annotation tool, we concluded that this annotation module is effective for
amendments to European law texts.
The domain expert only has to feed the annotation tool with a regular expression covering the
different ways of expressing modifications. This regular expression is:
(shall [ ]+ be | is) (deleted | replaced [ ]+ by | inserted | added)

4.2. PRODUCTION OF CONSOLIDATED DOCUMENT ON REQUEST: FIRST RESULTS

In this section, we explain the processing performed on the annotated documents to generate a
consolidated document. We illustrate it on documents from the Eur-Lex database.
To simplify, let us suppose that:

– The user needs to obtain a consolidated document about "spongiform encephalopathies".
– Only one document deals with "spongiform encephalopathies". This document is named "Council
Decision of 4 December 2000 concerning certain protection measures with regard to transmissible
spongiform encephalopathies and the feeding of animal protein".
– This document has at least one amending act, named "COMMISSION DECISION of 27 March 2002
amending Council Decision 2000/766/EC and Commission Decision 2001/9/EC with regard to
transmissible spongiform encephalopathies and the feeding of animal proteins".
– The two documents are annotated as illustrated in Figure 8.

In Figure 8, the amendment is enriched with semantic tags (Action tags) indicating the action
to perform and the target of this action. The vertical lines of the same color on the left
associate each textual amending fragment with its XML counterpart.
In Figure 9, the vertical lines of the same color associate the XML fragment from the initial
act with the XML fragment that modifies it (coming from the amending document).
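
Once a target address has been resolved to a path in the initial act, applying an update relation amounts to a small tree edit. The sketch below uses Python's standard `xml.etree.ElementTree`; the element and attribute names (`Act`, `Article`, `Paragraph`, `num`), the function `apply_update` and the placement conventions for insertion and adding are assumptions made for illustration, not the tag set or the merge task used by DARES.

```python
import xml.etree.ElementTree as ET


def apply_update(act_root, action, target_path, modifier=None):
    """Apply one update relation to the initial act, in place.

    Assumed conventions: for "insertion" the modifier is placed right
    after the target; for "adding" it becomes the target's last child.
    """
    target = act_root.find(target_path)
    if target is None:
        raise LookupError(f"target not found: {target_path}")
    # ElementTree keeps no parent pointers, so look the parent up.
    parent = next(p for p in act_root.iter() if target in list(p))
    index = list(parent).index(target)
    if action == "deletion":
        parent.remove(target)
    elif action == "replacement":
        parent.remove(target)
        parent.insert(index, modifier)
    elif action == "insertion":
        parent.insert(index + 1, modifier)
    elif action == "adding":
        target.append(modifier)
    else:
        raise ValueError(f"unknown action: {action}")


# Invented miniature act, used only to exercise the sketch.
act = ET.fromstring(
    "<Act>"
    "<Article num='1'><Paragraph num='1'>a</Paragraph>"
    "<Paragraph num='2'>b</Paragraph></Article>"
    "<Article num='2'>c</Article>"
    "</Act>"
)
apply_update(act, "replacement", "Article[@num='1']/Paragraph[@num='2']",
             ET.fromstring("<Paragraph num='2'>B</Paragraph>"))
apply_update(act, "deletion", "Article[@num='2']")
print(ET.tostring(act, encoding="unicode"))
```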
Figure 8. Logical and semantic tagging of the articles of the amending document
"COMMISSION DECISION of 27 March 2002 amending Council Decision 2000/766/EC
and Commission Decision 2001/9/EC with regard to transmissible spongiform encephalopathies
and the feeding of animal proteins".

Figure 9. Annotated initial and amending acts (XML).

To generate the consolidated document based on those two acts, cascaded rules must have been
written and inserted in the system during the system feed step. The left-hand side of the first
rule is the goal. Let us define the goal "Make-consolidation-on-topic". The rules are:

(1) (Make-consolidation-on-topic <user-keywords> <format>)
→ (Extract-fragments-for-consolidation <user-keywords>) then (Merge-extracted)
then (Format-extracted <format>)
Rule (1) is activated by the fact (Make-consolidation-on-topic
<user-keywords> <format>), which is introduced when a user chooses the
consolidation goal proposed by the system, fills in the keywords field of
the interface and indicates his preferred format. This rule inserts three
new facts into the knowledge base. Those facts activate rules (2), (4)
and (5). The connector “then” specifies that actions are sequenced and
do not occur at the same time.
(2) (Extract-fragments-for-consolidation <user-keywords>)
→ (Initial-docs extract (and (unit "Document") (content (and (unit "Title")
(content <user-keywords>))))) then (Extract-amending-fragments)

Rule (2) uses the keywords to extract fragments, which are here whole
documents. Initial acts are extracted first, then amending acts. Note
the call to a task of our transformation toolbox (extract(...)). The
extraction task is responsible for retrieving fragments and takes a
criterion as parameter. This criterion is a combination of criteria,
expressed in prefix notation. The tasks are written in red. The result
of the call to extract(...) introduces a new fact called Initial-docs.
(3) (Extract-amending-fragments) and (Initial-docs <docs>)
→ (Amendment-docs (for-each <docs> extract (and (unit "Document")
(relation "modify" <docs>))))

Rule (3) is activated once the initial documents have been extracted. It
extracts the corresponding amendments, which are also fragments
corresponding to whole documents. The connectors “and” and “or” can be
used to specify the conditions of rules.
(4) (Merge-extracted) and (Initial-docs <docs1>) and (Amendment-docs <docs2>)
→ (Result-docs merge (<docs1>,<docs2>))

The activation conditions of rule (4) are inserted into the knowledge
base by rules (1), (2) and (3). The merge task is called here. For
clarity, we do not detail this task. It takes documents and their
amendments and applies those amendments, producing new documents. It
relies on subtasks such as the replacement of fragments (the
replace(...) task of the toolbox) to perform the merging. It also adds
to the merged document an index of the replacements and insertions.
Figure 10 shows this index (top of the schema).
Figure 10. A possible result of consolidation.

(5) (Format-extracted <format>) and (Result-docs <docs>)
→ (for-each <docs> format(<format>,<docs>))

The last rule takes the resulting fragments, which are actually
documents, and saves them in the specified format (HTML or PDF). A
default style sheet is used for this purpose. In the future, this style
sheet will be generated for each user according to his profile preferences.
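
The cascade of rules (1) to (5) can be sketched as a small forward-chaining loop. The following is a minimal illustration, not the DARES implementation: the engine (`forward_chain`), the in-memory `CORPUS`, the final `Output` fact and the toy bodies standing in for the extract, merge and format tasks are all assumptions made for the example. The `then` connector is approximated by the order in which facts are inserted and by the data dependencies between rules.

```python
# Toy corpus standing in for the annotated XML documents (hypothetical data).
CORPUS = [
    {"id": "initial-act", "title": "Decision on topic X", "modifies": None},
    {"id": "amending-act", "title": "Amending decision", "modifies": "initial-act"},
]


def rule1(kb):
    keywords, fmt = kb["Make-consolidation-on-topic"]
    return [("Extract-fragments-for-consolidation", (keywords,)),
            ("Merge-extracted", ()),
            ("Format-extracted", (fmt,))]


def rule2(kb):
    (keywords,) = kb["Extract-fragments-for-consolidation"]
    docs = [d for d in CORPUS if keywords in d["title"]]  # stands in for extract(...)
    return [("Initial-docs", (docs,)), ("Extract-amending-fragments", ())]


def rule3(kb):
    (docs,) = kb["Initial-docs"]
    ids = {d["id"] for d in docs}
    return [("Amendment-docs", ([d for d in CORPUS if d["modifies"] in ids],))]


def rule4(kb):
    (docs,) = kb["Initial-docs"]
    (amendments,) = kb["Amendment-docs"]
    merged = [{"id": d["id"],
               "amended_by": [a["id"] for a in amendments if a["modifies"] == d["id"]]}
              for d in docs]  # stands in for merge(...)
    return [("Result-docs", (merged,))]


def rule5(kb):
    (fmt,) = kb["Format-extracted"]
    (docs,) = kb["Result-docs"]
    return [("Output", ([f"{d['id']}.{fmt}" for d in docs],))]  # stands in for format(...)


# Each rule: (names of the facts that must be present, action producing new facts).
RULES = [
    (["Make-consolidation-on-topic"], rule1),
    (["Extract-fragments-for-consolidation"], rule2),
    (["Extract-amending-fragments", "Initial-docs"], rule3),
    (["Merge-extracted", "Initial-docs", "Amendment-docs"], rule4),
    (["Format-extracted", "Result-docs"], rule5),
]


def forward_chain(facts, rules):
    """Fire each rule at most once, as soon as all its conditions are in the base."""
    kb, trace, fired = dict(facts), [], set()
    progress = True
    while progress:
        progress = False
        for index, (conditions, action) in enumerate(rules):
            if index in fired or not all(name in kb for name in conditions):
                continue
            for name, args in action(kb):
                kb[name] = args
                trace.append(name)
            fired.add(index)
            progress = True
            break
    return kb, trace


kb, trace = forward_chain({"Make-consolidation-on-topic": ("topic X", "html")}, RULES)
print(trace)
print(kb["Output"])
```

In this sketch, rule (4) cannot fire before rules (2) and (3) because it waits for the Initial-docs and Amendment-docs facts, which reproduces the sequencing described above.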
Figure 10 illustrates a consolidated document in HTML format resulting
from those rules. At the top of the figure, an index of the revisions gives
access to the modified parts through links. The title of the amended
document is highlighted in blue. A zoom on Article 2 shows the replacements
made by the merging task. Old text is highlighted in yellow and new text in
green.
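
The replacement step of the merging task described above can be sketched with Python's standard `xml.etree.ElementTree` module. This is a hedged illustration only: the attribute names (`id`, `type`, `target`), the `Old` and `New` wrapper elements, the sample XML and the `merge` function are assumptions made for the example and do not reproduce the markup of Figure 9.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated acts; the real DTD is not reproduced here.
INITIAL = """<Document id="initial">
  <Article id="art1">First article text.</Article>
  <Article id="art2">Old text of the second article.</Article>
</Document>"""

AMENDING = """<Document id="amending">
  <Action type="replace" target="art2">New text of the second article.</Action>
</Document>"""


def merge(initial_xml, amending_xml):
    """Apply 'replace' actions and return (consolidated root, revision index)."""
    root = ET.fromstring(initial_xml)
    articles = {a.get("id"): a for a in root.iter("Article")}
    index = []
    for action in ET.fromstring(amending_xml).iter("Action"):
        target = articles.get(action.get("target"))
        if target is None or action.get("type") != "replace":
            continue  # deletions and insertions are out of scope for this sketch
        # Keep both versions so that a formatter can highlight them differently.
        old = ET.SubElement(target, "Old")
        old.text = target.text
        new = ET.SubElement(target, "New")
        new.text = action.text
        target.text = None
        index.append(target.get("id"))  # entry for the revision index
    return root, index


root, index = merge(INITIAL, AMENDING)
print(index)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the old and new fragments side by side is one design option among others; it leaves the choice of what to display to the formatting task.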

This result shows one possible way of generating and presenting consolidated
documents. As the rules and tasks involved can be designed differently, many
other presentations are possible.
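
To make this point concrete, here is a small sketch of a formatting step that produces two different presentations from the same consolidated content. The segment representation, the `render_article` function, its `show_old` parameter and the style dictionary are hypothetical; only the yellow and green highlighting comes from the description of Figure 10.

```python
from html import escape

# Colours taken from the description of Figure 10; the dictionary is an assumption.
DEFAULT_STYLE = {"old": "yellow", "new": "green"}


def render_article(segments, style=None, show_old=True):
    """Render (kind, text) segments as HTML; kind is 'plain', 'old' or 'new'."""
    style = DEFAULT_STYLE if style is None else style
    parts = []
    for kind, text in segments:
        if kind == "old" and not show_old:
            continue  # a presentation that shows only the text in force
        color = style.get(kind)
        if color is None:
            parts.append(escape(text))
        else:
            parts.append(
                f'<span style="background-color: {color}">{escape(text)}</span>'
            )
    return "<p>" + " ".join(parts) + "</p>"


segments = [("plain", "Article 2."), ("old", "Old wording."), ("new", "New wording.")]
print(render_article(segments))                  # old and new text both shown
print(render_article(segments, show_old=False))  # old text hidden
```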

5. Discussions and perspectives

This paper presents a system implemented within the context of the PAPLOO
project. The system can be adapted to any domain to satisfy document
generation needs. However, this generality makes deploying DARES on a
specific domain complex. In particular, to adapt the system to a specific
domain, an expert has to define a DTD covering all the domain documents,
craft expressions if the documents have to be logically and semantically
annotated, design complex republication tasks, and define rules that
associate tasks, user representation and parameters.
The annotation tools are based on regular expressions because we assumed
that structures and relations are denoted by speech markers. Thus, these
tools are not usable for domains where the logical and semantic structures
of documents are not explicit. We will try to address this weakness in
future work.
In this paper, we showed how to adapt the system to the European law
domain and proposed, via the definition of a cascade of production rules, a
possible presentation for consolidated documents. It is important to recall
that any other presentation can be obtained by defining a suitable rule
cascade and specific complex tasks. This work has not yet been shown to
lawyers for validation because its first goal was to demonstrate the
adaptability and possible applications of our system. We will consult
lawyers in a further step in order to get feedback about its reliability.
In this paper, we briefly mentioned the user model. The aim of this
model is to better understand the user's needs. The only parts
currently used by the system are the job and the system-level expertise.
According to his job, a user will have access to a set of goals. If this
user is a system expert, he will be authorized to design new goals. In the
future, the user model will be completely functional.
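
The two parts of the user model that are currently used can be pictured as follows. This is only a sketch under stated assumptions: the class `UserModel`, the job names and the `GOALS_BY_JOB` table are hypothetical and are not taken from the system.

```python
from dataclasses import dataclass

# Hypothetical mapping from a job to the goals offered to that job.
GOALS_BY_JOB = {
    "lawyer": frozenset({"Make-consolidation-on-topic"}),
    "translator": frozenset(),
}


@dataclass(frozen=True)
class UserModel:
    job: str
    system_expert: bool = False  # system-level expertise


def available_goals(user):
    """Goals the user may choose, determined by the job."""
    return GOALS_BY_JOB.get(user.job, frozenset())


def can_design_goals(user):
    """Only system experts are authorized to design new goals."""
    return user.system_expert


lawyer = UserModel(job="lawyer")
print(available_goals(lawyer), can_design_goals(lawyer))
```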
We also plan to address the crossed translation problem: we will help
translators by generating a document in which corresponding fragments in
different languages are placed in parallel. This document will be created
in response to a request specifying keywords and working languages. This
kind of presentation will be useful for translating new texts while reusing
the terminology.

Notes
1. The order of steps 3 and 4 can be inverted.
2. http://europa.eu.int/comm/dgs/translation/workingwithus/freelance/guide/index_en.htmar.

References

Arnold-Moore, T. and Clemes, J. (2000) ‘‘Connected to the Law: Tasmanian Legislation


Using EnAct’’, Journal of Information Law and Technology.
Berteloot, P. ‘‘EUR-Lex 2004’’ , Sixth conference for Law via Internet, November 2004, Paris.
‘‘Joint practical guide: Guide of the European Parliament, the Council and the Commission’’,
http://europa.eu.int/eur-lex/lex/en/techleg/1.htm.
Laine-Cruzel, S. and Guinet, E. (2000). Fragmentation et enrichissement de textes scientifiques
sous forme électronique, 4(1–2): 59–84.
Logghe, M., van de Kerchove, K. and Moens, M. F. (2000) ‘‘Automatic Version Management
of Legislation: The Agora-Lex Project’’ 11th International Workshop on Database and
Expert Systems Applications (DEXAÕ00) p. 1051.
Rousselot, F. and Farah, F. et al. (2005)‘‘Document retro-conversion for personalized elec-
tronic reedition’’, International Workshop on Document Analysis (IWDA 2005).
Sixth conference for Law via Internet, November 2004, Paris http://www.frlii.org/.
