
Platforms, projects and teams:

GATE (General Architecture for Text Engineering): http://gate.ac.uk


Open source, 100% Java, in development since 1996, over 50 plugins, over 70 resource types.
Standards-based: reference implementation in the ISO TC37/SC4 LIRICS project; supports XCES, ACE, TREC etc. formats; founder member of the OASIS/UIMA committee.

Products:
GATE Developer: an integrated development environment for language processing components bundled with the most widely used Information Extraction system and a comprehensive set of other plugins.
GATE Embedded: an object library optimised for inclusion in diverse applications giving access to all the services used by GATE Developer and more.
GATE Teamware: a collaborative annotation environment for high volume factory-style semantic annotation projects built around a workflow engine and the GATE Cloud backend web services.
GATE Mímir (Multi-paradigm Information Management Index and Repository): a massively scaleable multiparadigm index supporting Ontotext KIM and built on Ontotext's semantic repository family, GATE's annotation structures database plus full-text indexing from MG4J.
GATE Wiki: a Controllable Wiki and CMS with collaborative and asynchronous off-line editing, hosting controlled languages for round-trip ontology engineering.
GATE Cloud: a parallel distributed processing engine that combines GATE Embedded with a heavily optimised service infrastructure running on supercomputer hardware.
The GATE Process: how to commission, design, develop, implement, maintain and evaluate robust and sustainable text processing workflows.
Training and Certification: intensive 1-week training courses, web training and GATE Certification.
META-NET (Multilingual Europe Technology Alliance Network): http://www.meta-net.eu/
A Network of Excellence comprising 60 research centres from 34 countries, dedicated to building the technological foundations of a multilingual European information society.
ILC (Istituto di Linguistica Computazionale): http://www.ilc.cnr.it

Project Objectives Services

European projects
CLARIN (Common Language Resources Infrastructure): http://www.clarin.eu/
2008-2011; now entirely funded by the participating countries.
Objectives: The ultimate objective of CLARIN ERIC is to advance research in humanities and social sciences by giving researchers unified single sign-on access to a platform which integrates language-based resources and advanced tools at a European level. The CLARIN vision is based on eight pillars, among them coverage, legal issues, integration of data, integration of services, preservation, ease of access and crossing borders.
Services:
- Search in language resources
- Easy access to protected resources (members only)
- LRT Inventory
CRISTAL (Conceptual Retrieval of Information using Semantic Dictionary in three Languages): http://cordis.europa.eu/projects/rcn/19607_en.html
From 1993 to 1996
Objectives: The CRISTAL project addresses the area of text retrieval and indexing. The project will develop a multilingual (French, English and Italian) natural language interface in order to retrieve monolingual (French) text in a corpus of newspaper articles. The system will integrate linguistic methods and information retrieval techniques.
DELIS (Descriptive Lexical Specifications and Tools for Corpus-based Lexicon Building): http://cordis.europa.eu/projects/rcn/17207_en.html
From 1993 to 1995
Objectives: DELIS is a multidisciplinary project with three broad objectives: to contribute to a methodology of dictionary development based on corpus evidence; to produce parallel dictionary fragments in five languages; and to produce software tools supporting this kind of lexicographic work.
EAGLES (Expert Advisory Group on Language Engineering Standards): http://www.ilc.cnr.it/EAGLES96
From 1996
Objectives: The basic idea behind EAGLES work is for the group to act as a catalyst in order to pool concrete results coming from current major European projects. Relevant common practices or upcoming standards are being used where appropriate as input to EAGLES work, particularly in the areas of lexicons, text encoding and speech.

ISLE (International Standards for Language Engineering): www.ilc.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm
Last update: 02/03/2004
Objectives: ISLE acts under the aegis of the EAGLES (Expert Advisory Group for Language Engineering Standards) initiative, which has seen a successful development and a broad deployment of a number of recommendations and de facto standards. The aim of ISLE is to develop HLT (Human Language Technology) standards within an international framework, in the context of the EU-US International Research Cooperation initiative.
ELAN (European Language Activity Network): http://cordis.europa.eu/projects/rcn/45377_en.html
Last updated: 1998-10-26
Objectives: The primary aim of ELAN was to link existing language resources with their potential users throughout Europe. The project aimed to achieve its objectives through:
- the design of a common query language (ELAN-CQL), which would reinforce or, where necessary, create international standards;
- the implementation of a user community network, which would employ active awareness-raising measures, a clear copyright policy, user support, e-mail user groups and more;
- the provision of standardised resources for the following languages: Albanian, Belgian French, Belorussian, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Polish, Portuguese, Romanian, Russian, Serbian, Slovakian, Slovene, Spanish, Swedish and Uzbek.
ELSE (Evaluation in Language and Speech Engineering)
From 1991 to 1999
Objectives: The prospective action of ELSE aimed at preparing for Language Engineering Evaluation in the context of future R&D programmes by developing and testing a general infrastructure for a task-independent, semi-automatic protocol for a quantitative black-box evaluation of Natural Language Processing (NLP) systems in a multilingual environment.
EMILLE (Enabling Minority Language Engineering): http://www.emille.lancs.ac.uk
Objectives: EMILLE was a three-year EPSRC project at Lancaster University and Sheffield University. Its end product was a 97 million word electronic corpus of South Asian languages (Bengali, Gujarati, Hindi, Punjabi, Urdu, Singhalese, Tamil), especially those spoken in the UK.
ENABLER (European National Activities for Basic Language Resources): http://www.ilc.cnr.it/enabler-network/index.htm
From 2000 to 2003
Objectives: The ENABLER Network aims at improving cooperation among national activities established by national authorities for providing LRs for their languages. The action aims at:
- establishing a regular exchange of information;
- identifying and fostering possible synergies and cooperation;
- promoting the compatibility and interoperability of their results, thus facilitating the successful transfer of technologies and tools among languages and the construction of multilingual LRs;
- increasing the visibility and the strategic impact of those national activities in the field of HLT;
- contributing to the creation of an overall framework in which the public and private sectors, national efforts and international coordination could cooperate in order to answer the IST need for LRs.
Services: Search Engine in LRT Inventory
FLaReNet (Fostering Language Resources Network): http://www.flarenet.eu
2008-2011
Objectives: The European FLaReNet is intended to develop a common vision of the area of Language Resources and Language Technologies for the next years and foster a European strategy for consolidating the sector and enhancing competitiveness at EU level and worldwide. FLaReNet is organized into five Thematic Areas:
- the Chart for the area of LRs and LT in its different dimensions;
- methods and models for LR building, reuse, interlinking, maintenance, sharing, distribution, …;
- harmonisation of formats and standards;
- definition of evaluation and validation protocols and procedures;
- methods for the automatic construction and processing of LRs.
Services:
- FLaReNet Forum
- Workshops
- Technical Reports
- Recommendations
- Deliverables
EuroWordNet
Start date: March 1996; end date: June 1999
Objectives: EuroWordNet is a multilingual database with wordnets for several European languages (Dutch, Italian, Spanish, German, French, Czech and Estonian). The wordnets are structured in the same way as the American wordnet for English in terms of synsets (sets of synonymous words) with basic semantic relations between them.
LIRICS (Linguistic Infrastructure for Interoperable Resources and Systems): www.lirics.loria.fr
From 2005 to 2007
Objectives: LIRICS addresses the needs of today's information and communication society, where globalisation and localization necessitate multilingual communication, creating an increasing need for new standardization as well as urgent recognition of existing de facto standards and their transformation into 'de jure' International Standards. LIRICS thus aims to:
- provide ISO ratified standards for language technology to enable the exchange and reuse of multilingual language resources;
- facilitate the implementation of these standards for end-users by providing an open-source implementation platform, related web services and test suites building on legacy formats, tools and data;
- gain full industry support and input to the standards development via the Industry Advisory group and demonstration workshops;
- provide a pay-per-use business model for use by Industry and in particular SMEs, validated during the project for the benefit of all actors in the content and language industries.
Services: Téléchargements\d1
MATE (Multilevel Annotation, Tools Engineering)
March 1998 until December 1999
Objectives: MATE aims at facilitating the re-use of language resources by addressing the problems of creating, acquiring, and maintaining language corpora. The problems are addressed along two lines:
1. through the development of a standard for annotating resources;
2. through the provision of tools which will make the processes of knowledge acquisition and extraction more efficient.
Specifically, MATE will treat spoken dialogue corpora at multiple levels, focusing on prosody, (morpho-)syntax, co-reference, dialogue acts, and communicative difficulties, as well as inter-level interaction. The results of the project will be of particular benefit to developers of spoken language dialogue systems but will also be directly useful for other applications of language engineering.
Services: A tool for annotating XML corpora.
Multext (Multilingual Text Tools and Corpora): http://aune.lpl-aix.fr/projects/multext/
Last modified 22 April 1996.
Objectives: Multext is developing a series of tools (multilingual text editor, SGML manipulation tools, text segmentation tools, morpho-lexical tools, multilingual text alignment, speech workbench) for accessing and manipulating corpora, including corpora encoded in SGML, and for accomplishing a series of corpus annotation tasks, including token and sentence boundary recognition, morphosyntactic tagging, parallel text alignment, and prosody markup. Annotation results may also be generated in SGML format.
Services:
- Sample Lexicons (German, Italian, Spanish, French)
- Tools
NERC (Network of European Reference Corpora)
Objectives: The goal of NERC was the preparation of a European infrastructure on language resources. Specifically, the project aimed at:
- setting up a European Corpora Network;
- producing a survey of the existing resources in the various European countries;
- suggesting short, medium and long-term actions in the field of corpora;
- investigating the legal aspects of the issue in the European countries;
- proposing a common linguistic annotation schema for written resources.
PAROLE (Preparatory Action for Linguistic Resources Organisation for Language Engineering)
PAROLE: from 1994-12 to 1995-11; PAROLE II: from 1996-04 to 1997-03
Objectives: The objective of MLAP-PAROLE (MultiLingual Action Plan - Preparatory Action for Linguistic Resources Organisation for Language Engineering), commonly known as PAROLE, was to define, actively pursue and prepare the construction of a network of organizations for the design and reusability of language resources (texts and lexica) and the relevant tools in the European Union.
RELATOR (European Network of Repositories for Linguistic Resources)
Objectives: The objectives of RELATOR were:
- to create structured, publicly available catalogues of existing Linguistic Resources (LRs), using and extending the information already collected by various international and national survey initiatives;
- to discuss with the relevant actors (e.g. owners of resources, producers, private and public users, funding bodies, international organisations, scientific and professional associations) the various aspects of the problem, their needs and requirements, the possible solutions, their willingness to cooperate and the conditions for a joint European action;
- to identify, describe and evaluate at various levels (e.g. organisational, technical, legal, financial) alternative methods and structures which could ensure the creation, management and maintenance of a European repository of reusable LRs, and their dissemination to various types of users;
- to experiment with the collection and dissemination of existing LRs, using (i) a distributed electronic network and (ii) CD-ROM pressing facilities, with the aim of encouraging the reuse of already available resources and also of acquiring experience which would feed into the formulation of final recommendations;
- to present final recommendations for establishing a collaborative infrastructure that would act as a collection, verification, management and dissemination centre, built on the foundation provided by existing European networks and organisations.
SIMPLE (Semantic Information for Multifunctional Plurilingual Lexica)
Objectives: The objective of SIMPLE was to provide the first core of a large scale, re-usable and customisable lexicon in all EU languages, including morphological, syntactic and semantic layers. Reaching this objective has required the addition of a semantic level to the resources developed in PAROLE (Preparatory Action for Linguistic Resources Organisation for Language Engineering), by encoding semantic information selected on the basis of frequency of occurrence in the aforesaid project. Lexicon development has proceeded according to the requirements set out in PAROLE, with a specific focus on encoding standards, extendibility and reusability. The project has also made resources generally available and accessible, by providing the necessary lexicon management tools.
SPARKLE (Shallow PARsing and Knowledge extraction for Language Engineering)
From 1996 to 1998
Objectives: The main objective of SPARKLE was to develop robust and portable tools for lexical acquisition to aid commercial applications in the area of multilingual information management. There were three intermediate goals to achieve this objective:
- to develop generic shallow parsing tools to handle unrestricted texts in English, French, German and Italian;
- to use shallow parsing to develop systems for semi-automatic acquisition of lexical information concerning subcategorisation, semantic classes of predicates, argument structure, preferential selectional restrictions and diathesis alternations;
- to compare the relative success of different approaches to shallow parsing and lexical acquisition both intra- and inter-linguistically.
TELRI (Trans-European Language Resources Infrastructure)
1999
Objectives: The TELRI concerted action was an initiative funded by the European Commission that has created a viable infrastructure between leading European language and language technology centres in order to provide a platform for industry, research institutes and universities and to supply the NLP community with precompetitive/public domain monolingual and multilingual language resources, such as corpora, machine readable dictionaries and lexica, lexical databases and software tools for the creation, re-use, maintenance, valorisation and exploitation of linguistic data.
International projects
NEDO (Developing International Standards of Language Resources for Semantic Web Applications): http://www.nedo.go.jp
Objectives: The goal of NEDO is to develop a standard description framework for language resources, which form the infrastructure of language technology, and to contribute to the making of international standards for language resources. In order to achieve this goal, we will work on the following four research items:
(1) developing a standard description framework of lexical entries;
(2) constructing sample lexicons of several Asian languages;
(3) developing a standard upper ontology;
(4) evaluating the standards through building language resources of a specific application.
OLAC (the Open Language Archives Community): http://www.language-archives.org/
Last modified: Feb 2011
Objectives: OLAC is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources.
WRITE (Written Resources Infrastructure, Technology and Evaluation): http://www.ilc.cnr.it/write/
Objectives: The WRITE Committee is the natural evolution of the ICCWLRE (International Coordination Committee for Written Language Resources and Evaluation), born from an idea of Antonio Zampolli and officially launched during the ENABLER/ELSNET Workshop "International Roadmap for Language Resources" (Paris, 28th-29th August 2003).
L'arabo per la 488: http://www.ilc.cnr.it/abaro/principale2_ara.htm
Objectives: The objective of the project is twofold: on the one hand, the creation and the elaboration of software procedures for the Arabic language and, on the other hand, the creation of linguistic resources for the management of large Arabic corpora. The linguistic resources are substantially the following:
- Morphological engine for the Arabic language. The engine is constituted by a number of modules: the algorithms and modules for generation and analysis, an appropriate encoding system for the representation of lexical data and of morphological characteristics of Arabic, and the so-called “lemmario”, i.e. the archive of lemmas.
- The automatic alignment of parallel texts in Italian and Arabic.
- Automatic tagging of Arabic texts, performed by using the above morphological engine.
- Systems for accessing and querying (raw and/or tagged) Arabic texts and parallel Italian-Arabic corpora.
Services:
- Morphological analyzers and parsers
- Automatic taggers
Lexus, ESFRI => Dariah, Clarin
Tamil
SyllableNet

http://www.ilc.cnr.it/EAGLES/intro.html

http://aune.lpl-aix.fr/projects/multext/

http://clarin.eu/

http://www.dariah.fr/
