
LANGUAGE RESOURCES FOR COMPUTER ASSISTED TRANSLATION FROM ITALIAN TO ITALIAN SIGN LANGUAGE OF DEAF PEOPLE

Davide Barberis, Nicola Garazzino, Paolo Prinetto, Gabriele Tiotto, Alessandro Savino, Umar Shoaib, Nadeem Ahmad
Politecnico di Torino, Italy

ABSTRACT
This paper discusses the use of language resources for Computer Assisted Translation (CAT) from Italian to Italian Sign Language (LIS) of Deaf People. It gives an overview of a CAT system whose pipeline allows a user to obtain the translation of an Italian text as an animation of a virtual avatar signing in LIS. The paper describes the architecture of the translation system and how the language resources are integrated into the platform. It analyzes the features that characterize a dictionary that links two languages with different features and different communication channels.

KEYWORDS
Deaf, Sign Language, Assisted Translation, Virtual Avatars, Sign Language Synthesis.

INTRODUCTION
Sign Languages (SLs) are visual languages used by deaf people to convey meaning. SLs rely on signs, rather than the words of spoken languages, as lexical units. Italian deaf people use Italian Sign Language (LIS) within their communities, and it can be considered the main means of communication of 60,000 Italian deaf individuals [1]. Inclusion of deaf people in society aims at providing them access to services in their own language and at spreading sign language knowledge among communities. In this context, Machine Translation (MT) provides a way to translate written language into the visual language, just as it does from one written language to another.

Research on sign language machine translation deals with several issues, generally related to both the translation and the generation of signs. Most accessibility approaches (e.g., television closed captioning or teletype telephone services) assume that the viewer has strong literacy skills [2]. Reading captions, however, can be difficult for deaf viewers even if they are fluent in sign language. In these cases, if a live interpreter is not feasible, no translation is available to deaf people.
A new approach that builds a system able to produce both the translation and the visualization of signs at the same time is thus very important. This paper describes a system for computer-assisted translation from Italian to LIS that delivers the translation through a virtual avatar. It focuses on the necessary linguistic knowledge and resources required for the development. Section 2 provides the state of the art of sign language translation. Section 3 describes the approach to the design of a system for visual language translation. Section 4 goes deeply into the architecture of the ATLAS project [3] and deals with the interaction between the tool and the language resources. Section 5 gives a set of conclusions and future perspectives.

STATE OF THE ART
The application of MT to SL is rather recent. Several projects that targeted SL capturing and synthesis were developed, and important results have been achieved in improving deaf people's access to the hearing world. Some examples are reported of translation to other sign languages, most of them targeting the translation of single words or fingerspelling. The VISICAST translator [2] and the eSIGN [3] project defined the architecture for sign language processing and gesture visualization with a virtual character, using motion capture and synthetic generation to create signs. BlueSign [4], Zardoz [5] and SignSmith Studio [6] are additional examples of working systems. BlueSign targets LIS translation without taking LIS grammar into account. Other examples rely on statistical or example-based techniques to perform translation [9], [10]. In [11] the authors analyze different statistical translation techniques applied to sign language and conclude that some of them give reasonably good results. Typical procedures, i.e., scaling factor optimization and selecting the best alignment, work reasonably well but can be improved. More sophisticated algorithms, i.e., syntactic enhancements, fail to improve over the baseline. An MT-to-LIS system that performs the translation resorting to LIS grammar has never been mentioned in the literature.

The general unavailability of language resources for sign language translation is the main issue reported. The available corpora and lexicons are not comparable with the resources used to perform written language translation. The creation and annotation of these corpora (parallel corpora, lexicons, etc.) represents an important resource for testing linguistic hypotheses. Recent research on corpora creation underlined that corpora need to be representative, well documented and machine-readable [12]. The way the corpora are generated has to be considered because it influences both the quality of annotations and the related information. Today several examples of corpora that satisfy those requisites are listed in the literature [19].

The DGS Corpus project [15] aimed at creating a general-purpose corpus of German Sign Language to study lexical variation, semantics, morphology and phonology. The DGS corpus collects data (340-400h scheduled) from dialogues, free conversation, elicited narratives and elicited lexical items. The data are produced by fluent DGS signers. Other examples on German Sign Language are provided by the Berlin corpus, which collects formal structures of DGS, focusing on the interdependency of spoken and signed language, based on the collection of natural DGS data. Its data consist of dialogues and conversations in natural settings. The Deutschschweizerische Gebärdensprache (DSGS) corpus [16] collects 18h of group discussion data in Swiss-German Sign Language. An important work on Swiss-German Sign Language is the Multimedia Bilingual Databank, which collects 60h of informal group discussions on given topics (medicine, sport, schooling, etc.).

The corpus is fully transcribed at gloss level. A rich sign language lexical corpus is the SignLex Corpus, which includes 5616 videos of elicited lexical items signed by native signers. The domain is focused on technical terms for education purposes. The NGT Corpus [17] for Dutch Sign Language collects 2375 videos (as of September 2010) for a total of about 72h of data on dialogues and narratives signed by 92 native signers. The ECHO Project [18] constitutes an example of a multilingual sign language corpus: it collects data in Dutch, British and Swedish Sign Languages. It resorts to 30 min of videos per sign language, transcribed at gloss level.

Statistical approaches are used for Computer Assisted Translation (CAT) [19] as well. In these systems, statistical approaches help the user while performing manual translation. Since a significant intervention by the user is required, these systems are generally more robust than automatic ones. To our knowledge they all miss the final step of translation: the visual adaptation by virtual avatars.

THE ATLAS PLATFORM
The ATLAS system (Figure 1) is designed to get a written text as input and to perform the translation, resorting to two different translators: a statistical one and a rule-based one.

Figure 1 – Architecture of the ATLAS System

The statistical translator is based on MOSES [8], an open source statistical translator that automatically trains the translation models for any language pair. The Rule Based translator is based on a traditional rule-based approach. The input sentences are interpreted in terms of an ontology-based logical representation, which acts as input to a linguistic generator that produces the corresponding LIS sentence. The semantic-syntactic interpreter resorts to an ontology modeling the weather forecasting domain. The interpreter performs the analysis of the syntactic tree and builds a logical formula by looking at the semantic role of the verbal dependents [22]. The sentence is defined through a formalism called ATLAS Written LIS (AWLIS). A LIS sentence is basically a sequence of annotated glosses carrying a set of additional syntactic information.
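As an illustration of the idea that a LIS sentence is a sequence of annotated glosses, the following minimal Python sketch models such a sentence as a data structure. It is not the AWLIS format itself, which this paper does not specify: the class names, the `annotations` field, the annotation keys and the three sample glosses are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Gloss:
    """One annotated gloss. Field names are illustrative, not the AWLIS schema."""
    lemma: str                                                  # gloss label
    annotations: dict[str, str] = field(default_factory=dict)  # extra syntactic information


@dataclass
class LisSentence:
    """A LIS sentence modeled as an ordered sequence of annotated glosses."""
    glosses: list[Gloss]

    def lemmas(self) -> list[str]:
        # Gloss order is significant, so it is preserved as given.
        return [g.lemma for g in self.glosses]


# Invented weather-style example; the glosses and roles are hypothetical.
sentence = LisSentence([
    Gloss("DOMANI", {"role": "time"}),
    Gloss("NORD", {"role": "location"}),
    Gloss("PIOGGIA", {"role": "event"}),
])
print(sentence.lemmas())
```

Keeping the annotations in a free-form dictionary is only a convenience for the sketch; a real formalism would presumably fix the set of allowed attributes.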

This information is collected within the AWLIS formalism, which is sent to the animation module. This module is composed of:
1. A Planner: it gets the signs and sets them into the signing space, according to relocation strategies.
2. An Animation Interpreter (AI): it takes the AWLIS and the sign planning to render the signs by smoothing them together.
The animation engine retrieves a basic form of the sign for each lexical entry specified in the AWLIS from a database. Once the basic shape is done, it applies a set of modifiers to the shape, e.g., facial expression, automatic relocation, body movements. The Signary, henceforth referred to as lexicon, is the ATLAS lexical database that collects all the basic forms of the signs, performed by the avatar, in a video format. The relocation process has been studied [22] and the results help in designing the Planner module.

Translation Reliability
Sign language translation is error prone. To improve translation quality the user has to be able to correct translation mistakes. The ATLAS Editor for Assisted Translation (ALEAT) is the CAT tool developed within the ATLAS project. It translates an Italian text into ATLAS Extended Written LIS (AEWLIS), resorting to a set of manual translation steps. The AEWLIS is an improved version of AWLIS. This allows the AI to play the basic forms of the signs directly, taking the AEWLIS as input. ALEAT provides a user-friendly interface to perform all operations of the CAT process. ALEAT provides access to the parallel corpus in order to retrieve a set of feasible translations. The user selects the most feasible translation and corrects any mistakes through the user interface. All the modifiers are specified by means of a tagging window. ALEAT also gives access to the lexicon in order to connect each lemma within the AEWLIS with its corresponding entry in the lexicon. Currently, we are still far from providing a reliable translation because our corpus contains only 40 translated weather forecasts. By increasing the amount of language resources, such as lexicon and training corpora, we expect to achieve a good translation level.

THE ATLAS CORPUS
A complete description of the ATLAS corpus is out of the scope of this paper. An introduction has been provided in [20], while results on the extraction of sign relocation data from the ATLAS corpus are provided in [22]. The collected corpus consists of the translation from Italian into LIS of 40 weather forecast news items recorded from the national broadcasting network. The corpus has been annotated using the ATLAS Editor for Annotation (ALEA), a web application developed and used within the ATLAS project for the annotation of video content [25], [26]. The research teams involved in the ATLAS project are performing studies on the ATLAS corpus in order to extract useful data related to LIS linguistic phenomena such as relocation, co-articulation and iconicity. This corpus has been used to train the ATLAS statistical translator that, along with the Rule Based translator, is involved in the core translation of the ATLAS pipeline.
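The two-stage animation module described for the ATLAS platform (a Planner followed by an Animation Interpreter that fetches basic sign forms from the lexicon and applies modifiers) can be pictured with the rough Python sketch below. Everything in it is an assumption made for illustration: the `LEXICON` contents, the `slot_N` placement rule, the modifier names and the output string format are hypothetical, and the smoothing between consecutive signs is not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the lexicon: lemma -> identifier of the sign's basic form.
LEXICON = {"DOMANI": "sign_domani", "PIOGGIA": "sign_pioggia"}


@dataclass
class PlannedSign:
    lemma: str
    base_form: str
    location: str                                   # slot in the signing space
    modifiers: list[str] = field(default_factory=list)


def plan(lemmas: list[str]) -> list[PlannedSign]:
    """Planner stand-in: fetch each basic form and assign a signing-space slot."""
    planned = []
    for index, lemma in enumerate(lemmas):
        base_form = LEXICON[lemma]                  # raises KeyError if the sign is missing
        planned.append(PlannedSign(lemma, base_form, location=f"slot_{index}"))
    return planned


def interpret(planned: list[PlannedSign], modifiers: dict[str, list[str]]) -> list[str]:
    """Animation Interpreter stand-in: attach modifiers, emit one instruction per sign."""
    instructions = []
    for sign in planned:
        sign.modifiers = modifiers.get(sign.lemma, [])
        suffix = "+".join(sign.modifiers) or "none"
        instructions.append(f"{sign.base_form}@{sign.location}[{suffix}]")
    return instructions


script = interpret(plan(["DOMANI", "PIOGGIA"]), {"PIOGGIA": ["facial_expression"]})
print(script)
```

Separating placement from rendering mirrors the Planner and Animation Interpreter split in the text; a sequential slot assignment is used here only because the actual relocation strategies are not given.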

The ATLAS corpus includes a lexicon of 3063 signs as high quality videos. Each video is linked to its translation in Italian. It contains two sets of signs: general-purpose and weather forecast related. The former includes general-purpose domain signs. Although it contains signs from the Rome area along with some variations used in other regions of Italy, our informants suggested that this set of signs is currently used as a standard lexicon in Italy. The second set of signs includes:
• 113 New Signs mostly related to the weather forecasts domain
• 73 Variations of signs from the LIS Dictionary
• 214 New Standard LIS signs not present in the LIS Dictionary

WordNet Integration
The dictionary implementation is the main issue of a translation system. Since each natural language word can be related to a set of synonyms, the relation among words of different languages is not unique. WordNet [24] is a dictionary that handles the existence of relations among words by defining the synset: it models the sense of a single headword. Thus, words sharing the same synset can be defined as synonyms. Meaningful fields include definitions and the semantic domains. Synonyms can be very useful when the corpus is not fully annotated. Including the WordNet platform as part of the translation process requires storing the synset data in the lexicon too. The architecture performs a set of operations to translate an Italian word into a LIS sign (Figure 2).

Figure 2 – The Wordnet Integration Flow

The search result includes 3 different scenarios:
1. The input word is not found.
2. The input word has only one correspondence.
3. The input word has multiple correspondences.
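The three lookup outcomes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ATLAS implementation: the `LEXICON` and `SYNONYMS` tables are hypothetical in-memory stand-ins for the lexicon and for a WordNet synset query, the sign identifiers are invented, and the synonym pair used in the example is made up for the demonstration.

```python
# Hypothetical lexicon: Italian lemma -> identifiers of the LIS signs linked to it.
LEXICON = {
    "pioggia": ["SIGN_PIOGGIA"],
    "pesca": ["SIGN_PESCA_FRUIT", "SIGN_PESCA_FISHING"],
}

# Hypothetical synset table standing in for a WordNet query: lemma -> synonyms.
SYNONYMS = {"acquazzone": ["pioggia"]}


def lookup(word: str) -> tuple[str, list[str]]:
    """Return (scenario, candidate signs) for an Italian word."""
    signs = list(LEXICON.get(word, []))  # copy, so the lexicon is never mutated
    if not signs:
        # Word not in the lexicon: fall back to synonyms sharing its synset.
        for synonym in SYNONYMS.get(word, []):
            signs.extend(LEXICON.get(synonym, []))
    if not signs:
        return "not_found", []
    if len(signs) == 1:
        return "single", signs       # immediate translation
    return "multiple", signs         # the user has to pick the intended sign


for word in ("pioggia", "pesca", "acquazzone", "nebbia"):
    print(word, lookup(word))
```

After the synonym fallback the result is classified again by the number of candidates, since a word recovered through its synonyms can itself end up with one or several matching signs.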

In the first scenario, the synset information associated with the word becomes the entry for a WordNet search. If some of the word's synonyms are already in the lexicon, cases 2 and 3 can arise. The second case gives an immediate translation but can also be supervised to check whether more than one LIS sign is linked to the original word. In this case, the user is able to select the right LIS sign. The third case happens when some disambiguation issue is related to the original word, since some words of the Italian language have multiple meanings (e.g., the Italian word pesca may refer to both the peach and the verb to fish). In this case, the user is able to manually select the preferred LIS sign, using the synset data associated with the retrieved results.

CONCLUSION AND FUTURE WORK
In this paper we proposed the use of lexical resources and corpora in the CAT of Sign Languages. We described the workflow of our CAT tool, ALEAT. Implementing the WordNet infrastructure to compensate for the limited number of annotated words (using synonyms) and to support the disambiguation of word meaning is the main improvement introduced by the paper. The tool is currently being integrated within the ATLAS platform. Future work aims at testing the intelligibility of translations produced with ALEAT and at comparing them with purely automatic translation.

ACKNOWLEDGEMENT(S)
The work presented in this paper has been developed within the ATLAS (Automatic Translation into sign LAnguageS) Project, co-funded by Regione Piemonte within the "Converging Technologies - CIPE 2007" framework (Research Sector: Cognitive Science and ICT).

REFERENCES
[1] Eud homepage: http://www.eud.eu/italy-i-187.html.
[2] M. Huenerfauth (2003). A survey and critique of American Sign Language natural language generation and machine translation systems. Technical Report MS-CIS-03-32.
[3] ATLAS project website: http://www.atlas.polito.it/.
[4] Visicast project website: http://www.visicast.co.uk/.
[5] e-Sign project website: http://www.sign-lang.uni-hamburg.de/esign/.
[6] Blue sign partners: Blue sign translator. Available at http://bluesign.dii.unisi.it/.
[7] T. Veale and A. Conway. Cross Modal Comprehension in ZARDOZ.

[8] S. Morrissey, A. Way, D. Stein, J. Bungeroth, and H. Ney. "Towards a hybrid data-driven MT system for sign languages," in Machine Translation Summit, Copenhagen, Denmark, Sept. 2007, pp. 329–335.
[9] G. Masso and T. Badia (2010). "Dealing with sign language morphemes for statistical machine translation," in 4th Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies, Valletta, Malta, May 2010, pp. 154–157.
[10] D. Stein, J. Bungeroth, H. Ney (2006). Morpho-Syntax Based Statistical Methods for Sign Language Translation. In: 11th Annual Conference of the European Association for Machine Translation, Oslo, Norway, June 2006, pp. 169–177.
[11] G. Tiotto (2011). ATLAS: An Italian to Italian Sign Language Translation System Through Virtual Characters. PhD thesis, Politecnico di Torino, April 2011.
[12] Dgs corpus webpage: http://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/welcome.html.
[13] Dsgs corpus webpage: http://www.hfh.ch/projekte_detail-n70-r76-i574-sE.html.
[14] Ngt corpus webpage: http://www.ru.nl/corpusngtuk/.
[15] Echo project webpage: http://echo.mpiwg-berlin.mpg.de/home.
[16] S. Barrachina, O. Bender, F. Casacuberta, J. Civera, E. Cubel, S. Khadivi, A. Lagarda, H. Ney, J. Tomas, E. Vidal, et al. (2009). Statistical approaches to computer-assisted translation. Computational Linguistics, 35(1): 3–28.
[17] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, Evan Herbst (2007). Moses: Open Source Toolkit for Statistical Machine Translation. Annual Meeting of the Association for Computational Linguistics (ACL), demonstration session, Prague, Czech Republic, June 2007.
[18] Alice Ruggeri, Cristina Battaglino, Gabriele Tiotto, Carlo Geraci, Daniele Radicioni, Alessandro Mazzei, Rossana Damiano, Leonardo Lesmo (2011). Where should I put my hands? Planning hand location in sign languages. In Workshop on Computational Models for Spatial Language Interpretation and Generation, Boston, USA, 2011 (to appear).
[19] Nicola Bertoldi, Gabriele Tiotto, Paolo Prinetto, Elio Piccolo, Fabrizio Nunnari, Vincenzo Lombardo, Alessandro Mazzei, Rossana Damiano, Leonardo Lesmo, Andrea Del Principe (2010). On the creation and the annotation of a large-scale Italian-LIS parallel corpus. Workshop on the Representation and Processing of Sign Languages: Corpora and Sign Language Technologies (CSLT-LREC 2010), La Valletta, Malta, 22/05/2010 - 23/05/2010.
[20] Barberis D., Garazzino N., Prinetto P., Tiotto G. (2010). Avatar Based Computer Assisted Translation from Italian to Italian Sign Language of Deaf People. Proceedings of 3rd International Conference on Software Development for Enhancing Accessibility and Fighting Info-exclusion, Oxford (UK), November 25-26, 2010 (pp. 53-59).
[21] Barberis D., Garazzino N., Piccolo E., Prinetto P., Tiotto G. (2010). A Web Based Platform for Sign Language Corpus Creation. Lecture Notes in Computer Science (pp. 193-199).
[22] Wordnet web site: http://multiwordnet.fbk.eu/english/home.php.