Armstrong Stephen
Way Andy
Caffrey Colm
Flanagan Marian
Kenny Dorothy
O'Hagan Minako
School of Computing,
School of Applied Language and Intercultural Studies, Dublin City University, Ireland
Stephen Armstrong & Andy Way, School of Computing,
Colm Carey, Marian Flanagan, Dorothy Kenny & Minako O’Hagan,
School of Applied Language and Intercultural Studies,
Dublin City University, Ireland
This paper describes a project to investigate the scope of the application of Example-Based Machine Translation (EBMT) to the translation of DVD subtitles and bonus material for English-German and English-Japanese. The project focused on the development of the EBMT system and its evaluation. This was undertaken as an interdisciplinary study, combining expertise in the areas of multimedia translation, corpus linguistics and natural language processing. The main areas on which this paper focuses are subtitle corpus creation, development of the EBMT system for English-German, development of evaluation methods for MT output and an assessment of the productiveness of different data types to EBMT.
Key words:
Machine translation; EBMT systems; corpus creation; subtitling; English-German; English-Japanese.
1. Background
Demand for subtitle translation is on the increase due to the proliferation of DVD releases of audiovisual content, in particular feature films. Despite this growing demand, however, working conditions for human subtitlers are declining, with decreasing rates of pay and mounting pressure to translate within shorter and shorter timeframes. Worse still, when producing DVD subtitles in multilingual versions, translators are sometimes forced to work solely on the basis of a master file containing the source subtitle text, without access to the audiovisual content. Furthermore, DVD has opened the floodgates to piracy, with films distributed illegally, undercutting the official prices, and sometimes carrying extremely poor quality subtitles invariably produced by unqualified amateurs. Piracy is another reason why official versions need to be distributed without delay. These issues are repeatedly raised in recent audiovisual translation conferences
and yet the market reality suggests that prices have to be contained due to fierce competition (Carroll 2004). The subtitling process is increasingly facilitated by computer-based subtitling systems, but they are mainly used for mechanical aspects such as time-coding and word-processing, while the translation process itself remains unaided. The lack of attempts to introduce computer-aided translation (CAT) in audiovisual translation, particularly for fictional films, may stem from the notion that the source text, mainly representing dialogue, is unlikely to lend itself well to machine translation (MT). The problems include incomplete sentences with ellipsis, the need for condensation to fit the translation into the allocated space, and the requirement to synchronize the text with the images. All these elements may have been considered insurmountable challenges to MT.

However, our investigation of Example-based MT (EBMT) seeded with subtitle data has an immediate link to the research direction represented in Taylor
0907-676X/06/03/163-22 $20.00 Perspectives: Studies in Translatology © 2006 Armstrong/Way/Carey/Flanagan/Kenny/O’Hagan Vol. 14, No. 3, 2006
(2006a, 2006b) in detecting predictable patterns used in dialogues of fictional audiovisual content. Implicit in our interest in the EBMT paradigm (explained in section 3.1 below) is therefore the aim of establishing to what extent repetition or similarity exists across film dialogues, both at a sentential and especially at a sub-sentential level. This will benefit audiovisual translation research and similarly EBMT, where there is no prior research focusing on subtitles for fictional films.

There have been a number of early attempts to develop an MT system in the area of news subtitles, with notable examples by public broadcasting bodies such as NHK (Japan Broadcasting Corporation), which tested MT for displaying Japanese subtitles for English language satellite news in the 1980s, with the disclaimer credit “MT-produced translations” running at the bottom of the TV screen. Following these early attempts, which mainly used a transfer-based MT system, NHK also tested the then developing EBMT paradigm (e.g. Nagao 1984), as reported in their 1996 annual report (NHK Annual Report 1996). Today the main foreign satellite news reports are translated live by human media translators in Japan, suggesting that the research has not produced workable systems. Also in the US, commercial MT systems were built to automatically translate and produce Spanish captions from English news (Toole et al. 1998). There have also been recent high-profile projects undertaken in Europe to automate subtitle translation. One is the MUSA (Multilingual Subtitling of Multimedia Content)
project, funded by the European Union to produce a set of technologies to automatically generate subtitles for English TV documentaries in English (intralingual subtitles), French and Greek. In addition to an MT component, the MUSA project included the development of a speech recognition engine to turn the audio input into text, and also a condensation technology to shrink the MT output into a shorter sentence that is immediately usable as a subtitle. Another is the eTITLE project
aimed at enabling faster multilingual cross-platform localisation for media content owners via linguistic technologies such as automated speech-to-text, MT, sentence compression, subtitling automation and metadata automation. These projects differ from the present study in scope and coverage and, most of all, in the fundamental interest of our project in investigating the suitability of the EBMT paradigm for the text type of subtitles for fictional films.

Our project is driven by the deteriorating working conditions of subtitlers and the fact that they currently translate mostly without the benefit of CAT tools. The ultimate goal of the current study is therefore to build a CAT tool for human subtitlers, integrating an MT unit into the existing computer-based subtitling system. Such tools will be designed to increase the throughput of human subtitlers, enabling them to produce subtitles faster and even improve their quality. A preliminary study (O’Hagan 2003) had pointed to the scope for applying a CAT paradigm to audiovisual translation on the basis of the shortness and the relative lack of complex sentence structures characteristic of subtitles. The present project set out to test the feasibility of seeding an EBMT system with human-produced subtitles and applying it to subtitle translation. We argue that our choice of using EBMT as opposed to more freely available rule-based MT (RBMT) is motivated by the increasing technical feasibility of harvesting human-produced subtitles from DVDs in significant quantities, copyright issues notwithstanding, and follows the popular Translation Memory (TM) paradigm where translators are able to build up their own resources to

