Publication details: Perspectives: Studies in Translatology, published by Routledge (Informa Ltd). Online publication date: 31 January 2007. DOI: 10.1080/09076760708669036.

Authors: Stephen Armstrong (a), Andy Way (a), Colm Caffrey (b), Marian Flanagan (b), Dorothy Kenny (b), Minako O'Hagan (b). (a) School of Computing; (b) School of Applied Language and Intercultural Studies, Dublin City University, Ireland.


LEADING BY EXAMPLE: AUTOMATIC TRANSLATION OF SUBTITLES VIA EBMT1

Stephen Armstrong & Andy Way, School of Computing, Colm Caffrey, Marian Flanagan, Dorothy Kenny & Minako O'Hagan, School of Applied Language and Intercultural Studies, Dublin City University, Ireland

Abstract
This paper describes a project to investigate the scope of the application of Example-Based Machine Translation (EBMT) to the translation of DVD subtitles and bonus material for English-German and English-Japanese. The project focused on the development of the EBMT system and its evaluation. This was undertaken as an interdisciplinary study, combining expertise in the areas of multimedia translation, corpus linguistics and natural language processing. The main areas on which this paper focuses are subtitle corpus creation, development of the EBMT system for English-German, development of evaluation methods for MT output and an assessment of the productivity of different data types for EBMT. Key words: Machine translation; EBMT systems; corpus creation; subtitling; English-German; English-Japanese.

1. Background

Demand for subtitle translation is increasing due to the proliferation of DVD releases of audiovisual content, in particular feature films. Despite this growing demand, however, working conditions for human subtitlers are deteriorating, with falling rates of pay and mounting pressure to translate within ever shorter timeframes. Worse still, when producing DVD subtitles in multilingual versions, translators are sometimes forced to work solely from a master file containing the source subtitle text, without access to the audiovisual content. Furthermore, DVD has opened the floodgates to piracy, with films distributed illegally at prices undercutting official releases, often with extremely poor-quality subtitles produced by unqualified amateurs. Piracy is another reason why official versions need to be distributed without delay. These issues are repeatedly raised at recent audiovisual translation conferences2, and yet the market reality suggests that prices have to be contained due to fierce competition (Carroll 2004). The subtitling process is increasingly facilitated by computer-based subtitling systems, but these are mainly used for mechanical aspects such as time-coding and word-processing, while the translation process itself remains unaided. The lack of attempts to introduce computer-aided translation (CAT) into audiovisual translation, particularly for fictional films, may stem from the notion that a source text consisting mainly of dialogue is unlikely to lend itself well to machine translation (MT). The problems include incomplete sentences with ellipsis, the need for condensation to fit the translation into the allocated space, and the requirement to synchronise the text with the images. All these elements may have been considered insurmountable challenges for MT.
0907-676X/06/03/163-22 $20.00 © 2006 Armstrong/Way/Caffrey/Flanagan/Kenny/O'Hagan
Perspectives: Studies in Translatology Vol. 14, No. 3, 2006

However, our investigation of Example-Based MT (EBMT) seeded with subtitle data has an immediate link to the research direction represented in Taylor (2006a, 2006b), which detects predictable patterns in the dialogues of fictional audiovisual content. Implicit in our interest in the EBMT paradigm (explained in section 3.1 below) is therefore the question of the extent to which repetition or similarity exists across film dialogues, both at the sentential and especially the sub-sentential level. This will benefit both audiovisual translation research and EBMT research, where no prior work has focused on subtitles for fictional films. There have been a number of early attempts to develop MT systems for news subtitles, with notable examples by public broadcasting bodies: NHK (Japan Broadcasting Corporation) tested MT for displaying Japanese subtitles for English-language satellite news in the 1980s, with the disclaimer credit "MT-produced translations" running at the bottom of the TV screen. Following these early attempts, which mainly used a transfer-based MT system, NHK also tested the then-developing EBMT paradigm (e.g. Nagao 1984), as reported in its 1996 annual report (NHK Annual Report 1996). Today the main foreign satellite news reports in Japan are translated live by human media translators, suggesting that the research has not produced workable systems. In the US, commercial MT systems were built to automatically translate and produce Spanish captions from English news (Toole et al. 1998). There have also been recent high-profile projects in Europe to automate subtitle translation. One is the MUSA (Multilingual Subtitling of Multimedia Content)3 project, funded by the European Union to produce a set of technologies for automatically producing subtitles for English TV documentaries in English (intralingual subtitles), French and Greek.
In addition to an MT component, the MUSA project included the development of a speech recognition engine to turn the audio input into text, as well as condensation technology to shrink the MT output into shorter sentences immediately usable as subtitles. Another is the eTITLE project4, which aimed to enable faster multilingual cross-platform localisation for media content owners via linguistic technologies such as automated speech-to-text, MT, sentence compression, subtitling automation and metadata automation. These projects differ from the present study in scope and coverage, and above all in our project's fundamental interest in investigating the suitability of the EBMT paradigm for the text type of subtitles for fictional films. Our project is driven by the deteriorating working conditions for subtitlers and the fact that they currently translate mostly without the benefit of CAT tools. The ultimate goal of the current study is therefore to build a CAT tool for human subtitlers, integrating an MT unit into an existing computer-based subtitling system. Such tools will be designed to increase the throughput of human subtitlers, enabling them to produce subtitles faster and perhaps even improve their quality. A preliminary study (O'Hagan 2003) had pointed to the scope for applying a CAT paradigm to audiovisual translation on the basis of the shortness and the relative lack of complex sentence structures characteristic of subtitles. The present project set out to test the feasibility of seeding an EBMT system with human-produced subtitles and applying it to subtitle translation. Our choice of EBMT over more freely available rule-based MT (RBMT) is motivated by the increasing technical feasibility of harvesting human-produced subtitles from DVDs in significant quantities, copyright issues notwithstanding, and follows the popular Translation Memory (TM) paradigm, in which translators build up their own resources to increase productivity.

Armstrong, Way, Caffrey, Flanagan, Kenny & O'Hagan. Leading by Example.

Also, given the relatively short timeframe for the project, we set a realistic goal: to test the feasibility of building and testing an EBMT system designed to produce German and Japanese subtitles from already available human-produced English intralingual subtitles. Our system is based on a number of assumptions: (i) we specifically aim at producing subtitle translations for DVD productions where intralingual subtitles in the source language are already available; (ii) we aim at developing the system mainly for English and German, followed by English and Japanese; and (iii) we do not deal with the copyright issue in this feasibility study, assuming that it will fall to the party who ultimately wishes to commercialise our concept. In this paper, we focus on our main EBMT system developed for English and German; the English-to-Japanese system is not discussed, for reasons of space. Section 2 provides a brief description of our research design and methodological approach. We give an overview of EBMT in section 3, together with a description of the marker-based approach that we use in our system. In section 4 we describe the corpora we created, as well as the other corpora used to test the effect of training the EBMT system on heterogeneous or homogeneous data. Section 5 contains thorough evaluations of our results using automatic metrics commonplace in MT today, as well as two manual evaluations: one using the standard human scales of accuracy and intelligibility, and the other a summative, holistic evaluation testing the suitability of the automatically produced subtitles for viewing when incorporated in the film clips. Finally, we conclude in section 6, and avenues for further work are provided in section 7.

2. Methodology

Our objective was to build an EBMT system for a feasibility study testing whether this data-driven MT paradigm works for translating subtitles for fictional films and, if so, determining which data type is more productive for seeding the system to produce high-quality translations. Given the short timeframe, we deferred the system integration component, incorporating the MT unit into an existing subtitling system, to a post-project activity. The project was able to take advantage of the prototype EBMT system being developed by the MT group at the National Centre for Language Technology (NCLT) at Dublin City University (Armstrong et al. 2006b; Stroppa et al. 2006).5 The first task was to design and build parallel corpora made up of human-produced subtitles (in German and Japanese) for English-language material (cf. section 4 below). For the purpose of data type comparisons, we also created heterogeneous data from the publicly available Europarl corpus (Koehn 2005) in addition to the homogeneous data, which consisted solely of subtitles. We first built an English-German EBMT system and then repeated the steps for English-Japanese. This was followed by a series of evaluation sessions. One objective of this study was to explore a holistic evaluation methodology, and we therefore combined the automatic metric BLEU (Papineni et al. 2002) with human-based methods. The BLEU scores provided a quantitative analysis of our EBMT performance according to different data sizes and data types. The human evaluations provided a qualitative assessment pointing to shortcomings of the system, as well as some positive reinforcement to support our approach. The research design was motivated by a number of factors, including the interdisciplinary character of the project, drawing on talents from the humanities and the sciences. We reflect on the advantages and disadvantages of our approaches in our conclusions.

3. Example-Based Machine Translation

3.1. Introduction

While various types of MT systems exist, almost all MT research carried out today is corpus-based, with the two main data-driven approaches being Statistical Machine Translation (SMT) and EBMT. Despite this, the main commercial MT systems available on the market today are primarily rule-based (RBMT). The idea behind EBMT is translation by analogy (Nagao 1984): human translations are recycled to automatically generate new output on the basis of similarities between the source-text elements stored in the system's databases and those of the input. Data-driven approaches such as EBMT rely on the availability of a sententially aligned bilingual corpus, on which the system must first be trained in order to extract and store source-target subsentential alignments for later use. During the translation process, the input sentence is segmented into chunks. These source-language chunks are then matched against the example database to locate corresponding target-language examples, which are recombined to produce the final output. The first stage of this process may sound familiar to those who have used Translation Memory (TM) tools. However, the essential difference between the two approaches is that, except in the case of 100% matches, a TM does not translate; rather, a human is required during the translation process to turn the target-language sentences corresponding to close-matching source strings in the TM into the appropriate final translation. By contrast, EBMT systems translate completely automatically, and require no human intervention in the translation process.
Somers (1999: 137) raises the question of whether certain language pairs are more suited than others to EBMT. Our research design was to concentrate on the English-to-German system first, followed by the English-to-Japanese system. The reasons for choosing these languages were partly commercial, as German and Japanese both represent significant markets for DVD sales. In addition, since these language pairs exhibit quite different translational phenomena, they were ideal for testing the coverage and robustness of the system.

3.2. Marker-Based EBMT

Many different types of EBMT systems exist, including those using source-target tree pairs (Hearne & Way 2006), dependency structures (Watanabe et al. 2003), strings (Somers et al. 1994), and those which generalise examples on the basis of content words (Brown 1999). Another approach segments instead on closed-class or 'marker' words, and has its roots in the 'Marker Hypothesis' (Green 1979). This is a psycholinguistic constraint stating that languages are 'marked' for syntactic structure at surface level by a closed set of specific lexemes and morphemes. As an example, consider the string in (1) from the Wall Street Journal section of the Penn-II Treebank:
(1) The Dearborn, Mich., energy company stopped paying a dividend in the third quarter of 1984 because of troubles at its Midland nuclear plant.

Here we see that three noun phrases start with determiners and one with a possessive pronoun. The sets of determiners and possessive pronouns are both very small. Furthermore, there are four prepositional phrases, and the set of prepositions is similarly small. The Marker Hypothesis is arguably universal in presuming that concepts and structures like these have similar morphological or structural marking in all languages. When the EBMT system (cf. Armstrong et al. 2006b and Stroppa et al. 2006) was developed, eight marker sets were defined, namely determiners <DET>, prepositions <PREP>, quantifiers <QUANT>, conjunctions <CONJ>, wh-adverbs <WH>, possessive pronouns <POSS_PRON>, personal pronouns <PERS_PRON> and punctuation <PUNC> (as an end-of-chunk marker). These marker categories are used to segment aligned source and target sentences during a pre-processing stage, indicating where one chunk ends and the next begins. The steps of marker-based chunking can be illustrated with the English-German example in (2) (from As Good as It Gets, 1997):
(2) Do you like being interrupted when you’re playing in your garden? ↔ Werden Sie gern gestört, wenn Sie in Ihrem Garten herumhüpfen?

In (3) we see how the source/target aligned sentences are traversed word by word and automatically tagged with their marker categories:
(3) Do <PERS_PRON> you like being interrupted <CONJ> when <PERS_PRON> you ’re playing <PREP> in <POSS_PRON> your garden <PUNC> ? ↔ Werden <PERS_PRON> Sie gern gestört, <CONJ> wenn <PERS_PRON> Sie <PREP> in <POSS_PRON> Ihrem Garten herumhüpfen <PUNC> ?

The marking of syntactic structures is necessary for the extraction of translation resources. Once the marking stage is over, aligned source-target chunks are created by segmenting the sentences based on these tags, as well as by the use of word translation probabilities and cognate information. A further constraint exists when creating chunks in that each chunk must contain at least one non-marker word. This constraint ensures that each chunk contains useful contextual information. If multiple marker-words appear alongside each other, we keep the first and discard the rest.
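The tagging-and-chunking procedure described above can be sketched in a few lines of Python. This is an illustrative toy, not the actual MaTrEx implementation: the marker lexicon below is a tiny hand-picked subset of the eight marker sets, and treating adjacent marker words as not opening a new chunk is one reading of the constraint described in the text.

```python
# Illustrative sketch of marker-based chunking (not the actual MaTrEx code).
# A tiny hand-picked marker lexicon stands in for the system's full marker sets.
MARKERS = {
    "the": "DET", "a": "DET", "an": "DET", "that": "DET",
    "in": "PREP", "at": "PREP", "of": "PREP",
    "when": "WH", "but": "CONJ", "and": "CONJ",
    "your": "POSS_PRON", "my": "POSS_PRON",
    "you": "PERS_PRON", "i": "PERS_PRON", "we": "PERS_PRON",
    "two": "QUANT", "one": "QUANT",
}

def tag(tokens):
    """Pair each token with its marker category (None for non-marker words)."""
    return [(tok, MARKERS.get(tok.lower())) for tok in tokens]

def chunk(tokens):
    """Split a sentence at marker words; each chunk keeps >= 1 non-marker word.
    A marker opens a new chunk only if the current chunk already contains a
    non-marker word, so runs of adjacent markers stay in one chunk."""
    chunks, current = [], []
    for tok, cat in tag(tokens):
        if cat is not None and any(c is None for _, c in current):
            chunks.append(" ".join(t for t, _ in current))
            current = []
        current.append((tok, cat))
    if current:
        chunks.append(" ".join(t for t, _ in current))
    return chunks

print(chunk("do you like being interrupted when you are playing in your garden".split()))
# ['do', 'you like being interrupted', 'when you are playing', 'in your garden']
```

Run on the input of example (8), the same routine yields the chunks `['darling', 'we just met', 'two weeks ago', 'at the bar']`.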


Figure 1: The System Architecture - MaTrEx

Where chunks contain just one non-marker word in both source and target, we assume they are translations. From this assumption it is possible to extract word-level translations, as in (4):
(4) <CONJ> when ↔ <CONJ> wenn <PREP> in ↔ <PREP> in <PERS_PRON> you ↔ <PERS_PRON> Sie <POSS_PRON> your ↔ <POSS_PRON> Ihrem
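Under the single-non-marker-word assumption above, word-level pairs like those in (4) can be read off aligned chunks mechanically. The following sketch is hypothetical (not the project's code): it pairs the lone content words across languages and matches marker words by their shared category tags.

```python
# Sketch: derive word-level translations from an aligned chunk pair in which
# both sides contain exactly one non-marker word (illustrative only).
# Chunks are lists of (word, marker_category) with None for non-marker words.
def extract_word_translations(src_chunk, tgt_chunk):
    src_content = [w for w, c in src_chunk if c is None]
    tgt_content = [w for w, c in tgt_chunk if c is None]
    if len(src_content) != 1 or len(tgt_content) != 1:
        return []                              # assumption does not hold
    pairs = [(src_content[0], tgt_content[0])]  # pair the lone content words
    tgt_markers = [(w, c) for w, c in tgt_chunk if c is not None]
    for w, c in src_chunk:
        if c is not None:
            # pair marker words across languages by shared category tag
            match = next(((tw, tc) for tw, tc in tgt_markers if tc == c), None)
            if match:
                pairs.append((w, match[0]))
                tgt_markers.remove(match)
    return pairs

chunk_en = [("in", "PREP"), ("your", "POSS_PRON"), ("garden", None)]
chunk_de = [("in", "PREP"), ("Ihrem", "POSS_PRON"), ("Garten", None)]
print(extract_word_translations(chunk_en, chunk_de))
# [('garden', 'Garten'), ('in', 'in'), ('your', 'Ihrem')]
```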

3.3. The MaTrEx EBMT system

Having described the marker-based chunking method, we now explain how our EBMT system uses this methodology, with examples illustrating the various stages. The EBMT system used in this research is the MaTrEx (Machine Translation using Examples) system (Armstrong et al. 2006b; Stroppa et al. 2006).6 It is a corpus-based MT engine, designed in a modular fashion. Figure 1 illustrates the system architecture and the interaction of each module. There are four main modules in the system: the Word Alignment Module, the Chunking Module, the Chunk Alignment Module and the Decoding Module. These modules work together to produce the most likely translation of the input sentence. In brief, the word alignment module takes an aligned corpus as input and produces a set of word alignments (Och & Ney 2003); the chunking module also takes an aligned corpus as input and produces a corpus of source and target chunks; the chunk alignment module takes in source and target chunks, aligning them sentence by sentence; and finally the decoder (Koehn 2004) searches for a translation using the original aligned corpus together with the derived word and chunk alignments. A more detailed description of each module can be found in Armstrong et al. (2006b) and Stroppa et al. (2006).



3.4. An EBMT example

The following is an example task for the EBMT system: to translate the English input sentence in (5) into German, given the aligned data in (6) as the system's training corpus. The English-German examples are taken from a mix of films including Breakfast at Tiffany's, Casablanca, Being John Malkovich and Dr Strangelove.
(5) Darling, we just met two weeks ago at the bar

(6) Darling <PUNC>, <PERS_PRON> I am sorry <CONJ> but <PERS_PRON> I lost <POSS_PRON> my key ↔ <POSS_PRON> Mein Guter <PUNC>, <PERS_PRON> es tut <PERS_PRON> mir Leid <PERS_PRON> Ich habe <POSS_PRON> meinen Schlüssel verloren
<PERS_PRON> I’m <DET> an artist ↔ <PERS_PRON> Ich bin <DET> ein Künstler
<DET> That was <QUANT> two weeks ago ↔ <DET> Das war <PREP> vor zwei Wochen
<PERS_PRON> We just met <QUANT> one day ↔ <PERS_PRON> Wir trafen uns einfach <DET> eines Tages
<PERS_PRON> I’ll call <DET> the police ↔ <PERS_PRON> Ich rufe <DET> die Polizei
<PERS_PRON> I’ll be <PREP> at <DET> the bar ↔ <PERS_PRON> Ich gehe <PREP> an <DET> die Bar

The data in the aligned corpus (6) is chunked (as described in section 3.2) extracting and storing useful chunks and their target-language counterparts for later use, including those in (7):
(7) Darling ↔ Mein Guter
That was ↔ Das war
Two weeks ago ↔ vor zwei Wochen
The police ↔ die Polizei
I’m ↔ Ich bin
An artist ↔ ein Künstler
We just met ↔ wir trafen uns einfach
One day ↔ eines Tages
At the bar ↔ an die Bar
I lost my key ↔ Ich habe meinen Schlüssel verloren

In order to identify how useful a chunk will be in the translation process, a range of similarity metrics are used, including word alignment probabilities, cognates and marker chunk labels (cf. Stroppa et al. 2006; Armstrong 2007). These metrics are implemented in the chunk alignment module as previously mentioned. The first step in the translation process is to search the German side of the original corpus in (6) to check if it contains the whole input sentence in (5). It does not, so the system chunks the input sentence into smaller constituents (8):
(8) Darling <PERS_PRON> we just met <QUANT> two weeks ago <PREP> at the bar



These new input sentence chunks are then searched for in the corpus of aligned chunks (7). Once suitable chunks in the database are found, they are recombined by the decoder to produce the final translation in (9):
(9) Mein Guter, wir trafen uns einfach vor zwei Wochen an die Bar
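Steps (5) to (9) can be condensed into a toy lookup-and-recombine routine. This sketch is purely illustrative: the real decoder scores and reorders alternative chunk combinations, whereas this version simply concatenates the target chunks left to right (and omits the comma that the reference translation in (9) places after "Mein Guter").

```python
# Toy recombination step (illustrative, not the MaTrEx decoder): look up each
# input chunk from (8) in the aligned chunk table from (7) and concatenate
# the German counterparts.
chunk_table = {
    "darling": "Mein Guter",
    "we just met": "wir trafen uns einfach",
    "two weeks ago": "vor zwei Wochen",
    "at the bar": "an die Bar",
    "the police": "die Polizei",
}

def translate(chunks, table, corpus=()):
    sentence = " ".join(chunks)
    for src, tgt in corpus:              # step 1: whole-sentence match?
        if src == sentence:
            return tgt
    # step 2: recombine chunk translations; unknown chunks pass through
    # untranslated (a real decoder would also reorder and score alternatives)
    return " ".join(table.get(c, c) for c in chunks)

print(translate(["darling", "we just met", "two weeks ago", "at the bar"],
                chunk_table))
# Mein Guter wir trafen uns einfach vor zwei Wochen an die Bar
```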

4. Corpus Description

4.1. Introduction

A corpus is a large collection of authentic texts, gathered according to specific criteria and most commonly stored in electronic format. These texts can then be used to study authentic examples of language use (Bowker and Pearson 2002: 9). In the field of computational linguistics, natural language processing tools such as EBMT systems also use corpus-based resources (ibid.). When creating the corpora for our research, we wanted a selection of subtitle (homogeneous) and non-subtitle (heterogeneous) aligned data for each language pair. One of the main advantages of our EBMT system over an RBMT system is that a custom-made selection of aligned sentences can be fed into the former. Since no prior study existed on EBMT seeded with a subtitle-specific corpus, at the beginning of our research we did not know whether a subtitle-specific homogeneous corpus would give better results than a general-language corpus made up of non-subtitle sentences. To this end it was necessary to build our own homogeneous corpora for both language pairs (English-German and English-Japanese). For our heterogeneous data, we were able to use an English-German corpus containing the European Parliament proceedings, Europarl (Koehn 2005), which is freely available for research purposes, together with a publicly accessible English-Japanese heterogeneous corpus made up of various books and articles created by Utiyama & Isahara (2003).

4.2. Creating the Corpora

Japan is traditionally a subtitling country: foreign films are typically screened theatrically with subtitles (apart from Disney films intended for children, which may be both subtitled and dubbed), and most DVDs contain subtitles. Even though Germany favours dubbing for theatrical releases, DVD films sold in Germany contain German subtitles.
All the corpora created for our research are bilingual sententially-aligned parallel corpora, a prerequisite for EBMT systems. First, we created a corpus containing subtitles from the main feature films on DVDs. In addition, we created an English-German bonus material subtitle corpus, as well as one for English-Japanese. The majority of DVDs now contain extra bonus material, such as a 'behind-the-scenes' documentary on how the film was made, interviews with the director and actors, extra scenes deleted from the final version, etc. Our second corpus consisted solely of bonus material. This corpus was substantially smaller than the main subtitle corpus, because subtitles are very often not provided for all bonus material, one of the factors influencing our experimental design. We concluded that the best way to create a homogeneous corpus was to build up a collection of DVDs of English-language films containing German or Japanese subtitles alongside English intralingual subtitles. In an attempt to assure the quality of the subtitles we trained the system on, we only took subtitles from major motion pictures, which tend to have high-quality subtitles produced by humans. The corpus compilation work was also undertaken by a team of researchers competent in the given language combination, so that any errors would be spotted. We extracted both the interlingual and intralingual subtitles and saved them as .srt-format text files using the freely available software SubRip, providing us with the subtitle text in English, German and/or Japanese, along with their respective TC-in/TC-out values (the time codes at which each subtitle begins and ends). The software uses optical character recognition (OCR) to convert the subtitles, which are stored as images, into text format. During the corpus creation stage, we noted that it would be extremely helpful if subtitles were stored on DVD in a text format. The current standard of storing subtitles as graphic files necessitates the use of OCR, or in some cases basic transcription, which is even more time-consuming, inevitably leading to a loss of time in preparing the source data, particularly in the case of Japanese. As the OCR component of SubRip is not optimised to recognise Japanese characters, processing the Japanese subtitles took on average at least five times as long as for German or English subtitles, which had a negative impact on the overall English-Japanese corpus size. This problem was compounded by the limited availability of DVDs with Japanese subtitles in Ireland, and the difficulty of sourcing them outside Japan due to region code regulations7, further adding to the difficulty of bulking up the subtitle data. After the first step of extracting the subtitles (one file for each language) from the DVDs, we needed to clean up the files by removing the time codes. This was done by running a Perl script on the files, leaving just the subtitles in text format.
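The clean-up stage just described, which the project implemented with Perl scripts, can be sketched in Python. The sketch assumes the standard .srt layout of a numeric index, a TC-in --> TC-out line, and one or more text lines per subtitle, and folds in the lowercasing step for brevity.

```python
import re

# Sketch of the .srt clean-up described above (the project used Perl scripts):
# drop subtitle indices and TC-in/TC-out lines, keep lowercased subtitle text.
TIMECODE = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")

def clean_srt(srt_text):
    lines = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMECODE.match(line):
            continue                      # skip blanks, indices, timecodes
        lines.append(line.lower())        # lowercase for English/German
    return lines

sample = """1
00:00:01,600 --> 00:00:04,200
Do you like being interrupted
when you're playing in your garden?
"""
print(clean_srt(sample))
# ['do you like being interrupted', "when you're playing in your garden?"]
```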
When training the EBMT system with the corpus, the system works more efficiently when the text is all lowercased for English and German (Japanese characters are treated slightly differently as there are no distinctions between lower and upper cases), meaning it will, for example, recognise that the token ‘The’ is the same word as the token ‘the’. This was also done by running a separate Perl script converting all the text to lower case. The two files were then sententially aligned. This is quite a time-consuming stage, but by automatically numbering the lines (which will work for English, German and Japanese), or by using an alignment tool such as Trados WinAlign,8 the time spent on this process can be reduced. The corpora were then ready to train and test with the EBMT system. 4.3. Corpus Statistics The German-English DVD subtitle corpus contains 40K sentence pairs and 187,337 words, the DVD bonus material corpus contains 10K sentence pairs and 40,443 words, while the heterogeneous corpus (Europarl) contains in excess of 1 million sentence pairs. We currently have 36 film titles aligned for EnglishGerman and 12 titles aligned for English–Japanese. The English-Japanese (homogeneous) DVD subtitle corpus consists of 12,700 sentences, roughly 124,012 Japanese characters and the heterogeneous corpus contains 82,805 sentences, equalling roughly 2,624,850 Japanese characters. When we compare the average Japanese sentence length of both corpora (9.76 characters per sentence for the homogeneous and 31.7 characters per sentence for the heterogeneous corpus)


2006. Perspectives: Studies in Translatology. Volume 14: 3

there is a noticeable difference, with the subtitles being on average one third the length of a sentence from an article or book in line with the earlier findings9 (O’Hagan 2003). Using the corpus analysis tool WordSmith,10 we were able to extract some interesting statistics from our own corpus. We calculated the average sentence length for both English and German to be a little less than 9 words. Contrast this with the average length of sentences in, for example, the Europarl corpus, which we calculated to be 24 words per sentence, clearly proving the presuppositions about the space-constraints imposed on subtitlers. The corpora are still growing, with at least 12 more film titles ready to be added to the English-German corpus and a lesser number to be added to the English-Japanese corpus. These corpora will be used for further research as discussed in the final section of this paper. It is imperative in our research to avoid creating corpora which are contaminated with erroneous translations, thus are unlikely to yield acceptable subtitles (as noted by human users). We considered that given the end-use of the translation as subtitles, human verification was deemed essential. Furthermore, creating and testing different types of corpora is an integral part of the evaluation process of the system. Once the most productive corpus type is established, this can then be built on over the course of the research. 5. Evaluation Approaches and Results 5.1 Introduction There are primarily two types of evaluation techniques: automatic and real user. Automatic evaluation has been a popular choice in the past for many natural language generation technologies due to the speed with which large amounts of text can be checked in a relatively small amount of time and it is also a very economical choice. Within the MT community, therefore, automatic evaluation is the norm (cf. NIST: Doddington 2002; BLEU: Papineni et al. 2002; GTM: Turian et al. 
2003; METEOR: Banerjee & Lavie 2005 being the most often-used metrics). There is essentially no human intervention in this evaluation process (the recently introduced HTER metric (Snover et al. 2006) being one obvious exception). A further benefit of using an automatic metric is quantitative feedback on the system’s performance: in our case, to quantify the results of training the system either on a domain-specific, homogeneous corpus (DVD subtitles) or on a non-domain-specific, heterogeneous corpus (Europarl proceedings). Given the nature of the text type and its use, we intended to incorporate some form of human evaluation in addition to the automatic metrics prevalent in the MT community. Within translation studies, translation evaluation techniques focus on human input, and little credit is given to automatic methodologies in which a text is scored by a computer program. Human evaluation has various drawbacks: it is prone to subjective opinion, expensive and time-consuming. It does, however, play an important part in any natural language generation system, given that humans will ultimately be the end users of automatically generated text. The main aim of our evaluation process was, therefore, to move towards a balanced, holistic evaluation of machine translation output.

Armstrong, Way, Caffrey, Flanagan, Kenny & O’Hagan. Leading by Example.

By incorporating real-user evaluation studies with automatic metrics, we hoped to gain a better understanding of the quality of our automatically generated DVD subtitles.

5.2. Evaluation using an Automatic Metric

The automatic evaluation metric we used was BLEU (Bilingual Evaluation Understudy), which is based on the idea of measuring, with a numerical metric, the closeness between a candidate translation and a set of reference translations (Papineni et al. 2002). BLEU scores range from 0 to 1, where 1 indicates a perfect match between the output translation and the reference translations. The reference translations are treated as a “gold standard” with which the EBMT system output is compared: the nearer the BLEU score is to 1, the better the quality of the output translation is deemed to be. In addition to generating a BLEU score for the target translations, we wanted to train the system on increasing amounts of homogeneous and heterogeneous data, and to record the resulting BLEU scores for each. Our goal was to see which corpus type would produce the better scores and thus improve the quality of the automated subtitle translations. Table 1 illustrates the BLEU scores for both the homogeneous and the heterogeneous corpus, when the system is trained on quantities of sentence pairs ranging from 10K to 40K from the English-German corpus (cf. Armstrong 2007 for scores using other automatic evaluation metrics, for the other language direction, and for the bonus material). The results show that by training the system on 10K sentence pairs from the homogeneous corpus (0.1082), we achieved almost 50% better results than by training it on 40K sentence pairs from the heterogeneous corpus (0.0737). While the improved score may appear far removed from the human-produced reference translation score of 1, this is still considered a sign of significant progress from the point of view of system development.
Note also that there is a consistent increase in BLEU scores when incremental amounts of homogeneous training data are used, while adding more than 20K sentence pairs of Europarl data seems to show no improvement. With further evaluation studies we can investigate whether or not a threshold exists for BLEU scores when the system is trained on the DVD subtitle corpus.
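The contrast between the two corpus types mirrors the sentence-length statistics reported earlier (just under 9 words per subtitle, against roughly 24 words per Europarl sentence). A rough sketch of how such a statistic can be recomputed, assuming one sentence or subtitle per line and simple whitespace tokenization (WordSmith’s actual tokenization rules may differ):

```python
def avg_sentence_length(path, encoding="utf-8"):
    """Mean whitespace-token count per non-empty line of a corpus
    file that stores one sentence (or subtitle) per line."""
    total_tokens = 0
    n_lines = 0
    with open(path, encoding=encoding) as f:
        for line in f:
            tokens = line.split()
            if tokens:  # skip blank lines
                total_tokens += len(tokens)
                n_lines += 1
    return total_tokens / n_lines if n_lines else 0.0
```

If the figures above hold, running this over the subtitle corpus should return a value near 9, and near 24 over Europarl.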
Table 1: Automatic Evaluation Results (BLEU scores)

Training sentence pairs (En-De)   Homogeneous Data   Heterogeneous Data
10K                               0.1082             0.0695
20K                               0.1166             0.0740
30K                               0.1195             0.0736
40K                               0.1287             0.0737
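As a concrete illustration of the metric behind Table 1, here is a minimal single-sentence BLEU sketch: modified n-gram precision combined with a brevity penalty, after Papineni et al. (2002). Production implementations add corpus-level aggregation and smoothing, which this sketch omits.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Single-sentence BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        # clip each candidate n-gram count by its maximum count
        # in any single reference translation
        max_ref = Counter()
        for ref in refs:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # unsmoothed: any zero precision gives BLEU 0
        log_prec_sum += math.log(clipped / total)
    # brevity penalty against the closest reference length
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) >= ref_len else math.exp(1.0 - ref_len / len(cand))
    return bp * math.exp(log_prec_sum / max_n)
```

A perfect match scores 1.0; scores such as the 0.1287 in Table 1 reflect partial n-gram overlap between the system output and the reference subtitles.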


2006. Perspectives: Studies in Translatology. Volume 14: 3

5.3. Formative and Comparative Evaluation by Humans

The two types of human-based evaluation we conducted were of a formative and a comparative nature, and were used to assess whether or not changes made to our system produced subtitles of good enough quality for the end-user. Formative evaluation is designed to detect areas requiring improvement while the system is still under development. It may be carried out at different stages of system development, whereby changes are made to the system and the newly implemented changes are in turn rechecked. Comparative evaluation compares the performance of different MT systems, in order to assess how the system under investigation fares against another MT system. Within the formative evaluation we carried out a text-only evaluation with 6 participants, and a text-and-image evaluation with 6 participants who evaluated DVD clips on a TV screen. Both evaluation strategies were used to improve the output of the EBMT system. For the comparative evaluation we carried out an online survey, which combined a questionnaire with subtitled movie clips. A total of 12 German-speaking students took part, including both native and non-native speakers.

5.3.1 Formative Evaluation Techniques with Text-only Evaluation

We used a training corpus of 30K sentence pairs and then input a test set of 2K English sentences into the EBMT system. Of the 2K German output sentences we randomly chose 200 for our evaluation purposes. The idea behind this was to make the evaluation completely objective, rather than choosing the ‘best’ 200 sentences from the output. This evaluation method was harsh, however, as the subtitles were not accompanied by any images. Moreover, the sentences were picked randomly, out of context: each was an independent subtitle generated by our EBMT system, with no relation between the sentence before and the sentence after. We then split the 200 sentences into four groups of 50.
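The selection step just described, drawing 200 of the 2K output sentences at random and splitting them into four groups of 50, can be sketched as follows (the fixed seed is our illustrative addition for reproducibility; the paper does not specify one):

```python
import random

def sample_for_evaluation(sentences, k=200, groups=4, seed=0):
    """Draw k sentences uniformly at random (avoiding any bias
    towards 'best' outputs) and split them into equal groups."""
    rng = random.Random(seed)  # fixed seed: reproducible sampling
    sample = rng.sample(sentences, k)
    size = k // groups
    return [sample[i:i + size] for i in range(0, k, size)]
```

With 2,000 EBMT output sentences this returns four disjoint groups of 50, one batch per evaluator group.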
We provided the human evaluators with the sentences, along with two scales, one for intelligibility and the other for accuracy. We adopted the scales from Wagner (1998), who explains that they are useful for small-scale corpus-based research. The two scales had scores ranging from 1 to 4, with 1 being the best result. The evaluation indicated the main areas of weakness of our system: lexical errors, lack of capitalisation, faulty verb agreement, English words remaining in the German output, and our chunking methods. The most negative comment came from one evaluator, who stated that MT subtitles would never be of any use in any situation. While we were prepared for this type of feedback, given the particularly harsh conditions explained earlier, it also raised the possibility that human evaluation of MT can be affected by a negative attitude the evaluator may already hold towards MT in general. On the other hand, positive comments included acceptable translations for short sentences, and even creative renditions by the EBMT system when compared with some of the original human subtitles. Furthermore, there were many instances of only minor grammatical errors, scoring 2, which can easily be fixed by training the system further. Nevertheless, evaluators noted that the subtitles would need post-editing if they were to reach a standard good enough to be shown on a commercial DVD. The errors which they pointed out helped us to
further develop the EBMT system in a way that would improve the quality of the output.

5.3.2 Formative Evaluation Techniques with Text-and-Image Evaluation

In this evaluation, the participants watched a number of DVD clips with German subtitles produced by our EBMT system on a widescreen television in a dedicated lab11, simulating a home-theatre set-up in which people are likely to watch DVD films. The session was followed by a retrospective interview. The idea behind this type of formative evaluation is that we now introduce a relevant context to the process, providing the text (subtitles) together with image and sound. These two extra media channels may influence the responses of the participants, as sound and image form part of the comprehension process when people watch a film with subtitles, rather than the participants simply reading the subtitles as standalone texts with no accompanying context. Six German native speakers participated in this evaluation. Three of the clips had English original soundtracks, and three had Japanese. The participants’ knowledge of English ranged from good to excellent; one participant had some knowledge of Japanese, but not at a level sufficient to understand a film. The participants were informed in advance that the DVD subtitles had been automatically generated by an EBMT system. Each clip lasted approximately 2 minutes, and each retrospective interview was recorded on cassette tape. There were ten sections in the interview, each containing an average of four questions. The two researchers conducting the evaluation session were present in the room throughout the viewing of the clips and the interview that followed. The results of this evaluation session were more promising than those of the text-only evaluation.
During the retrospective interview we gathered some background information on the participants, asking whether they often watched films on DVD with subtitles and how much they knew about translation technology and machine translation. Most participants watched subtitled films on DVD three to four times a year, as most films released in German cinemas are dubbed. However, they all said that they much preferred subtitled films, because hearing the original soundtrack gives the viewer a much better insight into the cultural aspects of the film. None of the participants was familiar with the technologies we were using in the project, which may also have influenced their answers and their ideas of the capabilities of MT. There was a general consensus among the evaluators that our EBMT subtitles, even with no post-editing, would still benefit viewers who did not understand the source language. It was also interesting to hear that, with post-editing, these subtitles could probably be used in certain public situations: for example, in-flight movies, film festivals with extremely short release times and small budgets to cover the cost of subtitling, minority-language scenarios, and streaming videos. The participants were hesitant to say whether they would accept post-edited EBMT subtitles on a commercial DVD. We were correct in our pre-evaluation assumption that knowledge of the original source might influence the participants’ answers. During the retrospective interview all of them were slightly more critical of mistakes in the English-language clips than in the Japanese-language clips, given that they had no knowledge of
the source language in the latter case. This pilot study was rewarding in providing us with an insight into real-user perceptions and also into evaluation strategies, and will be the stepping-stone from which a larger real-user study can be devised.

5.3.3 Comparative Technique with Online Survey including Film Clips

Following the above evaluation session, we developed an online survey to carry out our third set of human evaluations, using the virtual learning environment Moodle12, which is implemented campus-wide at DCU. The idea behind devising an online survey was to reach a wider audience and also to test Moodle’s technical capability to incorporate multimedia files and make them accessible to the participants. This therefore formed a pilot trial for the future larger-scale online surveys we hope to conduct. We asked native German speakers as well as non-native speakers to take part in the survey, with the final number of participants totalling 12. While that number was a little smaller than we had hoped for, the experience alerted us to useful technical issues to consider in developing a large-scale online survey of this nature. The questionnaire first asked the participants to give some background information, such as how often they normally watch subtitled media. Participants were then asked to look at film clips which incorporated German subtitles produced by EBMT for two films, namely The Bourne Identity and Harry Potter and the Prisoner of Azkaban. The approach taken was to prepare 3 different sets of subtitles for each film, making a total of 6 clips which the participants were asked to evaluate. The German subtitles used in the survey, along with the corresponding original (intralingual) English subtitles, are included in Appendix A. The first clip had raw EBMT subtitles from our system, the second had subtitles translated by the free online MT site Babelfish,13 and the third had post-edited EBMT subtitles from our system.
We decided to include Babelfish output as a benchmark comparison for our MT system, in addition to human translation. Although the free version of Babelfish does not provide the full capability which might be available in its commercial version, it is perhaps the best-known freely available general-purpose automatic translation system, and it gives the reader a very good idea of the relative quality of our EBMT system.14 The third set of subtitles, the post-edited set, was included to gauge acceptability from the end-user’s point of view. The editing was conducted by a native speaker of English with knowledge of German, within a pre-determined timeframe of 20 minutes to post-edit 38 subtitles. This was to test whether non-native input could be used to improve the results. Figure 2 shows the results from our online survey. There are four charts (A-D), indicating how many of the 12 respondents rated each of the 3 subtitle versions of the selected scenes from the two films as acceptable for use in the given scenarios, namely on a purchased DVD, on a pirate DVD, on an in-flight film and on a streaming video. The participants were allowed to select multiple answers for which set of subtitles they would consider acceptable. For example, in (C) for the Harry Potter clip, 1 respondent regarded the raw EBMT output as of acceptable quality for use on an in-flight film, 2 considered the raw Babelfish output acceptable, 9 considered the post-edited EBMT output acceptable, while 3 respondents regarded none of the three versions as acceptable for this scenario. From the charts in Figure 2, it is very positive to see that high numbers of people would accept post-edited EBMT subtitles on all four types of media for both clips shown; in contrast, fewer responses indicated that none of the subtitles offered would be accepted, especially in the case of the Harry Potter subtitles. Overall, the Harry Potter subtitles were accepted more often than The Bourne Identity subtitles. This is probably due to the type of clip selected. The Bourne Identity is an action film, so camera changes are more frequent, and there are more interjections from various people within one scene. Harry Potter, on the other hand, tends to focus on its main characters, not making camera changes as frequently and as suddenly as in The Bourne Identity, and allowing the characters to finish their sentences without being interrupted. This perhaps has repercussions for the types of film to which automated subtitles are, in general, suited.
Figure 2: Online Survey Results



Of the 12 respondents, 9 said they would purchase a commercial DVD containing the post-edited EBMT subtitles, based on the short Harry Potter clip, and 6 said they would purchase a DVD based on the sample of post-edited subtitles on The Bourne Identity clip. Most responses regarding the post-edited clips were very supportive, with these subtitles being strongly accepted in all four scenarios. The participants liked the fact that the post-edited EBMT subtitles displayed good colloquial phrases and correct subject-verb agreement, and that tone and register were always translated correctly; it was also commented that the subtitles ‘felt like German’. The raw EBMT output was not viewed favourably, with the main complaint being the lack of capitalised nouns15. This is something which is relatively easily fixed in the post-editing phase, particularly given that our ultimate aim is to use EBMT as an integral tool for a human subtitler. By contrast, the Babelfish translations received positive feedback in relation to nouns being capitalised. Nonetheless, some of the Babelfish subtitles contained blatant lexical and grammatical errors, which were explicitly marked down by all participants. The Babelfish subtitles were also heavily criticised for being translated too literally, and in some cases the incorrect register was used. Nevertheless,
Babelfish did score better than our raw EBMT output in all subtitle scenarios. The feedback provided in the participants’ narratives correlated well with the figures shown in each chart, and these results provided us with concrete evidence, usable in our future studies, as to where EBMT is failing compared with RBMT. The overall improvement in positive responses confirms our research direction and the importance of human user feedback, which can be obtained in narrative form to pinpoint the nature of a problem, albeit by way of black-box evaluation. It also points to the need to continue developing further evaluation techniques that elicit the shortcomings of the system under study.

6. Conclusions

In this paper we presented a year-long proof-of-concept study whose main objective was to build an EBMT system to translate subtitles from English into German and Japanese for the DVD market, and to test its feasibility. We focused mainly on our primary system, developed for English-German. A secondary objective was to develop a holistic evaluation methodology combining automatic metrics popular in the MT community, such as BLEU, with a variety of human assessment methods. The BLEU scores provided a quantitative measure indicating which data types the EBMT system should be seeded with: an EBMT system trained on homogeneous data is likely to produce higher translation quality than one trained on heterogeneous data. This in turn suggests that there are probably more similarities among subtitles than between a subtitle and a sentence from a more general text type. The human evaluation was, therefore, applied only to the system trained on the homogeneous data. As predicted, the output in the text-only evaluation, where randomly selected text strings were shown to the evaluators without the audiovisual context, was generally regarded as poor.
This was also the first of our human-based formative evaluations, and the corpus size was the smallest of the three human evaluations conducted. The next human evaluation indicated an improvement, as the system had been adjusted based on the first set of feedback from the human evaluators. This points to the importance of human feedback in identifying the system’s shortcomings. The third human evaluation included comparative evaluations of three different translations created from the same source text. The raw output from our system scored worst, behind the output from the well-known online MT system Babelfish. This is understandable, however, given the time and resources invested in that long-standing system. It was encouraging to see that the human evaluators judged the post-edited EBMT output acceptable for use as subtitles for commercial audiovisual content, despite the fact that the post-editing was performed by a non-native speaker. These results indicate that the EBMT paradigm is feasible as a CAT tool. Further, it may be possible, for instance, for such a system to be used by professional translators to produce subtitles into their non-native languages. The project team consisted of humanities researchers specialising in multimedia translation and corpus linguistics, and computing researchers specialising in EBMT. Thanks to this combined expertise, we were able to achieve during the
relatively short timeframe the research objectives of building a working MT system producing German subtitles from English on the basis of parallel corpora of two different types. We were also able to explore holistic evaluation methods by testing both automatic and human-based approaches.

7. Future Work

The time-consuming nature of corpus building with subtitle data was something we had slightly underestimated. Implementing a system such as ours in an industrial setting will require a much more efficient way of harvesting the data without compromising its quality. This suggests a great need for co-operation with film distributors who may have subtitle data in electronic form. As alluded to before, copyright issues need to be addressed before the concepts tested in this project can be commercialised. As a post-project development, we hope to see our work continued by integrating the MT component into an existing subtitling environment and measuring any improvement in subtitler throughput. We also hope to refine the holistic evaluation methods further, by expanding the online survey platform to reach a wider audience. Further, we hope to experiment with the eye-tracking equipment in our research lab to explore the difference in the viewer’s cognitive load when watching a film with machine-translated subtitles as compared with human-translated subtitles, inspired by the work conducted by O’Brien (2006). Finally, on the basis of the parallel subtitle corpora we have created for English-German and English-Japanese, we hope to pursue our search for patterns of repetition and similarity in this text type in a more microscopic manner, from the perspective of the EBMT paradigm.
Works cited
Armstrong, S. 2007. Using EBMT to Produce Foreign Language Subtitles. MSc Thesis, Dublin City University, Dublin, Ireland.
Armstrong, S., Caffrey, C., Flanagan, M., Kenny, D., O’Hagan, M. and Way, A. 2006a. Improving the Quality of Automated DVD Subtitles via Example-Based Machine Translation. In Translating and the Computer. London: Aslib.
Armstrong, S., Flanagan, M., Graham, Y., Groves, D., Mellebeek, B., Morrissey, S., Stroppa, N. and Way, A. 2006b. MaTrEx: Machine Translation Using Examples. TC-STAR OpenLab Workshop on Speech Translation, Trento, Italy (available at: www.computing.
Banerjee, S. and Lavie, A. 2005. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization at the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-2005), Ann Arbor, MI, 65-72.
Bowker, L. and Pearson, J. 2002. Working with Specialized Language: A practical guide to using corpora. London and New York: Routledge.
Brown, R. 1999. Adding Linguistic Knowledge to a Lexical Example-Based Translation System. In Proceedings of the Eighth International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-99), Chester, UK, 22-32.
Carroll, M. 2004. Subtitling: Changing Standards for New Media. [Accessed November 2006].
Doddington, G. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the ARPA Workshop on Human Language Technology, San Diego, CA, 128-132.
Gough, N. and Way, A. 2004. Robust Large-Scale EBMT with Marker-Based Segmentation. In Proceedings of the Tenth Conference on Theoretical and Methodological Issues in Machine Translation (TMI-04), Baltimore, MD, 95-104.
Green, T. 1979. The Necessity of Syntax Markers: Two Experiments with Artificial Languages. Journal of Verbal Learning and Verbal Behavior 18: 481-496.
Hearne, M. and Way, A. 2006. Disambiguation Strategies for Data-Oriented Translation. In Proceedings of the 11th Conference of the European Association for Machine Translation, Oslo, Norway, 59-68.
Koehn, P. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Machine Translation Summit X, Phuket, Thailand, 79-86.
Morrissey, S. and Way, A. 2005. An Example-Based Approach to Translating Sign Language. In Proceedings of the Second Workshop on Example-Based Machine Translation, Phuket, Thailand, 109-116.
Morrissey, S. and Way, A. 2006. Lost in Translation: the Problems of Using Mainstream MT Evaluation Metrics for Sign Language Translation. In Proceedings of the SALTMIL Workshop on Minority Languages, 5th International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, 91-98.
Nagao, M. 1984. A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. In A. Elithorn and R. Banerji (eds.) Artificial and Human Intelligence. Amsterdam, The Netherlands: Elsevier Science Publishers, 173-180.
NHK Annual Report. 1996. [Accessed 28 November 2006].
O’Brien, S. 2006. Investigating Translation From an Eye-Tracking Perspective. Paper given at the 2nd International Association for Translation and Intercultural Studies Conference: Intervention in Translation, Interpreting and Intercultural Encounters, University of the Western Cape, South Africa, 11-14 July 2006.
Och, F. and Ney, H. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics 29(1): 19-51.
O’Hagan, M. 2003. Can language technology respond to the subtitler’s dilemma? A preliminary study. In Translating and the Computer 25. London: Aslib.
Papineni, K., Roukos, S., Ward, T. and Zhu, W.-J. 2002.
BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, 311-318.
Snover, M., Dorr, B., Schwartz, R., Makhoul, J. and Micciulla, L. 2006. A Study of Translation Edit Rate with Targeted Human Annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, Boston, MA, 223-231.
Somers, H. 1999. Review Article: Example-based Machine Translation. Machine Translation 14: 113-157 (revised, extended version in Carl, M. and Way, A. (eds.) (2003), Recent Advances in Example-Based Machine Translation. Dordrecht, The Netherlands: Kluwer Academic Publishers, 3-59).
Somers, H., McLean, I. and Jones, D. 1994. Experiments in Multilingual Example-based Generation. In CSNLP 1994: 3rd Conference on the Cognitive Science of Natural Language Processing, Dublin City University, 6-8 July 1994.
Stroppa, N., Groves, D., Sarasola, K. and Way, A. 2006. Example-Based Machine Translation of the Basque Language. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas, Boston, MA, 232-241.
Stroppa, N. and Way, A. 2006. MaTrEx: DCU Machine Translation System for IWSLT 2006. In Proceedings of the International Workshop on Spoken Language Translation, Kyoto, Japan.
Taylor, C. 2006a. “I knew he’d say that!” A consideration of the predictability of language use in film. Paper presented at the “Multidimensional Translation: Audiovisual Translation Scenarios” conference, University of Copenhagen, 1-5 May 2006, Copenhagen, Denmark.
Taylor, C. 2006b. The Language of Television Series: a Study of Predictable Patterns. Paper presented at the “Languages & the Media” conference, 25-27 October 2006, Berlin, Germany.
Toole, J., Turcato, D., Popowich, F., Fass, D. and McFetridge, P. 1998. Time-constrained machine translation. In Farwell, D., Gerber, L. and Hovy, E. (eds.) Machine Translation and the Information Soup: Third Conference of the Association for Machine Translation in the Americas, AMTA’98, Langhorne, PA. Berlin: Springer, 103-112.
Turian, J., Shen, L. and Melamed, D. 2003. Evaluation of Machine Translation and its Evaluation. In Machine Translation Summit IX, New Orleans, LA, 386-393.
Utiyama, M. and Isahara, H. 2003. Reliable Measures for Aligning Japanese-English News Articles and Sentences. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-03), Sapporo, Japan, 72-79.
van den Bosch, A., Stroppa, N. and Way, A. 2007. A memory-based classification approach to marker-based EBMT. In Proceedings of the METIS-II Workshop on New Approaches to Machine Translation, Leuven, Belgium (to appear).
Wagner, S. 1998. Small Scale Evaluation Methods. In R. Nübel and U. Seewald-Heeg (eds.) Evaluation of the Linguistic Performance of Machine Translation Systems: Proceedings of the Workshop at KONVENS-98. Bonn, Germany, 93-105.
Watanabe, H., Kurohashi, S. and Aramaki, E. 2003. Finding Translation Patterns from Paired Source and Target Dependency Structures. In Carl, M. and Way, A. (eds.) Recent Advances in Example-Based Machine Translation. Dordrecht, The Netherlands: Kluwer Academic Publishers, 397-420.

Multimedia References
As Good as it Gets (1997). [DVD]. USA: TriStar Pictures.
Being John Malkovich (1999). [DVD]. USA: Universal Studios.
Breakfast at Tiffany’s (1961). [DVD]. USA: Paramount Pictures.
Casablanca (1942). [DVD]. USA: Time Warner.
Dr Strangelove (1964). [DVD]. UK: Hawk Films Ltd.
Harry Potter and the Prisoner of Azkaban (2004). [DVD]. USA: Time Warner.
The Bourne Identity (2002). [DVD]. USA: Universal Studios.

Notes
1. This work was generously supported by an Enterprise Ireland Proof of Concept Commercialization award.
2.
Examples include the international conference in audiovisual translation In So Many Words: Language Transfer on the Screen, held in February 2004 in London; the Languages and the Media conferences held in October 2004 and 2006 in Berlin; and the EU High Level Scientific Conference series Multidimensional Translation, held in Saarbrücken in May 2005 and in Copenhagen in May 2006.
3. www/
4.
5.
6. The MaTrEx system currently translates between English and a number of languages, including French (Gough & Way 2004), Spanish (Armstrong et al. 2006b), German and Japanese (Armstrong et al. 2006a), Italian and Arabic (Stroppa & Way 2006), Basque (Stroppa et al. 2006), Irish Sign Language (Morrissey & Way 2005, 2006), and Dutch (van den Bosch et al. 2007).
7. For example, does not deliver these region-protected DVDs outside of Japan.
8.
9. This study indicated an average of 11 characters per sentence for the Japanese subtitles of The Lord of the Rings: The Fellowship of the Ring, against 31 characters per sentence for the Japanese translation of book 1 of The Lord of the Rings.
10.
11. The Advanced Translation Research Lab was recently established at SALIS, DCU, with audiovisual equipment such as a 32” widescreen TV and a DVD recorder/player. The room can also be used for small group interviews.
12.
13.
14. Note that this is not a completely fair comparison, as our system is trained on material similar to that with which it is tested, while Babelfish receives no such customisation, of course.
15. In the German language, all nouns are capitalised. The production of lower-case nouns by the EBMT system is described in section 4.2.



Appendix A: MT German Subtitles used in the Survey
Harry Potter and the Prisoner of Azkaban

Original English Subtitle: Oh for goodness sake
Raw EBMT: oh wie sie goodness’ himmels willen.
Babelfish: OH für Grund der Güte.
Post-edited EBMT: Oh für liebe Güte.

Original English Subtitle: Brilliant!
Raw EBMT: genial
Babelfish: Leuchtend
Post-edited EBMT: Toll!

Original English Subtitle: It’s not funny
Raw EBMT: das ist nicht witzig.
Babelfish: Sie ist nicht lustig.
Post-edited EBMT: Das ist nicht witzig.

Original English Subtitle: I think I was lucky not to be arrested
Raw EBMT: ich war ein glück nicht die arrested.
Babelfish: Ich war glücklich nicht festgehalten zu werden.
Post-edited EBMT: Ich hatte Glück nicht eingesperrt zu werden.

Original English Subtitle: Ouch Ron, that was my foot
Raw EBMT: ouch, ron, das war mein zu fuß.
Babelfish: Autsch Ron, der war mein Fuß.
Post-edited EBMT: Au Ron, das war mein Fuß.

Original English Subtitle: Why are we stopping?
Raw EBMT: warum halten wir an ?
Babelfish: Warum wir stoppend sind.
Post-edited EBMT: Warum halten wir?

Original English Subtitle: There’s something moving out there
Raw EBMT: da draußen bewegt sich irgendwas.
Babelfish: Es gibt etwas, das heraus dort bewegt.
Post-edited EBMT: Etwas bewegt draußen.

Original English Subtitle: I think someone’s come on board.
Raw EBMT: ich glaub da steigt jemand ein.
Babelfish: Ich denke jemand, das an Bord kommt.
Post-edited EBMT: Ich glaube irgendjemand kommt an Bord.


The Bourne Identity

Original English Subtitle: I got enough trouble, Ok.
Raw EBMT: ich hab und ein problem.
Babelfish: Ich erhielt genügend Mühe.
Post-edited EBMT: Ich habe genug Probleme, ja?

Original English Subtitle: Ok, I’m up.
Raw EBMT: also,ich bin zu.
Babelfish: O.K,bin ich oben.
Post-edited EBMT: Also,ich bin bereit.

Original English Subtitle: It’s no scam.
Raw EBMT: Nein, das ist ja scam .
Babelfish: Nr., ist es kein scam.
Post-edited EBMT: Nein, das ist keine Masche.

Original English Subtitle: Is it him?
Raw EBMT: sie auf dem ?
Babelfish: Ist es er?
Post-edited EBMT: Ist das er?

Original English Subtitle: I heard you inside.
Raw EBMT: ich hab sie dich .
Babelfish: Ich hörte Sie nach innen.
Post-edited EBMT: Ich hörte Sie rein.

Original English Subtitle: It’s that simple.
Raw EBMT: ich mir einfach .
Babelfish: Er ist einfacher der.
Post-edited EBMT: Es ist so einfach.

Original English Subtitle: And I’ll give you another $10,000 when we get there.
Raw EBMT: ich gebe dir einen $10 . 000 als wir kommen sehen.
Babelfish: Ich gebe Ihnen eine anderen $10.000, wenn wir ankommen
Post-edited EBMT: Wenn wir ankommen, gebe ich dir noch $10,000.
