Translation Arrays: Version Variation Visualization (Phase 2)
Tom Cheesman, Robert S Laramee, Jonathan Hope
Executive Summary
We have built the world’s first prototype online platform for comparing multiple versions of literary works, in one or more languages, using visual representations of statistical text analyses: www.delightedbeauty.org/vvv. The site demonstrates innovative, fast and intuitive navigation interfaces and analysis tools, with game-like features. It was launched in September 2012, presenting 37 German versions of Shakespeare’s Othello as an experimental corpus. It is already being used by other international research teams working on translations between various languages, and has excited considerable public interest. The site is built on a transactional data services layer and a flexible tool for defining, marking up and aligning text segments between versions. The experimental visual interfaces open new research vistas, while their game-like design principles also appeal to non-research users, in education, creative industries, and general consumers of great literary works. The transformative potential is enormous. Presented using digital tools, multiple varying versions of texts in multiple languages afford exciting opportunities to generate and share new cultural knowledge and understandings within and across language barriers. Future work will acquire more text corpora, experiment with audio-visual corpora, explore further analytic and visualization approaches, and develop the interactivity of the platform for research and other uses.

Researchers and Project Partners
PI: Dr Tom Cheesman
Co-I: Dr Robert Laramee
Co-I: Dr Jonathan Hope
RA: Kevin Flanagan
Consultant designer: Stephan Thiel (Studio Nand, Berlin)
Industry partner: ABBYY Software (contact: Colin Miller)

Summary report
“If funded properly it could be a great tool for scholars interested in transnational studies and the history of global culture.” “The project is hugely ambitious and, if fruitful, will be important.” – DH2012 Conference proposal reviewers
“VVV is a very promising prototype of a very versatile tool for comparing multiple versions of any texts, be it original(s) and translation(s) or different (authorial, editorial) versions of the same text. The already published version of the Othello fragment is ample proof of that, and the platform used can and should be developed for more functionalities (automatic alignment, more difference measures, user interface).” – Dr Jan Rybicki (Krakow), pers. corr.
“Compare translated versions of Othello with Swansea University’s fun, geeky app.” – Folio Online translation agency, Cape Town, SA, sharing link [11] on Facebook, 21/09/12
In analogue media, multiple text versions – and still more, translations – present intractable problems for analysis and presentation. But in digital media, variant versions afford exciting transformative potential for collaborative intercultural work. We proposed a web platform demonstrating multiple translation comparison and analysis through visualizations, envisaging users in research, education, creative industries, and cultural consumers. We wanted to show that this approach could benefit non-linguists.
Our prototype at www.delightedbeauty.org/vvv features (1) a robust transactional corpus data services layer (‘Ebla’), (2) a flexible stand-off mark-up toolsuite (‘Prism’), enabling users to freely define and semi-automatically align segments in parallel documents, and (3) several proof-of-concept visual interface tools for navigation and exploration. The tools are demonstrated on a corpus of 37 German versions of Othello, with contextual data supplied for each.
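To make the idea of stand-off mark-up concrete, the following minimal Python sketch stores segments as character offsets into unmodified document texts and records alignments as links between segments. It is an illustration only: the names Segment, locate, text_of and aligned_texts, the document keys and the toy strings are all hypothetical, and nothing here describes the actual Ebla or Prism data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A stand-off segment: a character range in one document (hypothetical model)."""
    doc_id: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    kind: str = "speech"  # free-form segment attribute


def locate(doc_id, phrase, documents, kind="speech"):
    """Build a Segment by finding the first occurrence of phrase in a document."""
    start = documents[doc_id].index(phrase)
    return Segment(doc_id, start, start + len(phrase), kind)


def text_of(segment, documents):
    """Recover a segment's text from the untouched source document."""
    return documents[segment.doc_id][segment.start:segment.end]


def aligned_texts(base_segment, alignments, documents):
    """Map each aligned version's doc_id to the text of its aligned segment."""
    return {s.doc_id: text_of(s, documents) for s in alignments.get(base_segment, [])}


# Toy data, invented for illustration (not taken from the Othello corpus).
documents = {
    "base": "Iago: I am not what I am.",
    "version_a": "Jago: Ich bin nicht, was ich bin.",
    "version_b": "Jago: Ich bin nicht der, der ich scheine.",
}
base_segment = locate("base", "I am not what I am.", documents)
alignments = {
    base_segment: [
        locate("version_a", "Ich bin nicht, was ich bin.", documents),
        locate("version_b", "Ich bin nicht der, der ich scheine.", documents),
    ]
}
parallel = aligned_texts(base_segment, alignments, documents)
```

Because the texts themselves are never modified in this sketch, the same document could carry several competing segmentations, and an alignment is only a link between offset ranges.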
“The most intriguing tool” (Solon, Wired) allows users to identify how much translators’ responses to a text diverge, segment by segment; and then view diverse segment translations, bilingually, with ©Google back-translations. This tool deploys experimental algorithmic analyses of lexical divergence among segment translations: ‘Eddy values’ measure the distinctiveness of each segment translation compared to others; ‘Viv values’ for each base text segment are aggregated from all the associated ‘Eddy values’, and are displayed on the base text as a varying colour underlay. This is a powerful tool for reading ‘great works’ in a radically new way, through the ‘lens’ of arrayed translations. It opens new horizons in historical and contemporary cultural enquiry. Viv and Eddy values identify ‘hotspots’ of translation controversy and shifts in translation-as-interpretation. Typical and atypical translations can be compared, translation histories tracked. Translation variation becomes a research instrument for global intercultural studies.
Other interfaces offer macro views of variation in document structures (what do editors/adaptors omit, condense, what do they add?), and make reading parallel texts easy, by deploying segment attributes and alignments for flexible navigation. We also installed a basic facility to generate graphs: ‘Eddy histories’ for segments, or averaged for all segments in versions, visualize large-scale trends in translation history.
Further visual analyses have been developed in as-yet offline work ([6]; Visual Evidence). These tools deploy ‘Eddy and Viv’ algorithms with ‘classic’ scientific DV methods – cluster scatterplots, parallel co-ordinates, lexis heat maps – affording greater control over comparison scales and parameters. This work is to be integrated into the online platform.
Anonymous reviewers of our proposal for the DH2012 (Hamburg, July) conference graded our proposal on average over 90%. Media coverage following our 11 September launch event at Shakespeare’s Globe Theatre reflects public interest: detailed reports in Wired magazine (Solon) and BBC Wales online and the BBC Technology site (Dermody), as well as an earlier report in the Western Mail (Turner), all had many online responses. Dr Jan Rybicki (Jagiellonian U, Krakow), a computational stylistician specialising in translations ([21]), began using our tools immediately after the launch for research on Polish-English translations.
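The ‘Eddy’ and ‘Viv’ values described in this summary can be illustrated with a short Python sketch. The report does not state the formulas, so the sketch rests on explicit assumptions: each segment translation is reduced to a bag of lower-cased, whitespace-separated words; an Eddy value is taken to be the Euclidean distance of a translation's word-count vector from the mean vector of all translations of that segment; and the Viv value is taken to be the plain average of those Eddy values. The function names and the toy strings are hypothetical.

```python
import math
from collections import Counter


def word_counts(text):
    """Bag-of-words counts for one segment translation (assumed tokenisation)."""
    return Counter(text.lower().split())


def eddy_values(translations):
    """Return {version: distance from the centroid of all versions' word counts}.

    Assumption: 'distinctiveness' is modelled as Euclidean distance from the mean.
    """
    if not translations:
        raise ValueError("at least one translation is required")
    counts = {version: word_counts(text) for version, text in translations.items()}
    vocabulary = sorted(set().union(*counts.values()))
    n = len(counts)
    centroid = {w: sum(c[w] for c in counts.values()) / n for w in vocabulary}
    return {
        version: math.sqrt(sum((c[w] - centroid[w]) ** 2 for w in vocabulary))
        for version, c in counts.items()
    }


def viv_value(translations):
    """Aggregate the Eddy values of one base-text segment (assumed: their mean)."""
    eddies = eddy_values(translations)
    return sum(eddies.values()) / len(eddies)


# Toy translations of a single segment, invented for illustration.
translations = {
    "v1": "the moor is noble",
    "v2": "the moor is noble",
    "v3": "a dark stranger arrives",
}
eddies = eddy_values(translations)
viv = viv_value(translations)
```

Under these assumptions the outlying version v3 receives the highest Eddy value, and a segment whose translations all agree receives a Viv value of zero, which is the kind of contrast a colour underlay on the base text could display.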
Rybicki’s ongoing feedback is very useful. Prof. Roberta Rego Rodrigues (U Federal Minas Gerais, Brazil) recently began using the tools, analysing Portuguese versions of a Katherine Mansfield story.
Collaborations
‘Bridging the Gaps’ (BTG) (EPSRC/Swansea U) funded work in May-August by Zhao Geng, a Data Visualization PhD student supervised by Co-I Laramee. This work is documented in co-authored papers ([4-6]). BTG also met additional text use license costs.
ABBYY – the world leader in digitization software – generously provided a free trial of their Recognition Server OCR product, enabling us to process many more versions of Othello than expected. Their PR agency AxiCom worked hard to alert culture and technology journalists to our launch event: hence Solon’s excellent article in Wired, and an ongoing conversation with Hannah Freeman, The Guardian’s community co-ordinator for culture.

Our presentation at the DH2012 conference led to many new contacts: Rybicki, now a key collaborator; the SAWS team at KCL (their work on ontologies [23] is very pertinent); the CATMA team under our project advisor Prof Chris Meister [19]; the Leipzig E-Humanities team focusing on ‘text re-use’ (we present there on Oct 24: [3]); the Project Yao team ([17]); the TextGrid team (DH infrastructure: Göttingen; [24]); teams working on the history of Chinese Buddhist scriptures (Hamburg/Tokyo/Taiwan).
Project evolution
We have always considered this work relevant to all three DT themes, with transformative implications which will grow with the project’s assets:
“Text: Authority and Power”: rendering visible the plurality of versions and translations of a work and their explicit and implicit interrelations, an all but impossible task in non-digital media, deconstructs notions of the singular text, exposes cultural histories, re-configures cultural memories, and interrogates intra- and cross-cultural power relations.
“The Creative & Performing Arts and Technology”: accessibility and analytic comparability of previous versions transform conditions for making new versions, including performance and media productions.
“Translating Knowledge”: central to our work are processes of ‘translation’ from analogue to digital media, from restricted to open public access, from absence to presence. Previously, 5 German versions of Othello were online (2 of them machine-unreadable), and 5 in print. Now, 37 German versions (of approx 10% of the play) are globally accessible. Scaling up this approach (further works/languages) will transform the conditions of global public knowledge of literary works, and of public understanding of linguistic-literary translation as a knowledge re-making, creative activity, with deep historical and fully planetary extensions. Translation interprets the world and knowledge of it. We are translating translation into the digital world.
Lessons learnt
Comparing many translations might seem of little interest beyond the field of ‘translation corpus studies’. But public interest has exceeded our expectations. We see three factors at work. The name of Shakespeare is a key ‘draw’ – to be matched in future by the bible. Another is Thiel’s design vocabulary, surrounding a familiar-looking text with intuitively usable, game-like navigation and exploration tools. Thirdly, UK journalists report our work in the context of concerns over neglect of modern languages. This encourages us to focus some future work on educational as well as arts practice-related functionalities, alongside research uses.
Translation genealogy has emerged as a focus of many researchers’ interest: identifying translations’ source editions, reconstructing relations of dependency and innovation, attribution studies, and historical stylistics. The as-yet offline tools by Zhao and Laramee, identifying similarity clusters at various scales, enable detailed work of this kind [6].
Two critical points made by some users:
1. In the prototype, versions are aligned to a singular ‘base text’ – a pragmatic navigational tool. This occludes the complex flow of original-language version variation in time and space. Nothing but resources prevents us from visualising variant editions (also annotations, paratexts, etc.), ‘retellings’, modern English cribs, cross-media adaptations, re-scriptings for performances, etc. Monolingual work-histories can be explored as histories of ‘(re)translation’.
2. Documents in our Othello corpus are discursively contextualised, but only names and dates are navigational. With an ontology appropriate to part-parallel corpora as open networks (with explicit and implicit nodes), and Natural Language Processing tools, we can enable interoperability with other datasets in a Semantic Web framework, creating exploration and navigation tools which link project assets with external assets.
Legal IP issues were less problematic than feared. Many publishers allowed free use of texts, others requested a fee (<150EUR) and/or limitations on access (>10%). Nevertheless, supplementary funding (BTG) was necessary here: this cost and time factor must not be underestimated.
At DH2012 we witnessed a historic rapprochement between the DH subcommunities of text encoders (TEI markup, edition-making) and text analysts (stylometry, data-extraction) ([13]). Our work caters more to the latter camp. Our future work will cater to both.
Future plans
Swansea University’s Research Institute for Arts and Humanities has granted £7,850 for continuation work to prepare a Themes Large Grant application in January 2013. The Digital Transformations call precludes projects with a major ‘resource enhancement’ component.
If this covers the large-scale acquisition/digitization of texts of varied provenance which VVV development requires, we may apply under ‘Translating Cultures’.
Goals:
(1) develop VVV functionalities for research, educational and creative uses – greater accessibility, user interactivity, broader document type import/export (with TEI encoding), segmenting/aligning automation, page images, NLP tools, R and GoogleViz interfaces, more varied and more customizable visualizations, formalised ontology for metadata and contextual data, semantic web links, multimedia corpora, social media affordances, user creation of multimedia research-based educational presentations, smartphone capability, exhibition-scale outputs, etc.
(2) enhance VVV assets:
(a) acquire existing digital corpora: bible corpora (cf. Digital Bible Society [14], YouVersion.com [16]); other part-parallel corpora, e.g. versions and translations extracted from corpora such as Co-I Hope works on: Early English Books Online [20] or Google Books;
(b) acquire global Shakespeare versions (editions and translations of <10 selected works: books, typescripts, manuscripts, <2,500 texts, <10 languages), collaborating with BL, the Folger Institute (Hope’s co-researchers), the European Shakespeare Research Association (through project advisor Prof Delabastita), international Shakespeare research/translation/edition teams (so far confirmed: teams in Korea, Poland, and Spain), and researchers at MIT (Global Shakespeares project: performance videos online; educational multimedia for classrooms [15a/b]). (Every year, 2 million Chinese high-school students, among others worldwide, study the trial scene in The Merchant of Venice [26]. This is a priority. Multilingual, multiversion arrays of such works, with both expert commissioned presentations and social media affordances, will be powerful critical intercultural educational tools.)
(c) acquire all English versions of select foreign-language works, e.g. Dante’s Commedia, the Bhagavad Gita, the Analects of Confucius.
(3) create a global observatory of translations. The manually-input UNESCO ‘Index Translationum’ database ([25]) cannot cope with vastly increased translating activity. We are discussing with the director, Marius Tukaj, an app to visualize data scraped from WorldCat, national libraries, Nielsen, IMDB, etc. (Cf. the migration flow DVs of the Max Planck Institute for the Study of Diversity [18].)
The current team [Cheesman (concepts, demo text preparation/curation), Laramee (DV, Co-I), Hope (corpus linguistics, Co-I), Flanagan (‘back-end’ software architecture, RA), and Thiel (graphical interfaces, consultant)] has an excellent complementary skill-set and strong long-term commitment. Extra-academic partnerships (ABBYY, Globe) are durable. We are in discussion with partners mentioned.
One future Co-I is Dr Tim Hutchings (Durham), currently a CRESC Fellow working on digital bibles: valuable for VVV work, e.g. training machine analyses, testing tools, and achieving impact. We seek a Co-I in educational applications, via Prof Michael Kelly (Soton; Speak to the Future). ABBYY, with contracts for international public digitization projects, can facilitate conversations with libraries to ‘tweak’ digitization strategies; also a topic of discussions with Prof Lorna Hughes (Aberystwyth/NLW). Hughes, Hutchings, Rybicki and project advisor Prof Susan Schreibman (TCD; [22]) will attend a December application planning workshop in Swansea.
Outreach
Two key dissemination events: team presentations at DH2012 (Hamburg, 20 July), and at Shakespeare’s Globe Theatre, London, Education and Research (11 Sept afternoon). The latter drew 16 people representing ABBYY, AxiCom, The Guardian, BBC, KCL, UCL, Shakespeare Institute (Birmingham U), Sunderland U.

Reports on the BBC and Wired were widely disseminated online (see [10-11]). We featured on Swansea University’s front page in September. The university’s PR team’s tweets were re-tweeted e.g. by the RSC. A video is being made with Cheesman and Laramee for the university’s ‘Breakthrough’ research showcase. Cheesman was interviewed for a widely-read translators’ blog [12].
Forthcoming presentations on VVV:
October: Alexander-von-Humboldt-Association (London) (Cheesman); E-Humanities Seminar, Leipzig (chair in DH: Prof Gregory Crane) (Cheesman, Thiel).
November: German Studies seminar, Birmingham (Cheesman); Dept of Translation and Intercultural Studies, Manchester (Prof Mona Baker) (Cheesman, Flanagan).

References and external links
A. TEAM OUTPUTS
Prototype ‘Translation Array’ with VVV tools at: www.delightedbeauty.org/vvv (public full-access installation) and at: www.delightedbeauty.org/vvvclosed (research installation: the public can view but not edit data)
[1]. Cheesman, Tom, and the VVV team: ‘Translation Sorting: Eddy and Viv in Translation Arrays’, forthcoming chapter, under review, in conference volume: Un/Translatables, ed. Bethany Wiggin, Northwestern UP, 2013. [Summer 2011 write-up of May 2011 conference paper. Draft accessible at: http://www.scribd.com/doc/101114673/Eddy-and-Viv.]
[2]. Cheesman, T., Geng, Z., Laramee, R.S., Flanagan, K., Thiel, S., Hope, J., Ehrmann, A. (2012) ‘Translation Arrays: Exploring Cultural Heritage Texts Across Languages’. Conference abstract. At http://www.dh2012.unihamburg.de/conference/programme/abstracts/translation-arraysexploring-cultural-heritage-texts-across-languages/
[3]. Cheesman, T., Geng, Z., Flanagan, K., Thiel, S. (2012), ‘Exploring Multiple Versions of Cultural Heritage Texts Across Languages’. Leipzig E-Humanities Seminar abstract. At http://www.ehumanities.net/assets/seminar/2012/seminar%20on%20oct%2024th.pdf
[4]. Geng, Z., Laramee, R.S., Cheesman, T., Berry, D.M., Ehrmann, A., ‘Visualizing Translation Variation: Shakespeare’s Othello’, Advances in Visual Computing: Lecture Notes in Computer Science, vol. 6938, 2011, pp. 653-663.
[5]. Geng, Z., Laramee, R.S., Cheesman, T., Rothwell, A., Berry, D.M., Ehrmann, A., ‘Visualizing Translation Variation of Othello: A Survey of Text Visualization and Analysis Tools’, Literary and Linguistic Computing (paper accepted with revisions, scheduled 2013).
[6]. Geng, Z., Laramee, R.S., Cheesman, T., Flanagan, K., Thiel, S., ‘Visual Analysis of Segment Variation of German Translations of Shakespeare’s Othello’, Information Visualization (under review, est. 2013) [paper, video and slides are at: http://cs.swansea.ac.uk/~cszg/vv]
[7]. Wilson, M.L., Cheesman, T., Geng, Z., Laramee, R.S., Rothwell, A., Berry, D.M., Ehrmann, A., ‘Studying Variations in Culture and Literature: Visualizing Translation Variations in Shakespeare’s Othello’, WebSci Conference 2011, poster at: http://www.websci11.org/fileadmin/websci/Posters/152_paper.pdf
B. PROJECT PUBLICITY [date order]
[8]. 2012-05-18: Turner, Robin (BBC Wales Online): ‘Welsh scientists use Shakespeare to unlock the true art of translation’, http://www.walesonline.co.uk/news/need-to-read/2012/05/18/welshscientists-use-shakespeare-to-unlock-the-true-art-of-translation-9146630999549/#ixzz1vsMflGuY
[9]. 2012-05-18: Turner, Robin (Western Mail), ‘How Shakespeare is helping to show how world cultures differ’ [variant print version]
[10]. 2012-09-13: Dermody, Nick (BBC Wales Online), ‘World writings compared by Swansea University web tool’, http://www.bbc.co.uk/news/uk-walessouth-west-wales-19561879 [This story also ran on BBC News Technology, on BristolWired, GlasgowWired, ManchesterWired, was flagged by NewsTech24, etc]
[11]. 2012-09-13: Solon, Olivia (Wired.co.uk), ‘Linguistics tool simultaneously compares multiple translations of Othello’, http://www.wired.co.uk/news/archive/2012-09/13/shakespearetranslation-comparison [also on e.g. njuice.com, pinterest.com, shakespearean.tumblr.com, www.scoop.it, www.fragmentarytexts.org etc. and many translation industry sites, e.g. k-international.com, www.facebook.com/folioonline]
[12]. 2012-10-13: Epstein, Brett, ‘Delighted Beauty: An Interview with Tom Cheesman’, Brave New Words: a blog about translation, http://bravenew-words.blogspot.co.uk/
C. OTHER
[13]. Bauman, Syd, et al., ‘Text Analysis Meets Text Encoding’ [DH2012 conference abstract], at http://www.dh2012.unihamburg.de/conference/programme/abstracts/text-analysis-meets-textencoding/
[14]. Digital Bible Society (2009-), http://digitalbiblesociety.com/DBS2010/english/index.htm
[15a]. Donaldson, P. et al. (2010-), ‘Global Shakespeares Video & Performance Archive’, http://globalshakespeares.mit.edu/
[15b]. Donaldson, P. et al. (2005-), ‘Cross Media Annotation System (XMAS)’, http://mit.edu/shakspere/xmas/
[16]. Lifechurch.tv (2011-), ‘YouVersion’, http://www.youversion.com
[17]. Lockard, Joe, et al. (2003-), ‘Project Yao: A free, accessible database of American literature translations into Chinese’, http://yao.eserver.org/
[18]. Max Planck Institute for the Study of Religious and Ethnic Diversity (2012), ‘Interactive Data Graphics’, http://media.mmg.mpg.de/#
[19]. Meister, Jan-Christoph, et al. (2009-), ‘CATMA 3.2: Computer Aided Textual Markup & Analysis’, http://catma.de/
[20]. ProQuest LLC, ‘Early English Books Online’ (2003-), http://eebo.chadwyck.com/home
[21]. Rybicki, J. (2012). ‘The great mystery of the (almost) invisible translator: stylometry in translation’. In M. Oakley and M. Ji (eds.), Quantitative Methods in Corpus-Based Translation Studies. Amsterdam: John Benjamins, pp. 231-248
[22]. Schreibman, Susan, et al. (2010), ‘Versioning Machine 4.0’, http://vmachine.org
[23]. Sharing Ancient Wisdom project, KCL (2012), ‘SAWS Ontology (Owl)’, http://www.ancientwisdoms.ac.uk/media/
[24]. TextGrid (2012), ‘TextGrid – Virtuelle Forschungsumgebung für die Geisteswissenschaften’, http://www.textgrid.de/
[25]. UNESCO, ‘Index Translationum - World Bibliography of Translation’ (1932-), http://portal.unesco.org/culture/en/ev.phpURL_ID=7810&URL_DO=DO_TOPIC&URL_SECTION=201.html
[26]. World Shakespeare Festival (2011), ‘Shakespeare: A worldwide classroom. New findings on teaching Shakespeare round the world. An international survey commissioned by the RSC and British Council’, http://www.worldshakespearefestival.org.uk/wiki/Survey-results.ashx

Digital Transformations
Digital Transformations is one of the AHRC’s Strategic Themes, which were identified through the Future Directions for Arts and Humanities Research Consultation in 2009. The themes provide a funding focus for emerging areas of interest to arts and humanities researchers.
Professor Andrew Prescott, AHRC Digital Transformations Theme Leadership Fellow, has said: “The AHRC Digital Transformations theme is about more than the creation of online editions or the digitisation of books, manuscripts or pictures. It is about fostering completely new methods of scholarly research and discourse. It will encourage arts and humanities researchers to work with scientists in developing new concepts for digital technologies to explore our artistic and cultural heritage. It will show how the theoretical insights generated by the arts and humanities enable us to better understand the profound changes currently occurring in identity, culture and society. Researchers in the arts and humanities will create new relationships with creative and cultural businesses, memory institutions and technology producers. The digital has already profoundly transformed the arts and humanities; the AHRC Digital Transformations theme will show how the arts and humanities can transform digital cultures.”
Further details about the theme can be found on the AHRC’s Digital Transformations web pages at: http://www.ahrc.ac.uk/FundingOpportunities/Pages/digitaltransformations.aspx