Translation Sorting with Eddy and Viv 1
Tom Cheesman (Swansea) and the VVV Project Team 2

For Eddy Cheesman

Figure 1: Interface mockup: Eddy analysis applied to translations; Viv analysis applied to the source text

1.1. This paper presents a formula to define in mathematical terms the relative distinctiveness of a translation, in relation to other translations, in the same language, of the same source text. This formula is given the name ‘Eddy’. Eddy sorts multiple translations (or, preferably, components of them) according to relative lexical distinctiveness. 1.2. The paper also outlines conditions for a formula to aggregate results of Eddy analyses, and to derive a mathematical definition of the extent to which source texts (or, preferably, components of them) are associated with variation in translations. This formula is given the name ‘Viv’. Viv sorts source texts (or preferably components of them) according to Eddy result ranges and averages. Results may be obtained from analysis of translations in one or more target languages. 1.3. Eddy and Viv have applications wherever multiple translations of the same source text have been produced and need to be compared. Eddy enables large numbers of translations to be sorted, surveyed for commonalities and differences, and ranked according to objective (machine-readable)
1 Draft of a paper for publication in: Un/Translatables, edited by Bethany Wiggin (Northwestern UP, 2012).
2 The ‘Version Variation Visualisation’ project was funded by Swansea University’s Research Institute for Arts and Humanities (Feb – July 2011). Principal investigator: Tom Cheesman. Co-investigators: David M. Berry, Robert S. Laramee, Andrew J. Rothwell. Research assistants: Zhao Geng and Alison Ehrmann. Consultant software and interface designer: Stephan Thiel. See: www.delightedbeauty.org > Outputs.


criteria. Eddy sorts out the ‘least distinctive’ (i.e. ‘most usual’) and the ‘most distinctive’ (i.e. ‘most unusual’), at full text level, or preferably at a more detailed level of text components. Viv sorts out source text components in terms of the different levels of variability among their translations. 1.4. The application we intend is a visualisation interface for students of works of ‘world literature’. We are experimenting with a collection of translations of Shakespeare’s play Othello. We plan to build a digital ‘Translation Array’ interface for exploring variation among translations of any multiply translated document. 1.5. Other potential applications of Eddy and Viv include translation industry quality control, and assessment in translation training. 1.6. Shakespeare makes a good experimental case study. Hundreds of translations of Shakespeare’s plays exist, in dozens of languages.3 In many languages, dozens of different translations exist. They have been produced for some 250 years, and are still being produced. Some are published in books: reading editions, or study editions. Many are produced for use in theatrical or other performances: theatre, film, radio, and television scripts. 1.7. Translations vary greatly, and not randomly. Variations in translations of world cultural heritage texts are of cross-cultural interest for researchers, artists, and others. 1.8. Eddy is metaphorically named after ‘eddy’ = an effect of turbulence in streams. Eddy analysis can be applied to components on various scales within a stream of linear text, to detect variation in distinctiveness along the stream of a translation. Variation in translation results from the complex interaction of multiple dynamic factors, including prevailing linguistic, cultural, poetic and political norms, constraints and expectations, individual translators’ commitments and idiosyncrasies, and problems or opportunities presented by source text features. 
Eddy makes the consequences of this complexity surveyable. 1.9. Viv is metaphorically named after ‘vivacity’ = liveliness. Viv also stands for: ‘Variation in intensity of variation’. In world culture, translations transfuse life into texts, keep them alive; without translators, Shakespeare for example would not be a global icon. Viv analysis facilitates surveys of the source text stream as a cause of varying variation in translation, without assuming knowledge of translating languages. 1.10. Section 2 introduces Eddy. Section 3 applies Eddy to a sample of text from Othello (Sample A). Section 4 introduces Viv and a second sample (Sample B) in order to suggest how Viv might be implemented. Section 5 discusses problems and prospects for this type of approach.

3 UNESCO’s Index Translationum online, only covering publications since 1979, currently catalogues 222 translations of Othello in 40 languages. This is certainly an under-estimate. Our systematically collected German corpus, with 55 different translations collected so far, includes many theatre scripts not in public circulation, hence not in the UNESCO database.


2. Introducing Eddy

2.1. The Eddy formula is:

ΣD/tf
or more fully:

$$\mathrm{Eddy}(d) = \sum_{i=1}^{N} \frac{D}{\mathrm{tf}(w_{i,d})}$$
2.2. Explanation: Each translation is a ‘document’ (d) with N words in it. The set of the documents, i.e. the corpus of variant translations, contains D documents. Term frequency (tf) is the number of times a word is used in the corpus. Term frequencies are found (using concordance tools)4 for every word in the document: from word 1 to word N (i.e.: w1,d … wN,d). For every word, in each document, tf is then divided into D. Then all the D/tf totals are added together (Σ = sum of), giving an Eddy result for each document.

2.3. An algorithm for applying the Eddy formula to D translations involves these steps:

2.3.1: Establish a corpus by selecting a set of variant, comparable translations in one language.
2.3.2: Put all D translations into one file (i.e. the corpus) and find the term frequencies (tf) for all the words used. The tf figures will range from 1 (for a word used only once, in only one translation) up to D, or beyond (if a certain word is used in some or all translations twice, three times, etc, then its tf may be 2D, 3D, etc).
2.3.3: Divide the term frequency for each word into D. The results range from D (for words used only once in the corpus) down to 1 (for words used once – on average – in every document), or less than 1 (for words used more often than that).
2.3.4: For every separate document, add up the D/tf values for all the words in the document.
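The steps in 2.3 can be sketched in a few lines of Python. This is a minimal illustration, not the project's own tooling (the results reported here were produced with a concordance tool). The function names `tokenize` and `eddy_scores`, the lower-casing, and the simple word-splitting rule are all assumptions made for the sketch, and the three-document corpus is an invented toy example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a translation into lower-cased word tokens.

    Assumption: case is ignored and punctuation is dropped. The source
    does not specify tokenisation details, so this rule is illustrative.
    """
    return re.findall(r"\w+(?:['’]\w+)*", text.lower())


def eddy_scores(translations):
    """Return {name: Eddy result} for a dict {name: translation text}."""
    docs = {name: tokenize(text) for name, text in translations.items()}
    D = len(docs)  # number of documents in the corpus
    # Step 2.3.2: term frequencies counted over the whole corpus
    tf = Counter(token for tokens in docs.values() for token in tokens)
    # Steps 2.3.3 and 2.3.4: sum D/tf over every word of each document
    return {name: sum(D / tf[t] for t in tokens) for name, tokens in docs.items()}


# Toy corpus, invented for illustration (not taken from the paper)
toy = {
    "a": "the cat sat",
    "b": "the cat ran",
    "c": "a dog ran",
}
scores = eddy_scores(toy)
ranking = sorted(scores, key=scores.get)  # least to most distinctive
print(scores)   # {'a': 6.0, 'b': 4.5, 'c': 7.5}
print(ranking)  # ['b', 'a', 'c']
```

In the toy corpus, document "b" uses only words that another document also uses, so it scores lowest; document "c" contains two words found nowhere else, so it scores highest.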

2.4. The Eddy formula has been arrived at through experimentation.5 It is an adaptation of ‘tf-idf’ analysis, as used in information retrieval and algorithmic criticism.6 2.5. Eddy sorts translations’ lexis statistically. Relative distinctiveness equates to deviation from a notional ‘mean’ or average translation: a translation which only uses words which are used by many or all other translations. This translation may correspond to an empirically real document, but it

4 I used Mike Scott’s ‘WordSmith’ (www.lexically.net).
5 At the ‘Un/Translatables’ conference in April 2011, I presented an analysis based on Sample A (see below), using manual counts of variant translations of selected terms and variant syntactical forms. This ad hoc analysis was neither fully replicable nor generalizable to other samples, unlike Eddy and Viv.
6 Stephen Ramsay, “Algorithmic Criticism”, Blackwell Companion to Digital Literary Studies, eds S. Schreibman and R. Siemens, Blackwell, 2007: 477–91, at http://www.digitalhumanities.org/companion/view?docId=blackwell/9781405148641/9781405148641.xml&chunk.id=ss1-6-7; see also ‘Tf-idf weighting’ in Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008, at: http://nlp.stanford.edu/IR-book/html/htmledition/tf-idf-weighting-1.html


need not. A low Eddy score denotes a translation which is close to the implicit consensus among most translators. A high Eddy score denotes an idiosyncratic translation. 2.6. Eddy is a synchronic analysis. Applied to historical material, it does not tell us how translations derive or depart from one another in chronological sequence. It only tells how they differ from one another in transhistorical, cultural space. It is best suited for giving us an overview of a large corpus of translations, helping us navigate amongst them. 2.7. If Eddy is applied to whole translation texts, it gives an overview of the general lexical properties of each in relation to others. Lexical properties imply stylistic and semantic properties. Lexical analysis is a working proxy for stylistic and semantic analysis. 2.8. If Eddy is applied to components of translation texts, it is a more powerful analytic tool. It then enables us to survey how individual translations vary in relative distinctiveness along their linear stream: where a translator’s work is like that of other translators, and where it is more distinctive. 2.9. We envisage applying Eddy at the most detailed practicable component level. ‘Practicable’ refers to the work of aligning components of source and translation texts in parallel: i.e. identifying ‘equivalent’ or ‘corresponding’ text components. Alignment can be machine-assisted but demands manual work. Alignments can be problematic, especially at more detailed text levels. Two analysts may justifiably disagree about (a) what constitutes a source text component and (b) what component of a translation text ‘corresponds’ to a source text component. 2.11. Alignment generally becomes more problematic, the more detailed the level of analysis. It is normally easiest at full text level; quite easy at the level of larger structural components, such as chapters, paragraphs, or (in plays) scenes or speeches; often more difficult at the levels of sentences and words. 
Translations frequently form sentences by splitting and/or combining source text sentences. Translations frequently use words which split or combine the semantic values of source text words. 2.12. What level(s) of alignment is (are) practicable is a matter of empirical experimentation. Levels may vary along the streams of the source and translation texts. There is no need to insist on a specific level of analysis, nor to maintain the same level of analysis at every point in the stream. Plural and overlapping alignments, even contrary alignments by different analysts, will only enrich the results.


3. Eddy visualized: Sample A

3.1. Numerous translations of two lines from Othello have been collected, in many languages, including 35 in German. This is Sample A. The lines are: If virtue no delighted beauty lack, / Your son-in-law is far more fair than black. These lines can be interpreted in many different ways and are translated in widely varying ways. 3.2. Figure 2 shows three German translations of Sample A. It shows D/tf scores for individual words, represented (a) in figures and (b) by assigning larger font sizes to higher-scoring (= more distinctive, unusual) words.

Each entry gives: Eddy result (rounded) – translator, date; the translation text; the words replaced by their D/tf scores (35/tf); and a back-translation. In the original figure, font sizes represent D/tf scores (larger = more distinctive/unusual), and distinctive terms in the back-translation are in bold.

195 – Wolff, 1920
Leiht Tugend ihre Farbe dem Gesicht, / Ist Euer Eidam weiß, ein Schwarzer nicht.
35, 1.4, 35, 35, 17.5, 35, 1.3, 1.7, 4.4, 7, 11.7, 8.8, 2.1
If virtue lends its colour to the face / your son-in-law is white, not a black [man].

240 – Engel, 1939
Spricht man von Tugend, als von einem Licht, / Scheint Euer Eidam mir so dunkel nicht.
35, 11.7, 17.5, 1.4, 1.6, 17.5, 35, 17.5, 35, 1.7, 4.4, 4.4, 17.5, 2.1
If one speaks of virtue as of a light, / your son-in-law seems not so dark to me.

180 – Schwarz, 1941
Wenn nie der Tugend lichte Schönheit fehlt, / ist Eure Tochter hell, nicht schwarz, vermählt.
1.6, 35, 2.5, 1.4, 17.5, 2.2, 5, 1.3, 35, 35, 5.8, 2.1, 1.7, 35
If virtue never lacks bright-lit beauty, / your daughter is brightly, not blackly, married.

Figure 2: Visualization of Eddy analysis process and output
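The per-word scores listed in Figure 2 can be checked against the reported Eddy results by simple addition. The short sketch below does this for the Wolff and Schwarz rows. The helper `round_to_step` and the rule of rounding to the nearest 5 are assumptions, inferred from the fact that all reported Eddy results are multiples of 5. The Engel row is left out because its score list, as printed, has fewer entries than the line has words.

```python
def round_to_step(value, step=5):
    """Round to the nearest multiple of `step` (assumed rounding rule)."""
    return step * round(value / step)


# D/tf scores copied from Figure 2 (D = 35)
figure2_scores = {
    "Wolff 1920": [35, 1.4, 35, 35, 17.5, 35, 1.3, 1.7, 4.4, 7, 11.7, 8.8, 2.1],
    "Schwarz 1941": [1.6, 35, 2.5, 1.4, 17.5, 2.2, 5, 1.3, 35, 35, 5.8, 2.1, 1.7, 35],
}

for name, word_scores in figure2_scores.items():
    total = sum(word_scores)  # step 2.3.4: add up the D/tf values
    print(name, round(total, 1), round_to_step(total))
# Wolff 1920 195.9 195
# Schwarz 1941 181.1 180
```

Both sums agree with the rounded Eddy results shown for these two translations (195 and 180).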


3.2. Figure 2 demonstrates a potential Eddy output, highlighting distinctive terms in translations. Note that these include function words (‘stop words’), omitted in some computational analyses or overlooked, but crucial in stylistic analyses. 3.3. Figure 3 shows 35 German translations of Sample A in chronological order, illustrating the complex patterning of similarities and differences. These versions can be found at www.delightedbeauty.org (German page) with back-translations and metadata.

wenn Tugend die glänzendeste Schönheit ist, so ist euer Tochtermann mehr weiß als schwarz.
wenn es der Tugend nicht an Reiz und Schönheit fehlt, so ist Ihr Schwiegersohn vielmehr weiß, als schwarz.
Wenn je die Tugend einen Mann verklärt, / Ist Euer Eidam schön und liebenswert.
Wenn’s nur der Tugend nicht an Schönheit fehlt, / Werd’ Euer Sohn den Weißen beigezählt.
Wenn’s nur der Tugend nicht an Weisheit fehlt, / Werd’ Euer Sohn den Weißen beigezählt.
Wenn’s nur der Tugend nicht an Reinheit fehlt, / Werd’ Euer Sohn den Reinen beigezählt.
Wenn es der Tugend nicht an lichter Schönheit fehlt, / ist vielmehr blond als schwarz, den euer Kind gewählt.
Wenn man die Tugend muß als schön erkennen, / Dürft Ihr nicht häßlich Euren Eidam nennen.
Mehr schön als schwarz ist euer Tochtermann, / Wenn Mannheit reizen und gefallen kann.
Wenn Tugend Reiz und Schönheit nicht entbehrt, / Ist Euer Eidam schön und liebenswerth.
Eu’r Eidam, – wenn die Tugend lieblich macht, – / Gleicht mehr dem hellen Tag als schwarzer Nacht.
Wenn Tugend ist mit Schönheitsreiz vereint, / Eur Schwiegersohn nicht schwarz, nein, schön erscheint.
Entbehrt die Tugend Reiz und Schönheit nicht, / Ist euer Eidam minder schwarz als licht.
Leiht Tugend ihre Farbe dem Gesicht, / Ist Euer Eidam weiß, ein Schwarzer nicht.
Spricht man von Tugend, als von einem Licht, / Scheint Euer Eidam mir so dunkel nicht.
Wenn nie der Tugend lichte Schönheit fehlt, / ist Eure Tochter hell, nicht schwarz, vermählt.
wenn Mannesmut nicht Reiz und Glanz entbehrt, / so ist er, wenn auch schwarz, höchst schätzenswert.
Wo so viel Mut bei so viel Eifer wohnt, / Dünkt Euer Eidam minder schwarz denn blond.
Zählte bei Menschen nur der innre Schein, / würden wir dunkler als Othello sein.
Wenn edler Sinn für Schönheit gelten kann, / Ist Euer Schwiegersohn ein schöner Mann.
Wenn Tugend sich mit Schönheit messen kann, / Mehr schön als schwarz ist Euer Tochtermann.
Wenn Ihr der Tugend nicht Schönheit absprechen wollt, / Ist Euer Schwiegersohn nicht dunkel, sondern Gold!
Gilt Tugend als der Schönheit höchste Kron, / Mehr schön als schwarz ist Euer Schwiegersohn.
ist Tugend selber höchste Schönheit schon, / so ist mehr schön als schwarz dein Schwiegersohn.
Wenn es der Tapferkeit nicht an froher Schönheit mangelt, ist Euer Schwiegersohn eher weiß als schwarz.
wenn der Tugend nicht die lichte Schönheit fehlt, dann ist Euer Schwiegersohn viel eher hell als schwarz.
Wenn zur Tugend die Freude an der Schönheit gehört, dann ist Euer Schwiegersohn eher schön [hell] als schwarz.
Wenn Tugend schön ist, hast du jetzt zum Lohn / Nen schwarzen, aber schönen Schwiegersohn.
wenn Tapferkeit und Tugend, schön und hell, zusammengehn; / ist Euer Schwiegersohn mehr schön und hell als schwarz zu sehn.
Wär äußrer Schein stets innrer Werte Preis, / schien mancher Weiße schwarz, manch Schwarzer weiß.
Gäbs helle Haut für Edelmut als Preis, / Dann wär Ihr Schwiegersohn statt schwarz reinweiß.
Wenn wir uns an der Tugend freun, der Schönheit Harz, / Dann ist Ihr Schwiegersohn mehr schön als schwarz.
Solange männliche Tugend mehr zählt als Schönheitsfehler, kann man sagen, Ihr Schwiegersohn ist eher edel als schwarz.
Kühnheit wirkt anziehnd, hell erstrahlt zum Lohn / Mehr schön als schwarz drum Euer Schwiegersohn.
wenn Tapferkeit allein / so schön sein kann, / dann ist Ihr schwarzer Schwiegersohn / ein weißer Mann.

Figure 3: 35 German translations of Sample A – chronological sequence

3.4. Figure 4 ranks the 35 translations of Sample A according to Eddy results. Figure 5 substitutes English back-translations.

35 German translations of “If virtue no delighted beauty lack, / Your son-in-law is far more fair than black”, arrayed in order of distinctiveness
Eddy results (rounded: 80 – 335) measure distinctiveness
Each entry: Eddy result, Author, Date, Text-type (S = Study edition, R = Reading edition, T = Theatre script): text

80 Engler 1977 S: wenn der Tugend nicht die lichte Schönheit fehlt, dann ist Euer Schwiegersohn viel eher hell als schwarz.
80 Wieland 1766 S: wenn Tugend die glänzendeste Schönheit ist, so ist euer Tochtermann mehr weiß als schwarz.
80 Gundolf 1909 R: Entbehrt die Tugend Reiz und Schönheit nicht, / Ist euer Eidam minder schwarz als licht.
80 Bodenstedt 1867 R: Wenn Tugend Reiz und Schönheit nicht entbehrt, / Ist Euer Eidam schön und liebenswerth.
95 Eschenburg 1779 S: wenn es der Tugend nicht an Reiz und Schönheit fehlt, so ist Ihr Schwiegersohn vielmehr weiß, als schwarz.
110 Lauterbach 1973 T: Gilt Tugend als der Schönheit höchste Kron, / Mehr schön als schwarz ist Euer Schwiegersohn.
125 Schaller 1959 R: Wenn Tugend sich mit Schönheit messen kann, / Mehr schön als schwarz ist Euer Tochtermann.
130 Bolte/Hamblock 1976 S: Wenn es der Tapferkeit nicht an froher Schönheit mangelt, ist Euer Schwiegersohn eher weiß als schwarz.
135 Schiller (Voss) 1805 R: Wenn je die Tugend einen Mann verklärt, / Ist Euer Eidam schön und liebenswert.
135 Voss 1805 Draft 1: Wenn’s nur der Tugend nicht an Schönheit fehlt, / Werd’ Euer Sohn den Weißen beigezählt.
140 Voss 1805 Draft 2: Wenn’s nur der Tugend nicht an Weisheit fehlt, / Werd’ Euer Sohn den Weißen beigezählt.
140 Jordan 1868 R: Mehr schön als schwarz ist euer Tochtermann, / Wenn Mannheit reizen und gefallen kann.
145 Swaczynna 1972 T: ist Tugend selber höchste Schönheit schon, / so ist mehr schön als schwarz dein Schwiegersohn.
150 Voss 1805 Draft 3: Wenn’s nur der Tugend nicht an Reinheit fehlt, / Werd’ Euer Sohn den Reinen beigezählt.
155 Klose 1971 S: Wenn zur Tugend die Freude an der Schönheit gehört, dann ist Euer Schwiegersohn eher schön [hell] als schwarz.
160 Rüdiger 1983 T: wenn Tapferkeit und Tugend, schön und hell, zusammengehn; / ist Euer Schwiegersohn mehr schön und hell als schwarz zu sehn.
160 Buhss 2006 T: Wenn wir uns an der Tugend freun, der Schönheit Harz, / Dann ist Ihr Schwiegersohn mehr schön als schwarz.
160 Karbus 2006 T: wenn Tapferkeit allein / so schön sein kann, / dann ist Ihr schwarzer Schwiegersohn / ein weißer Mann.
180 Fried 1970 R: Wenn Ihr der Tugend nicht Schönheit absprechen wollt, / Ist Euer Schwiegersohn nicht dunkel, sondern Gold!
180 Schwarz 1941 T: Wenn nie der Tugend lichte Schönheit fehlt, / ist Eure Tochter hell, nicht schwarz, vermählt.
185 Benda 1826 R: Wenn es der Tugend nicht an lichter Schönheit fehlt, / ist vielmehr blond als schwarz, den euer Kind gewählt.
190 Koch 1885 R: Wenn Tugend ist mit Schönheitsreiz vereint, / Eur Schwiegersohn nicht schwarz, nein, schön erscheint.
195 Wolff 1920 R: Leiht Tugend ihre Farbe dem Gesicht, / Ist Euer Eidam weiß, ein Schwarzer nicht.
195 Flatter 1962 R: Wenn edler Sinn für Schönheit gelten kann, / Ist Euer Schwiegersohn ein schöner Mann.
230 Wachsmann 2005 T: Kühnheit wirkt anziehnd, hell erstrahlt zum Lohn / Mehr schön als schwarz drum Euer Schwiegersohn.
240 Engel 1939 T: Spricht man von Tugend, als von einem Licht, / Scheint Euer Eidam mir so dunkel nicht.
245 Baudissin 1832 R: Wenn man die Tugend muß als schön erkennen, / Dürft Ihr nicht häßlich Euren Eidam nennen.
245 Zeynek 1945 T: wenn Mannesmut nicht Reiz und Glanz entbehrt, / so ist er, wenn auch schwarz, höchst schätzenswert.
255 Zaimoglu/Senkel 2003 T: Solange männliche Tugend mehr zählt als Schönheitsfehler, kann man sagen, Ihr Schwiegersohn ist eher edel als schwarz.
270 Gildemeister 1871 R: Eu’r Eidam, – wenn die Tugend lieblich macht, – / Gleicht mehr dem hellen Tag als schwarzer Nacht.
280 Günther 1995 R: Gäbs helle Haut für Edelmut als Preis, / Dann wär Ihr Schwiegersohn statt schwarz reinweiß.
290 Laube 1979 T: Wenn Tugend schön ist, hast du jetzt zum Lohn / Nen schwarzen, aber schönen Schwiegersohn.
290 Rothe 1956 R: Zählte bei Menschen nur der innre Schein, / würden wir dunkler als Othello sein.
305 Schröder 1962 R: Wo so viel Mut bei so viel Eifer wohnt, / Dünkt Euer Eidam minder schwarz denn blond.
335 Motschach 1992 T: Wär äußrer Schein stets innrer Werte Preis, / schien mancher Weiße schwarz, manch Schwarzer weiß.

Figure 4.

3.5. For those who do not read German, Figure 5 shows how Eddy sorts the least and most distinctive translations. Eddy analysis could in principle be applied to back-translations. That would produce different but still valid results, provided the back-translations were generated according to consistent machine and/or human rules.

35 German translations of “If virtue no delighted beauty lack, / Your son-in-law is far more fair than black”, backtranslated, arrayed in order of distinctiveness of the German versions
Eddy results (rounded: 80 – 335) measure distinctiveness
Each entry: Eddy result, Author, Date, Text-type (S = Study edition, R = Reading edition, T = Theatre script): back-translation

80 Engler 1977 S: If virtue not lack bright-lit beauty, then your son-in-law is much more bright than black.
80 Wieland 1766 S: If virtue is the most-bright-shining beauty, then your daughter’s husband is more white than black.
80 Gundolf 1909 R: If virtue not lack charm and beauty / your son-in-law is less black than bright-lit.
80 Bodenstedt 1867 R: If virtue does not lack charm and beauty, / your son-in-law is beautiful and lovable.
95 Eschenburg 1779 S: if virtue does not lack charm and beauty, then your son-in-law is rather white than black.
…
240 Engel 1939 T: If one speaks of virtue as of a light, / your son-in-law seems not so dark to me.
245 Baudissin 1832 R: If one must recognise virtue as beautiful, / you may not call your son-in-law ugly.
245 Zeynek 1945 T: If manly courage is not without charm and radiance/glory / then he is, even if black, highly estimable.
255 Zaimoglu/Senkel 2003 T: So long as male virtue counts more than blemishes [beauty-flaws], one can say your son-in-law is more noble than black.
270 Gildemeister 1871 R: Your son-in-law – if virtue makes [people] lovely – / resembles more the bright day than black night.
280 Günther 1995 R: If light skin were a prize for noble-mindedness, / then your son-in-law would be pure white instead of black.
290 Laube 1979 T: If virtue is beautiful, you now have as your reward / a black but beautiful son-in-law.
290 Rothe 1956 R: If people’s inward appearance alone [were all that] counted, / we would be darker than Othello.
305 Schröder 1962 R: Where so much courage resides with so much zeal, / your son-in-law appears less black than blond.
335 Motschach 1992 T: If outward appearance were always the prize for [or: price of] inner values / many a white man would appear black, many a black man white.

Figure 5.

3.6. Eddy results can be depicted as in Figure 1. In this mockup of a browser interface, different print colours in two translations of a sentence indicate how unusual they are among all the translations.

3.7 Eddy results can also be depicted as in Figure 6. This gives an overview of the corpus of translations on a historical timeline. Figure 6 shows relevant metadata (types of translations and translator names). It is a detail of Figure 7, which gives a larger (but less complete) historical overview, and omits metadata. In both figures, higher distinctiveness = higher placement.

Figure 6: Sample A – historical view of Eddy range and variability

Figure 7: Sample A – historical view of Eddy range and variability

3.8. Eddy results can also be visualized as a layer of annotation on a representation of a full text, or texts, as in Figure 8.

4. Introducing Viv and Sample B

4.1. Given many variant, comparable, aligned translations, and many Eddy results, we can aggregate these results using several variables: averages (mean, mode), overall range, the lowest and highest results, and standard deviations. Finding a practically applicable formula is the object of forthcoming work: it requires a process of experimentation with multiple samples. 4.2. The aim of defining Viv is to enable us (including readers who may not know the translating languages) to survey which components of a source text are associated with most and least difference in translations. 4.3. With Eddy outputs from multiple translations in several target languages, we can investigate to what extent the variability is generated by the source text, and to what extent by the translating cultures. 4.4. A provisional basic formula for Viv is:

$$\mathrm{Viv} = \frac{\overline{\mathrm{Eddy}}}{S_N}$$
SN is the number of words in the source text component; it is divided into the mean of the Eddy results ($\overline{\mathrm{Eddy}}$).

4.5. Number of words in the source text component is a variable which can be assumed to correlate positively, as a rule, with variation among translations. Each additional word is an opportunity for different translators to reach a different decision. 4.6. By this basic formula, Viv for Sample A in the 35 German translations is 13: mean Eddy is 182, divided by the 14 words in Shakespeare’s lines. We know that Sample A presents unusually difficult problems and multiple opportunities for translators, so 13 is probably a high Viv figure. As stated, Viv will certainly also need to be weighted for factors including the range of Eddy results (80–335) and the lowest result found (80). 4.6. Sample B was selected to contrast with Sample A. Sample B presents far fewer problems and opportunities for translators to vary (at least, this is the case for translation into German; it might be different in another language). We therefore expect to find lower Eddy results. Sample B is the first sentence of Othello’s ‘life-story’ speech to the Senate: Her father loved me; oft invited me; / Still question'd me the story of my life, / From year to year, the battles, sieges, fortunes, / That I have passed. (1.3.128-131) The corpus of variant translations differs slightly: some translators give a unique version of one sample, but not of the other. We have 32 differing translations of Sample B. 4.7. For the 32 translations of Sample B (27 words) the mean Eddy is 144, so Viv is 5.3 on the basic formula. The range is from 16 to 310. The lowest figure is far lower than the 80 for Sample A. This will no doubt be significant for weighting Viv.
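The arithmetic behind the two basic Viv figures can be reproduced with a short sketch. The Sample A list below holds the 35 rounded Eddy results from Figure 4. For Sample B only the reported mean (144) and word count (27) are used, since the individual results are not listed. The function name `basic_viv` and the `summary` dictionary are assumptions introduced for illustration. The extra summary statistics are printed only as candidate inputs for the weighting that is still to be defined.

```python
from statistics import pstdev


def basic_viv(mean_eddy, source_words):
    """Provisional basic Viv: mean Eddy result divided by source word count."""
    return mean_eddy / source_words


# Rounded Eddy results for Sample A, copied from Figure 4 (35 translations)
sample_a = [
    80, 80, 80, 80, 95, 110, 125, 130, 135, 135, 140, 140,
    145, 150, 155, 160, 160, 160, 180, 180, 185, 190, 195, 195,
    230, 240, 245, 245, 255, 270, 280, 290, 290, 305, 335,
]

mean_a = sum(sample_a) / len(sample_a)
summary = {
    "n": len(sample_a),
    "mean": mean_a,
    "lowest": min(sample_a),
    "highest": max(sample_a),
    "range": max(sample_a) - min(sample_a),
    "stdev": pstdev(sample_a),  # population standard deviation
}
print(summary)

viv_a = basic_viv(mean_a, 14)  # 14 words in the Sample A source lines
viv_b = basic_viv(144, 27)     # reported mean Eddy and word count, Sample B
print(mean_a, viv_a)           # 182.0 13.0
print(round(viv_b, 1))         # 5.3
```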


4.8. Generating Eddy results is time-consuming without a dedicated software application. Until more work is done we cannot really visualize Viv, only ‘mock up’ how a Viv layer might appear, as in Figure 1 or Stephan Thiel’s more elegant Figure 9.

Figure 9


5. Considerations

5.1. Figures 10 and 11 chart the Eddys for Samples A and B in the 32 translations: chronologically in Figure 10, and taking Sample B as the norm in Figure 11. There is little correlation between the two sets of results. But where there is correlation, it is significant for interpretation. Baudissin (1832) (i.e. the Schlegel-Tieck edition, which is still ‘canonical’ today), Rothe (1956), and Motschach (1992) all appear – within their periods – as spikes of interest on both lines. Buhss (2006) is a spike of interest for Sample B, not for Sample A. The study editions of Wieland (1766) and Engler (1977) stand out in Figure 11. These are all, in fact, translations of special critical interest, for various reasons. But the overall lack of correlation is most important: the relative distinctiveness of any translation tends to vary along the stream of its text. No text sample can represent the text. We will have to read all of all the texts – or at least ‘read’ algorithmically, with Eddy’s help.

Figure 10: Samples A and B; Eddy results; chronological sequence

Figure 11: Samples A and B; Eddy results; rising for Sample B


Many methodological as well as technical issues face builders of a re-translation array as a functioning interactive device, online: not a ‘resource’, but a site of collaborative activity, from research to creative play. What we envisage is an interface enabling casual as well as expert users to view and explore text data and metadata, both directly (close reading) and by viewing and creating visualized analytic outputs. Issues include source and target text normalisation, tagging, alignment, variation identity, interactivity, back-translation, and cross-cultural extension.

Source text normalisation means establishing a ‘translatum’ – a pragmatic artefact. Othello’s text is relatively stable by Shakespeare standards, but most translators have translated different representations of it (their choices are frequently influenced by editors’ glosses). Our ‘base text’ is a collation of an open source text against a recent edition (Neill).

Target text normalisation for purposes of computational analysis presents language- and corpus-specific problems. The tf–idf analysis was carried out on un-normalised texts.7 Divergent spellings, contractions, and inflections were left to stand. Laube’s (1978) couplet translation scores very highly (tf–idf 288.8) because the three words “schönen aber schwarzen” all count as unique, although “schön” and “schwarz” are very common, without the inflections. My preferred solution is to run the analysis on both stemmed (inflection-stripped) and unstemmed corpora: many tf–idf sums would become ranges, rather than single values. The slight blurring of results would also offset the appearance of humanistically inappropriate scientific rigour.

Our corpus is untagged: not marked-up. Tagging facilitates other searches and analyses. Lemmatisation (mapping words to dictionary head-words) and Part of Speech analysis (POS: mapping words to linguistic categories) are options. 
For the purposes described here, where morphological differences are very important, lemmatisation offers little. But POS analysis would address the problem that when Schwarz (1941; tf–idf 181) uses the uninflected words “schön” and “schwarz” adverbially, her unique syntax cannot be read and ‘rewarded’ by the algorithm. Tagging demands major time investment, as computational approaches are unreliable. It is not clear whether the investment would be justified. But the work could be crowd-sourced. Alignment of full texts is supported by translation technology software, but demands careful manual checking. Problems include omission or ‘null’ values (do we ‘score’ a translation or adaptation for omitting a source element?), transposition (what if an adaptation assigned sample A to Iago?), and addition. Zaimoglu and Senkel’s translation of sample B is largely free adaptation, adding a great deal to the source. I aligned just the first eight of their words to the source text sample: only these eight words in their adaptation are unarguably derived from the sentence. The result was the lowest tf–idf score of any translation. But their version of the speech derives at many moments directly from the source, so we could align at those moments and include intervening words. That would include 81 words in their sample B, giving a tf–idf score of over 1,000 and skewing the sums. Variation identity is a sub-problem of alignment: repetition between translations, and its limits. The corpus of significantly different translations differs between samples, as with the two just discussed, and ‘significantly different’ must be defined in a computationally readable way. There are countless
7 Digital transcriptions were corrected by the hands and eyes of Alison Ehrmann and myself against facsimiles, after scanning and OCR using ABBYY’s Recognition Server, which uniquely handles Fraktur fonts – many thanks to Colin Miller at ABBYY UK for the free trial!


editions of Baudissin’s text (i.e. the canonical Schlegel-Tieck edition), with variants: it has been curated almost as often as Shakespeare’s text itself. And many other translators re-use his work, on varying scales. Sometimes the spelling of words varies: his “Euren” in the Duke’s couplet can appear as “Euern” or “Euer’n”. It is easy to define these differences as insignificant for analytic purposes, in an artisanal analysis – not so easy for a computer.

7. Building translation arrays: problems as opportunities at interactive digital sites

Problems of normalisation, tagging, and alignment are opportunities for a translation array, envisaged as an interactive digital site. It will use ‘standoff’ markup – i.e. annotations and other metadata are held in a database separately from the ‘raw’ text documents. This is an approach developed in recent digital tools for collation and editioning, and supporting hermeneutic analysis (CATMA, Green, Jones, Meister, Piez, Schmidt). It means builders and users of a translation array can adopt and adapt one another’s work to specific purposes, and incrementally enrich the resource. The builders will provide initial editorial annotations and metadata including gross or skeletal alignments at scene and speech levels, and at sentence level in selected speeches. Users will be able to extend and enrich the metadata.

Back-translation is a sine qua non of a translation array, conceived as a contribution to global public conversations: another problem which is an opportunity for interactivity. Users who know no German should find a VIF reading interesting; but having seen that source elements provoke varying variation, they should want to see the variation. Machine translation is one part of the solution; another is the crowd of users. User-generated content includes back-translations among other annotations, data and metadata. 
A translation array should be as global as the source text’s travels – and infinitely extensible in both range of data and richness of metadata. The current limitation of our corpus to German sources is pragmatic. At our website www.delightedbeauty.org, the Duke’s couplet is represented in a growing variety of languages, with hundreds of found and ad hoc translations, metadata, back-translations, commentaries, mini-essays, and discussions: all contributed voluntarily, indicating the scale of worldwide interest in such a collaborative project. Translations are repositories of human knowledge – knowledge of language, which is a proxy for knowledge of the world, put there by translators, both deliberately and unwittingly (Resnik). Translation arrays can help us explore the knowledge held in translations of cultural heritage texts, enabling us to read a lot, better – and cross-culturally. Making textual translation visible in new ways will hopefully assist cultural translation to promote cross-cultural understanding. Digital methods and models have as yet scarcely tapped their potential to “foster multilingualism and multiculturalism” (ADHO). Distant, algorithmic, and close readings combine in translation arrays to raise the profile of translation as creative activity, and encourage exploration in translations by many kinds of users: from school classes and translation trainees to scriptwriters and cultural tourists, as well as humanities researchers.


APPENDIX

KEY: Engel 1939 – Baudissin 1832 (Project Gutenberg text) – Common Ihr Vater liebte mich, lud oft mich ein, Bat mich, Erforschte meines Lebens Lauf von meinem Leben zu erzählen, Von Jahr Zu Jahr: die Schlachten, Stürmen, Glück Stürme, Schicksalswechsel, So ich bestand. Ich ging es durch, vom Knabenalter her Bis auf den Augenblick, wo er gefragt. So sprach ich denn von manchem harten Fall, Von schreckender Gefahr zu See und Mißgeschick, Die mir begegnet.

Figure 4: Engel (1939) editing Baudissin (1832)

Figure 5 compares the German and French repertoires of choices for “fair” and “black” in the Duke’s couplet. Computation cannot – at present – reliably identify which words (if any) in an array of translations translate specific words in the source: only close reading can.8 Such close reading amounts to micro-alignment, and that could be performed more easily, transparently, and replicably, in an interactive graphical interface than the way I did it, with over 50 texts on paper and/or on screen in various formats. This analysis addresses socio-cultural distinctions among translators. In both languages, they are separated into ‘authors’ (i.e. writers with an established reputation, a ‘name’ in institutionalised cultural memory, or currently celebrated) and ‘others’ (those without a ‘name’: the obscure).9 This division will never be watertight, but can be put on a systematic basis using electronic metadata (online search results), and it proves analytically useful.

BLACK German ‘authors’ (11) 5 4 2 1 German ‘others’ (24) 2 19 1 French ‘authors’ (11) 1 10 French ‘others’ (11) 1 10
BLACK rows: null; black: schwarz / noir; dark: dunkel; ugly: häßlich
FAIR rows: null; beautiful: schön / beau; white: weiß / blanc; bright: hell / brillant; likeable/lovable: liebenswert / agréable, plaisant; blond; other
2 1 4 1 1
1 13 4 2 1
2 4
3 2 3 2
1 2
4
2 1
2
1

Figure 5: 35 German and 22 French ‘author-translators’ and ‘other translators’: choices for “black” and “fair” in “If virtue…”

8 Othello claims that Desdemona “wished / That heaven had made her such a man” (1.3.163) – a famous crux. In most German translations she wished to acquire a man. Only the Desdemonas of Wieland (1766), Benda (1826), Schwarz (1941 – the only woman translator in this corpus), and Zaimoglu/Senkel (2003) wish that she had been born a man. POS analysis (see below) might pinpoint such differences.

9 German ‘authors’ are Baudissin (mostly ‘known’ as ‘Schlegel-Tieck’), Fried, Günther, Gundolf (who collaborated on his translations with Stefan George), Rothe, Schiller, Voss (3 drafts), Wieland, and Zaimoglu/Senkel. French ‘authors’ are Aicard, Bonnefoy, Déprats (currently celebrated), Hugo (2), LeTourneur (2), Robin, de Vigny (2). For all names see www.delightedbeauty.org. My thanks to Matthias Zach for providing most of the French repertoire.


REFERENCES

ADHO (Alliance of Digital Humanities Organizations) 2011 “Digital Humanities 2012: Call for Papers” <http://www.digitalhumanities.org/dh2012cfp>.
Altintas, K et al., 2007: ‘Language Change Quantification Using Time-separated Parallel Translations’, Literary and Linguistic Computing, 22/4, 375-393.
Blinn, H. and Schmidt, W.G., 2003: Shakespeare - deutsch: Bibliographie der Übersetzungen und Bearbeitungen, Erich Schmidt.
Castagnoli, S. 2009: “A New Approach to the Analysis of Explicitation in Translation: Multiple (Learner) Translation Corpora”, International Journal of Translation 21(1), 89-105.
------- 2010 “Variation and regularities in translation: Insights from multiple translation corpora”. Presented at UCCTS 2010 - Using Corpora in Contrastive Linguistics and Translation Studies, Ormskirk (UK), 27-29 July 2010.
CATMA (University of Hamburg) (Jan Christoph Meister), <www.catma.de>.
Chaudhuri, S. and Chee Seng Lim, eds, 2006: Shakespeare without English: The Reception of Shakespeare in Non-Anglophone Countries (Dorling Kindersley India).
Cheesman, T., 2010: “Shakespeare and Othello in Filthy Hell: Zaimoglu and Senkel’s Politico-Religious Tradaptation” Forum for Modern Language Studies 46/2: 207-20.
------- 2012a “Thirty Times More Fair Than Black” Angermion 2012 (forthcoming).
------- 2012b “Mutations of a Difficult Couplet” Cambridge World Shakespeare Encyclopaedia / Online, eds B. Smith and K. Rowe, vol. 2 (forthcoming).
Correll, M., M. Witmore, M. Gleicher, 2011: “Exploring Collections of Tagged Text for Literary Scholarship”, Computer Graphics Forum 30/3: 731-740.
Crane, G et al 2009 “Classics in the Million Book Library” Digital Humanities Quarterly 3/1.
Delabastita, D., 1993: There's a Double Tongue. An investigation into the translation of Shakespeare's wordplay, with special reference to Hamlet. Rodopi.
------- 2009 “Shakespeare” Routledge Encyclopedia of Translation Studies, 2nd edn.
Dessen, A 2002: Rescripting Shakespeare: the text, the director, and modern productions, CUP.
Fry, B 2008 Visualizing Data, O’Reilly.
------- 2009 “On the Origin of Species” <http://benfry.com/traces>.
Galey, A. and R. Siemens, eds 2008 “Reinventing Shakespeare in the Digital Humanities”, Shakespeare 4/3.
Geng, Z. and the VVV Team (T. Cheesman, D. M. Berry, A. Ehrmann, Z. Geng, R. S. Laramee, A. J. Rothwell), 2011a: “Visualizing Translation Variation of Othello: A Survey of Text Visualization and Analysis Tools” (paper in progress), at http://cs.swan.ac.uk/~cszg/text/textSurvey.pdf.
-------, 2011b, ‘Visualizing Translation Variation: Shakespeare’s Othello’, G. Bebis et al. (eds.): Advances in Visual Computing: 7th International Symposium, ISVC 2011, Part I, LNCS 6938, pp. 657–667.
Green, Z. 2008: Restriction and Adaptation in the Comparison of Digitally Represented Manuscript Traditions, MA Diss, Birmingham (UK).
Gürcaglar, S.T 2009 “Retranslation” Encyclopedia of Translation Studies, 2nd edn.
Hanna, S.F 2005 “Othello in Egypt: Translation and the (Un)making of National Identity”, Translation and the Construction of Identity, ed M R M Ruano, St.Jerome: 109-128.
Hope, Jonathan, and Michael Witmore, 2004: “The Very Large Textual Object: A Prosthetic Reading of Shakespeare”, Early Modern Literary Studies 9.3 / Special Issue 12 (January, 2004): 6.1-36 <http://purl.oclc.org/emls/09-3/hopewhit.htm>.
------- 2010 “‘The Hundredth Psalm to the Tune of “Green Sleeves”’: Digital Approaches to Shakespeare’s Language of Genre”, Shakespeare Quarterly 61/3: 357-90.
Johansson, Stig, 2011: ‘Between Scylla and Charybdis: On individual variation in translation’, Languages in Contrast 11:1, pp. 3–19.
Jones, S et al 2010 “E-Carrel: An Environment for Collaborative Textual Scholarship”, J Chicago Colloq DH and Comp Sci 1/2.
Kenny, D 2009 “Corpora”, Routledge Encyclopedia of Translation Studies, 2nd edn.
Kruger, A et al eds 2011, Corpus-Based Translation Studies: Research and Applications, Continuum.
Kujamaki, P, 2002: ‘Finnish Comet in German Skies: Translation, retranslation and norms’, Target 13/1: 45-70.
Louwerse, Henriëtte, 2007: Homeless Entertainment: On Hafid Bouazza's Literary Writing. Peter Lang.
Mathijssen, J.W 2007 “The Breach and the Observance: Theatre retranslation as a strategy of artistic differentiation” PhD Utrecht.
Meister, Jan-Christoph, 2012: “The Eternal Sunshine of the Dissenting Voice: Acknowledging Contingency in Digital Humanities”, in: Collaborative Research in the Digital Humanities, eds M Deegan and W McCarty. Ashgate (forthcoming).
Michel, J-B, et al. 2011: ‘Quantitative Analysis of Culture Using Millions of Digitized Books’. Science, Vol. 331, no. 6014, pp. 176-182.
Monroy, C. et al 2002 “Visualization of Variants in Textual Collations to Analyze the Evolution of Literary Works in the Cervantes Project” Lecture Notes in Computer Science 2458: 199-211.
Moretti, F 2005 Graphs, Maps, Trees, Verso.
Neill, M. (ed.) 2006. The Oxford Shakespeare: Othello, OUP.
O'Driscoll, K 2011 Retranslation through the Centuries: Jules Verne in English, Lang.
Piez, W. 2010, ‘Towards Hermeneutic Markup: An architectural outline’ at ‘Digital Humanities 2010’: http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/pdf/ab-743.pdf.
Pujante, A.L. and Ton Hoenselaars, eds, 2003: Four Hundred Years of Shakespeare in Europe, Newark: U Delaware P / London: Associated U Presses.
Ramsay, S 2007 “Algorithmic Criticism” Blackwell Companion to Digital Literary Studies, eds S. Schreibman and R. Siemens, Blackwell, 477–91.
Resnik, P. 2001: Review of Véronis, Computational Linguistics 27/4: 592-5.
Rumbold, Kate, 2010: “From ‘Access’ to ‘Creativity’: Shakespeare Institutions, New Media, and the Language of Cultural Value”, Shakespeare Quarterly 61/3: 313-336.
Rutter, C C, 2007: “Watching Ourselves Watching Shakespeare – Or – How Am I Supposed to Look?”, Shakespeare Bulletin 25:4: 47–68.
Saldanha, Gabriela, 2009: “Principles of Corpus Linguistics and their Application to Translation Studies Research” Revista Tradumatica 7.
Schmidt, D 2010 “The Inadequacy of Embedded Markup for Cultural Heritage Texts”, Literary and Linguistic Computing 25/3: 337-356.
------- and R. Colomb, 2009: “A Data Structure for Representing Multi-version Texts Online”, International Journal of Human-Computer Studies 67/6: 497-514.
Spiro, L., 2008ff “Digital Research Tools” <digitalresearchtools.pbworks.com>.
Stein, P., 2005 “Die Übersetzungen von Titus Livius' Ab Urbe condita…”, Romance Corpus Linguistics, eds C.D. Pusch et al., Narr: 57-70.
Stone, M 2009 “Information Visualization: Challenge for the Humanities” Working Together or Apart: Promoting the Next Generation of Digital Scholarship, CLIR/NEH, 43-56.
Susam-Sarajeva, Ş, 2003: “Multiple-entry visa to travelling theory: Retranslations of literary and cultural theories”, Target 15/1, pp. 1-36.
Thiel, S 2009 “Understanding Shakespeare” <www.understanding-shakespeare.com>.
Trettien, W.A 2010 “Disciplining Digital Humanities, 2010” Shakespeare Quarterly 61/3: 391-400.
Venuti, L. 2004: “Retranslations: The creation of value”, Translation & Culture, ed. Katherine M. Faull, Lewisburg: Bucknell UP, pp. 25-38.
Véronis, J., ed. 2000: Parallel Text Processing: Alignment and Use of Translation Corpora, Kluwer.
Viegas, F. B. et al., 2007: “ManyEyes: a Site for Visualization at Internet Scale”, IEEE Transactions on Visualization and Computer Graphics, pp. 1121-1128, November/December.
