
A Provo-centric History of Machine Translation (MT)

AKM notes 2010 Nov 30

See: "Machine Translation: A Brief History" by John Hutchins for a more general history. [From: Concise history of the language sciences: from the Sumerians to the cognitivists. Edited by E.F.K. Koerner and R.E. Asher. Oxford: Pergamon Press, 1995. Pages 431-445]

1948: First stored-program electronic digital computer (http://www.computer50.org/). Computers would not have gone very far without the transistor and then the integrated circuit. Harvey Fletcher, a Provo boy, was in charge of the research team at Bell Labs that invented the transistor.
Early 1950s to mid 1960s: First phase of optimism by machine translation developers.
1954: Georgetown + IBM demonstration.
1966: ALPAC report resulted in a long dry spell in US government funding for MT research.
1968: Peter Toma (of the Georgetown MT project) founded Systran (the oldest commercial MT system).
About the same time, AKM heard about machine translation research and the IBM "photoscopic" system (precursor to today's CD-ROM and DVD-ROM) for storing and retrieving dictionary entries.
About the same time, Daryl Gibb was doing an MA thesis on machine translation under Eldon Lytle, BYU professor of Linguistics, based on Junction Grammar.
1970: Daryl Gibb wrote to Eldon Lytle suggesting he start a machine translation project; Lytle shares the letter with AKM and both go after funding.
Project begins in September 1970; becomes TSI (Translation Sciences Institute).
Bruce Weidner develops a competing MT system in Provo.
At some point around 1980, Peter Toma tells AKM that if you give him all the rules, he can program perfect MT.
Need to resolve ambiguities is addressed by human interaction during analysis (either we didn't know the rules or language is not rule-governed).
Project begins working on Liahona materials.
Assumption: one interactive English analysis; multiple automatic generations thanks to mapping from words to universal sememes.
1978: AKM realizes that mortal sememes are not independent of language and culture.
1980: TSI is shut down; Eldon Lytle returns to Nevada; most of the team leaves BYU to form ALPS.
1980: ALPS begins work on first translator tool with translation memory; Martin Kay writes essay on proper place of humans and computers in translation.
1980s: Work on what is now called TEnTs in parallel with rule-based MT (Eurotra).
1984: Second period of optimism for MT. Serge Pershke tells AKM to stop wasting time on tools for human translators, since they will be replaced by MT within five years. Eurotra project is funded by European Union.
1990s: AKM works on philosophy of language. Two dimensions: general-domain; dynamic-frozen. IBM works on statistical machine translation (SMT).
2000-2010: AKM works on translation standards (TMX, TBX, SRX, ASTM F-2575). He revisits philosophy of language in the study of context and realizes that the basic error of rule-based MT was to assume that translations of multi-word lexical items can be predicted by dynamic combinations of translations of individual words using rules.
Context article: http://www.trans-int.org/index.php/transint/article/view/87
SMT gradually becomes better. Third period of optimism for MT.
2006 AMTA: opening plenary is Marcu-Melby debate on the future of data-driven MT. See debate slides at: http://www.ttt.org/amta/
2010: Eldon Lytle passes away in Pioche, Nevada. Reminder that syntax is relevant.
2010 AMTA: Melby reminds Marcu of agreement to debate again in 2012.

Three scenarios where MT is useful today:
- Narrow domain, controlled language (e.g. Canadian weather bulletins)
  o No post-editing needed
- Selecting which texts to translate (e.g. government intelligence gathering)
  o Selection of relevant texts is accurate without post-editing
- Alternative to zero translation (e.g. tech support postings or instant messaging)
  o No time or money for post-editing

Issues
- Who does post-editing when typical human quality translation is needed?
- When is post-edited MT more work than human translation from scratch?
- Will MT soon be good enough that it is indistinguishable from human translation without post-editing?
