viaLegal Whitepaper: Machine Translation for Legal

With business continuing to expand globally and free translation solutions such as Google Translate and Yahoo! Babelfish emerging, interest in machine translation (MT) is growing. Here we will examine some of the pros and cons of machine translation and situations for use.
Published by: viaLanguage on May 17, 2011
Copyright: Attribution Non-commercial


Machine Translation or Legal: Applications, Considerations and Hurdles
With business continuing to expand globally and ree translation solutions such as Google Translateand Yahoo! Babelsh emerging, interest in machine translation (MT) is growing. When MT isimplemented properly it has the potential to increase human translation productivity in applicationssuch as data mining large documents or litigation and e-discovery, reducing costs and increasingproductivity. However, i approached without proper care the results can be rustrating and useless,and is thereore not surprising that many corporations and law rms alike are still puzzled as towhen to use machine translation, i it’s aordable and what level o quality to expect. Here we willexamine some o the pros and cons o machine translation and situations or use.
What is Machine Translation and how does it work?
Machine Translation (MT) is the process by which computer software is used to translate text from one natural language to another. In order for any translation, human or machine, to be successful, the meaning of the text in the original source language must be fully restored in the target language.
Although this sounds straightforward, it is actually much more complex, as translation is not simply word-for-word substitution. The machine must interpret and analyze all the features of a text, including grammar, semantics, syntax and culture, in order to effectively convey the meaning and intention of the text as a whole.
Machine Translation Methods
Modern machine translation solutions such as those from Microsoft, IBM, Google and Yahoo use three methods to achieve effective comprehension in their output. These three methods are described below:
Rule Based
This was the rst eective strategy developed or MT. Rule based MT relies on a collection orules or grammar, lexicons and subject specic terminology that are applied automatically bythe application in real time. Rule based MT can deliver high quality output in relative terms butcan be prohibitive because o high upront investment costs that make it most suitable or largescale high quality content needs.
Statistical
Statistical MT generates translations using statistical methods that match commonly used phrases and translations together, and therefore relies heavily on existing multilingual content. The source content being used to derive the 'statistical matches' has a massive effect on the output quality of statistical MT. Where enough subject relevant data can be used, the output is often of good quality and can even flow more naturally, in line with human translations. Statistical solutions can also be deployed far quicker and with lower investment costs. However, if the content used for training is not subject specific, translations can be essentially useless in some cases. Google Translate and Bing are both examples of statistical MT. Because of their relatively easy set up, statistical MT solutions can be cost effectively applied to situations such as data mining, or any large body of text where similar pre-existing content can be found and output value is primarily for understanding and speed of information retrieval.
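The dependence on training content can be illustrated with a small sketch, assuming a "phrase table" that simply counts how often each source phrase was paired with each translation in existing bilingual text. The two tiny corpora and the function names below are hypothetical; real statistical engines use far larger data sets and more sophisticated probability models.

```python
from collections import Counter, defaultdict


def train_phrase_table(parallel_pairs):
    """Count how often each source phrase was translated as each target phrase."""
    table = defaultdict(Counter)
    for source, target in parallel_pairs:
        table[source][target] += 1
    return table


def translate_phrase(table, phrase):
    """Return the most frequently observed translation, or None if never seen."""
    if phrase not in table:
        return None
    return table[phrase].most_common(1)[0][0]


# Hypothetical training data: general web text mostly uses "sentence" in the
# grammatical sense, while legal text mostly uses it in the sentencing sense.
general_corpus = [
    ("sentence", "frase"),
    ("sentence", "frase"),
    ("sentence", "condena"),
]
legal_corpus = [
    ("sentence", "condena"),
    ("sentence", "condena"),
    ("sentence", "frase"),
]

general_table = train_phrase_table(general_corpus)
legal_table = train_phrase_table(legal_corpus)

print(translate_phrase(general_table, "sentence"))  # frase
print(translate_phrase(legal_table, "sentence"))    # condena
print(translate_phrase(legal_table, "tort"))        # None
```

The same engine gives a different answer for the same word depending only on the content it was trained on, which is why an engine trained on general content can produce unusable output for legal material.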
Hybrid Models
The most recent developments in MT engines try to use the best of both worlds, combining rules based and statistical approaches. The ROI of these tools sits, as you would expect, between the two other models. Hybrid solutions are most commonly being applied where the end result is expected to be near human quality and the volume of work justifies the initial investment in training and content identification. Currently, large volume user guides are an example of a demand making good use of hybrid MT engines.
Let's look at an example of how translation can vary across various MT engines:
MT engine           English               Spanish
Google Translate    What is your name?    ¿Cuál es tu nombre?
Yahoo! Babelfish    What is your name?    ¿Cuál es su nombre?
Bing Translator     What is your name?    ¿Cómo te llamas?

You can see that something as simple as 'What is your name?' gets varying results depending on which version of MT is used, especially when no training or customization of the content has taken place. Accurate translation generally requires contextual understanding, and MT is often similar to translation done by a human without a deep understanding of the target audience. MT solutions without customization lack the knowledge to deal with word choice and subject matter sensitivity. Content such as jokes, slang, idioms and wordplay is lost on MT entirely. Due to these challenges, the assessment of MT output is typically based on comprehensibility: is the content understandable based on its source language intent and meaning? Without more extensive customization and training, MT engines (especially the most common web engines) can provide extremely variable output that often becomes useless at a commercial level.
In selecting an engine it is most important to consider what the use and value of the output is. MT of any type alone is not a match for human translation. However, used in conjunction with human review, or on content that would otherwise not be cost effective for full human translation, MT is emerging as a critical tool in a global content strategy and should be considered by anyone with mid to large scale translation needs.
When MT Fits: Quality expectations and content type
Machine translation needs to be approached strategically through a consideration of the ultimate goal of your end translation. The type, volume and quality of the source content also have roles in this decision. MT often seems immediately appealing and certainly can add a lot of value, but keep in mind MT is not always an appropriate solution. Outlined here are a few examples of when MT may or may not be a good choice:
MT in a legal context is ideal for processing non-critical content including e-discovery, patent research, intranet pages and discussion forums where a cursory understanding and impression of the content is sufficient. MT can also be used for data culling to identify relevant pieces of information in need of more technical levels of translation and attorney review. MT is not a likely candidate for sensitive content where errors in translation could result in compliance, regulatory or legal issues. In such instances, human translation is likely more appropriate both from a skills and liability standpoint.
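The data culling workflow described above might be sketched as follows, assuming the documents have already been run through an MT engine and that relevance is judged by a simple list of review terms. The term list, the document identifiers and the threshold are hypothetical choices for illustration; an actual e-discovery review would define its own criteria.

```python
import re

# Hypothetical terms that signal a document may need human translation
# and attorney review. A real matter would supply its own list.
REVIEW_TERMS = {"breach", "indemnify", "liability", "termination"}


def count_hits(text, terms):
    """Count how many words in the machine-translated text are review terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for word in words if word in terms)


def cull(mt_outputs, terms=REVIEW_TERMS, min_hits=1):
    """Split documents into those to escalate and those to set aside."""
    escalate, set_aside = [], []
    for doc_id, text in mt_outputs.items():
        if count_hits(text, terms) >= min_hits:
            escalate.append(doc_id)
        else:
            set_aside.append(doc_id)
    return escalate, set_aside


# Machine-translated text keyed by a made-up document identifier.
mt_outputs = {
    "email-001": "The supplier accepts liability for any breach of the agreement.",
    "email-002": "Lunch is moved to noon on Friday.",
}

escalate, set_aside = cull(mt_outputs)
print(escalate)   # ['email-001']
print(set_aside)  # ['email-002']
```

Here the MT output only has to be good enough to surface the right documents. The escalated items would still go to a human translator before anyone relies on their content.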
