
Machine Translation

Michael Melese (PhD)


michael.melese@aau.edu.et
Outline
n What is MT ?
n Approaches of MT
n MT tools
n Evaluation of MT
n Challenges of MT
n MT resource
What is MT ?
n Translation is the task of translating a sentence x from one
language (the source language) to a sentence y in another
language (the target language).
n Machine translation is the use of computers to automate some or all
of the process of translating from one language to another.
¨ Currently, no fully automatic MT system has been developed
that can translate any type of input correctly.
¨ An MT system does not have to be perfect to be useful.
Approaches of MT
n The commonly used approaches to MT:
¨ Rule-based translation
¨ Statistical translation
¨ Example-based translation
¨ Hybrid approaches
¨ Neural translation
Rule based MT (1)
n Rule-based machine translation relies on a large set of built-in
linguistic rules and bilingual dictionaries with millions of entries
for each language pair.
¨ Translations are built on gigantic dictionaries and sophisticated
linguistic rules.
Rule based MT (2)
n There are three different types of rule-based machine
translation systems:
¨ Direct Systems (Dictionary Based Machine Translation)
n Map input to output with basic rules.
¨ Transfer RBMT Systems (Transfer Based Machine Translation)
n Employ morphological and syntactical analysis.
¨ Interlingua RBMT Systems (Interlingua)
n Use an abstract, language-independent meaning representation.
Direct RBMT
n Based on a large bilingual dictionary
¨ Simple reordering rules can be applied, e.g. moving adjectives after nouns
n The process of direct translation involves:
¨ Shallow morphological analysis
¨ Lexical transfer, based on bilingual dictionary
¨ Local reordering
¨ Morphological generation
n Example: "Abraham broke the glass" → "አብርሃም መስታወቱን ሰበረው"
¨ Morphological analysis: Abraham (3rd per. Mas. Sing); broke (PAST of break); the (functional morpheme); glass (Object)
¨ Lexical transfer: አብርሃም (3rd per. Mas. Sing); PAST (ስብር); ኡ/ው; መስታወት
¨ Reordering: አብርሃም (3rd per. Mas. Sing); መስታወት (Object) ኡ/ው; PAST (ስብር)
¨ Morphological generation: አብርሃም; መስታወት [ኡ] [ን]; ሰበር [ኧ] [ው]
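The direct translation steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a real system: the English-to-Spanish lexicon, the part-of-speech table, and all function names are assumptions made for the example, and morphological generation (e.g. gender agreement) is reduced to joining the words.

```python
# Toy resources (illustrative assumptions, not from a real dictionary).
LEXICON = {"the": "el", "red": "rojo", "car": "coche"}
POS = {"the": "DET", "red": "ADJ", "car": "NOUN"}

def analyze(sentence):
    """Shallow analysis: lowercase, split, and tag each word."""
    return [(w, POS.get(w, "UNK")) for w in sentence.lower().split()]

def transfer(tagged):
    """Lexical transfer: look each word up in the bilingual dictionary."""
    return [(LEXICON.get(w, w), pos) for w, pos in tagged]

def reorder(tagged):
    """Local reordering: move each adjective after the noun it precedes."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return out

def generate(tagged):
    """Generation: here simply join the target words."""
    return " ".join(w for w, _ in tagged)

def translate(sentence):
    return generate(reorder(transfer(analyze(sentence))))

print(translate("The red car"))
```

Unknown words pass through unchanged, which is one reason direct systems depend so heavily on dictionary coverage.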
Transfer RBMT
n Passes through three steps:
¨ Analysis of the source language text to determine its grammatical structure,
¨ Transfer of the resulting structure to a structure suitable for generating text in the target language, and
¨ Generation of the target text.
n Transfer-based MT systems are thus capable of using knowledge of both the source and target languages.
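The three steps can be sketched as follows. This is a toy illustration under strong assumptions: the "parser" only handles a bare subject-verb-object sentence, the transfer rule only reorders constituents from SVO to SOV, and lexical transfer and morphology are left out, so the output consists of English glosses in target order. All names are hypothetical.

```python
DETERMINERS = {"the", "a", "an"}

def analyze(sentence):
    """Analysis: naive parse of a subject-verb-object sentence into roles."""
    words = [w for w in sentence.split() if w.lower() not in DETERMINERS]
    if len(words) != 3:
        raise ValueError("this sketch only handles subject-verb-object sentences")
    subject, verb, obj = words
    return {"subject": subject, "verb": verb, "object": obj}

def transfer(structure, target_order=("subject", "object", "verb")):
    """Transfer: map the source structure to the target constituent order."""
    return [(role, structure[role]) for role in target_order]

def generate(target_structure):
    """Generation: linearize the target structure (glosses only)."""
    return " ".join(word for _, word in target_structure)

def translate(sentence):
    return generate(transfer(analyze(sentence)))

print(translate("Abraham broke the glass"))
```

Unlike the direct approach, the reordering here operates on grammatical roles rather than on adjacent words, which is what lets a transfer system express rules such as "verb goes last".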
Interlingual translation
n The process of interlingua translation involves analysis of the
source text into a language-independent meaning representation (the
interlingua) and generation of the target text from that representation.
Statistical MT
n Finds the most probable target sentence given a source sentence,
based on the noisy channel model.
¨ The task is to discover the hidden (target language) sentence that
generated the observation (source language) sentence, using a
statistical (probabilistic) model.
n The noisy channel model of statistical MT requires three
components: a language model (LM), a translation model (TM),
and a decoder, which produces the most probable sentence.
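Using x for the source sentence and y for the target sentence, as in the definition of translation earlier, the noisy channel formulation is usually written with Bayes' rule as follows. The notation is the standard one and is an assumption here, since the slides do not give the formula.

```latex
\hat{y} = \arg\max_{y} P(y \mid x)
        = \arg\max_{y} \frac{P(x \mid y)\, P(y)}{P(x)}
        = \arg\max_{y} \underbrace{P(x \mid y)}_{\text{translation model}} \;
                       \underbrace{P(y)}_{\text{language model}}
```

P(x) can be dropped because it does not depend on y; the decoder is the component that carries out the arg max search.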
SMT

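[Figure: SMT architecture. Parallel source and target text is used to train the translation model; target language data is used to train the language model; the decoder takes the source text and uses both models to produce the translated target text.]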
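How the decoder combines the two models can be sketched as below. The probabilities are invented for one fixed source sentence, and the candidate list is given in advance; a real decoder such as Moses searches a very large space of candidates instead of enumerating them.

```python
import math

# Hypothetical scores for three candidate translations y of one source sentence x.
TM = {"the house": 0.5, "house the": 0.5, "the home": 0.3}      # P(x | y)
LM = {"the house": 0.02, "house the": 0.0001, "the home": 0.01}  # P(y)

def score(candidate):
    """Log of P(x | y) * P(y); logs avoid underflow on long sentences."""
    return math.log(TM[candidate]) + math.log(LM[candidate])

def decode(candidates):
    """Return the candidate with the highest combined score."""
    return max(candidates, key=score)

print(decode(list(TM)))
```

The translation model alone cannot separate "the house" from "house the" in this toy setup; the language model is what rewards the fluent word order.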
Rules vs Statistical
Rule-Based MT | Statistical MT
+ Consistent and predictable quality | – Unpredictable translation quality
+ Good out-of-domain translation quality | – Poor out-of-domain quality
+ Knows grammatical rules | – Does not know grammar
+ High performance and robustness | – High CPU and disk space requirements
+ Consistency between versions | – Inconsistency between versions
– Lack of fluency | + Good fluency
– Hard to handle exceptions to rules | + Good for catching exceptions to rules
– High development and customization costs | + Rapid and cost-effective development, provided the required corpus exists
n Given rule-based and corpus-based approaches, there is a clear need
for another approach through which users would reach better
translation quality and high performance, with less investment.
Example based MT
n Given RBMT and SMT, there is a clear need for a third approach
through which users would reach better translation quality and
high performance (similar to rule-based MT), with less
investment (similar to statistical MT).
MT challenges (1)
n Ambiguity
n Words and phrases in one language often map to multiple words in
another language, e.g. "bank" (financial institution vs. river bank)
n Idiomatic usage – difficult to identify
¨ Breaking the ice.
n Structural ambiguity
¨ The man saw the girl with the telescope
MT challenges (2)
n Structural difference among languages
¨ Word order (SOV, SVO, etc)
n Lexical differences
¨ The English word "know" refers either to knowing a fact or proposition,
or to familiarity with a person or location. French uses two different
verbs: savoir for the former and connaître for the latter.
n Complexity of the language
n Availability of language resources
SMT tools
n Sentence alignment software
¨ Hunalign
n Word alignment software
¨ Mgiza
n Language modeling toolkit
¨ SRILM
n Decoder
¨ Moses and Pharaoh
MT evaluation (1)
n Evaluating the quality of MT is an extremely subjective task
n Translations are evaluated along two dimensions:
¨ Fidelity
¨ Fluency
n Translations can be evaluated:
¨ Using human raters
¨ Automatically
MT evaluation (2)
n The most accurate evaluations use human raters to evaluate MT.
Fidelity
§ Measures adequacy.
§ Whether the translation contains the information that existed in the original.
§ Whether the information in the MT output is sufficient to perform some task.
Fluency
§ How intelligible, clear, readable, or natural the MT output is.
§ One method is to give raters a rating scale; other methods rely less on the raters' conscious decisions.
§ For example, measure the time it takes for the raters to read each output sentence.
n Post-editing: measure the number of words, the amount of time, or the number of keystrokes required
for a human to correct the MT output to an acceptable level.
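The "number of words" variant of the post-editing measure can be approximated with a word-level edit distance between the MT output and its corrected version. The sketch below is a plain Levenshtein distance over words, offered only as an illustration; it is not the full TER metric, which also allows block shifts.

```python
def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn the hypothesis into the reference."""
    hyp, ref = hypothesis.split(), reference.split()
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # deleting the first i hypothesis words
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

print(word_edit_distance("the cat sat on mat", "the cat sat on the mat"))
```

Dividing the result by the length of the corrected sentence gives a rate that can be compared across sentences of different lengths.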
MT evaluation (3)
n Human evaluation is time consuming, so automatic metrics are used
as well. These still require reference translations prepared by human
beings, usually several per sentence, as we do not rely on a single
human translation.
n Given each MT output sentence in a test set, we compute the translation
closeness between the MT output and the human reference sentences.
¨ MT output is ranked as better if on average it is closer to the human
translations.
n MT evaluation metrics (BLEU, NIST, TER, precision and recall,
and METEOR) differ on what counts as translation closeness.
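One common notion of translation closeness is n-gram overlap with the references. The sketch below computes a clipped (modified) n-gram precision against multiple references, which is the core ingredient of BLEU; it deliberately leaves out BLEU's brevity penalty and the geometric mean over n-gram orders, so it should not be read as a BLEU implementation. Function names are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(hypothesis, references, n=1):
    """Clipped n-gram precision of the hypothesis against several references."""
    hyp_counts = ngrams(hypothesis.split(), n)
    if not hyp_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Each hypothesis n-gram is credited at most max_ref[gram] times.
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())

refs = ["the cat is on the mat", "there is a cat on the mat"]
print(modified_precision("the the the the the the the", refs, n=1))
print(modified_precision("the cat is on the mat", refs, n=2))
```

Clipping is what stops a degenerate output that repeats one frequent word from scoring well.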
SMT resource (1)
SMT resource (2)
