[Diagram: source text → translation model (decoder) → translated target text]
- Decoder
  - Moses and Pharaoh
MT evaluation (1)
- Evaluating the quality of MT is an extremely subjective task.
- Translations are evaluated along two dimensions:
  - Fidelity
  - Fluency
- Translations can be evaluated:
  - Using human raters
  - Automatically
MT evaluation (2)
- The most accurate evaluations use human raters to evaluate MT.

Fidelity
- Measures adequacy: whether the translation contains the information that existed in the original.
- Whether the information in the MT output is sufficient to perform some task.

Fluency
- Measures how intelligible, clear, readable, or natural the MT output is.
- One method uses a rating scale; other methods rely on the rater's conscious decisions.
- Measure the time it takes raters to read each output sentence.

Post-editing
- Measure the number of words, the amount of time, or the number of keystrokes required for a human to correct the MT output to an acceptable level.
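Post-editing effort can be approximated programmatically. A minimal sketch, assuming word-level tokens: the Levenshtein distance counts the insertions, deletions, and substitutions needed to turn the MT output into an acceptable reference, which is the idea underlying the TER metric (minus its block-shift moves). The function name is illustrative, not from any particular toolkit.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance: the minimum number of insertions,
    deletions, and substitutions needed to turn hyp into ref.
    A rough proxy for post-editing effort (the core of TER)."""
    m, n = len(hyp), len(ref)
    # d[i][j] = cost of transforming the first i hyp words into the first j ref words
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all remaining hyp words
    for j in range(n + 1):
        d[0][j] = j          # insert all remaining ref words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[m][n]
```

For example, `edit_distance("the cat sat".split(), "a cat sat down".split())` returns 2 (one substitution plus one insertion); dividing by the reference length gives a TER-style error rate.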
MT evaluation (3)
- Human evaluation is time consuming, and the reference translations must be prepared by human beings, since we do not rely on a single human translation.
- Given an output sentence in a test set, we compute the translation closeness between the MT output and the human reference sentences.
  - The MT output is ranked as better if, on average, it is closer to the human translations.
- Automatic MT evaluation metrics (BLEU, NIST, TER, precision and recall, and METEOR) differ on what counts as translation closeness.
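The BLEU notion of closeness can be sketched in a few lines: clipped n-gram precisions against multiple references, combined by a geometric mean and scaled by a brevity penalty. This is a toy sentence-level version for illustration; real BLEU is computed over a whole corpus with n-grams up to order 4, and the function names here are ours, not from a library.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in the best-matching reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for g, c in Counter(ngrams(ref, n)).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

def bleu(candidate, references, max_n=2):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # effective reference length: the reference closest in length
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c >= r else math.exp(1 - r / c)  # penalize short outputs
    return bp * math.exp(log_avg)
```

For example, `bleu("the cat is on the mat".split(), [ref.split() for ref in ("the cat sits on the mat", "there is a cat on the mat")])` gives a score strictly between 0 and 1, while an exact match against any single reference scores 1.0. NIST, TER, and METEOR replace this n-gram overlap with different closeness definitions (information-weighted n-grams, edit operations, and stem/synonym alignment, respectively).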
SMT resource (1)
SMT resource (2)