Module #1

CSE 243:
Natural Language Processing
Recap from the Previous Lecture
• Challenges in phrase-based statistical MT (PBSMT)
• Using transformers for MT

Contents
• Machine Translation Evaluation Metrics

Machine Translation Evaluation Metrics
• BLEU – Bilingual Evaluation Understudy
• METEOR – Metric for Evaluation of Translation with Explicit Ordering
• TER – Translation Error Rate

BLEU
• Most popular MT evaluation metric
• Requires only reference translations
• No additional resources required (unlike METEOR)
• Precision-oriented measure
• Useful to compare 2 systems

Precision
• Reference1: maine abhi khana khaya (I ate food now)
• Reference2: maine abhi bhojan kiyaa (I did meal now)
• Candidate1: maine ab khana khaya (I now food ate)
• Candidate2: maine abhi lunch ate (I ate (OOV) lunch (OOV) now)
• Unigram precision:
• Candidate1 = ¾ = 0.75; Candidate2 = ½ = 0.5
• Bigram precision:
• Candidate1 = 0.33; Candidate2 = 0.33
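
The precision figures above can be reproduced with a short script. This is a minimal sketch, assuming whitespace tokenization and counting a candidate n-gram as correct if it occurs in at least one reference; the helper names `ngrams` and `ngram_precision` are illustrative and not from any library.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, references, n):
    """Fraction of candidate n-grams that occur in at least one reference."""
    cand = ngrams(candidate.split(), n)
    if not cand:
        return 0.0
    # Pool the n-grams of all references into one set (assumed matching rule).
    ref_ngrams = set()
    for ref in references:
        ref_ngrams.update(ngrams(ref.split(), n))
    matched = sum(1 for gram in cand if gram in ref_ngrams)
    return matched / len(cand)


references = ["maine abhi khana khaya", "maine abhi bhojan kiyaa"]
candidate1 = "maine ab khana khaya"
candidate2 = "maine abhi lunch ate"

for name, cand in [("Candidate1", candidate1), ("Candidate2", candidate2)]:
    p1 = ngram_precision(cand, references, 1)
    p2 = ngram_precision(cand, references, 2)
    print(f"{name}: unigram = {p1:.2f}, bigram = {p2:.2f}")
```

Candidate1 matches 3 of its 4 unigrams and 1 of its 3 bigrams; Candidate2 matches 2 of 4 unigrams and 1 of 3 bigrams.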

Precision: Not good enough
• Reference: mujh par tera suroor chhaaya
• Candidate1: mera tera suroor chhaaya
• Candidate2: tera tera tera suroor
• Unigram precision:
• Candidate1 = ¾ = 0.75
• Candidate2 = 1.0!

Modified Precision
• Clip the total count of each candidate word at its maximum count in any single reference.
• Reference: mujh par tera suroor chhaaya
• Candidate2: tera tera tera suroor
• Modified Unigram Precision:
• Candidate2 = ½ = 0.5
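
The clipping step can be sketched as follows. This is an illustrative implementation, assuming whitespace tokenization; the function names `ngrams` and `modified_precision` are hypothetical helpers, not a library API.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = Counter(ngrams(candidate.split(), n))
    # For each n-gram, remember its largest count in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Clip each candidate count at that maximum reference count.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


references = ["mujh par tera suroor chhaaya"]
candidate1 = "mera tera suroor chhaaya"
candidate2 = "tera tera tera suroor"

print(modified_precision(candidate1, references, 1))
print(modified_precision(candidate2, references, 1))
```

For Candidate2, "tera" occurs three times but is clipped to its single occurrence in the reference, so only 2 of the 4 candidate words count.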

Candidates Shorter than Reference
• Reference: Kya BLEU lambe vaakya ki guNvatta ko samajh paaega?
• Candidate: lambe vaakya
• Modified unigram precision (MUP) = 1
• Modified bigram precision (MBP) = 1

Incorporating Recall
• The reference closest in length to the candidate is the “best match”; its length is r
• Brevity Penalty (BP):
• Candidate1: lambe vaakya
• Candidate2: Kya BLEU lambe vaakya ki guNvatta ko samajh paaega?
• $BP = 1$, if $c > r$
• $BP = e^{(1 - r/c)}$, if $c \le r$ (c = candidate length, r = reference length)
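
The brevity penalty for the two candidates can be computed with a small sketch. It assumes the standard BLEU definition (BP = 1 when c > r, otherwise exp(1 - r/c)), whitespace tokenization, and the reference sentence from the preceding slide; `brevity_penalty` is an illustrative name.

```python
import math


def brevity_penalty(c, r):
    """BP = 1 if the candidate is longer than the reference, else exp(1 - r/c)."""
    if c == 0:
        return 0.0  # assumed convention for an empty candidate
    if c > r:
        return 1.0
    return math.exp(1 - r / c)


# Reference taken from the previous slide (assumption: single reference).
reference = "Kya BLEU lambe vaakya ki guNvatta ko samajh paaega?"
candidates = {
    "Candidate1": "lambe vaakya",
    "Candidate2": "Kya BLEU lambe vaakya ki guNvatta ko samajh paaega?",
}

r = len(reference.split())
for name, cand in candidates.items():
    c = len(cand.split())
    print(f"{name}: c = {c}, r = {r}, BP = {brevity_penalty(c, r):.4f}")
```

Candidate1 has c = 2 against r = 9, so its BP is exp(-3.5), roughly 0.03, which heavily discounts its perfect modified precision. Candidate2 has c = r, so its BP is 1.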

BLEU Score
• $BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} W_n \log P_n\right)$, where
• BLEU = BLEU Score
• BP = Brevity Penalty
• N = Maximum length of n-grams to be considered (usually 4)
• $P_n$ = Modified n-gram precision
• $W_n$ = Weight given to n-grams of length n, such that the sum of all weights = 1
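
Putting the pieces together, a sentence-level BLEU score can be sketched as below. This is a minimal illustration, not a reference implementation: it assumes whitespace tokenization, uniform weights unless others are given, no smoothing (any zero precision gives a score of 0), and the reference length closest to the candidate length as r. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(cand, refs, n):
    """Clipped n-gram precision; cand is a token list, refs a list of token lists."""
    cand_counts = Counter(ngrams(cand, n))
    max_ref_counts = Counter()
    for ref in refs:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


def bleu(candidate, references, max_n=4, weights=None):
    """BLEU = BP * exp(sum_n W_n * log P_n), without smoothing."""
    cand = candidate.split()
    refs = [ref.split() for ref in references]
    if weights is None:
        weights = [1.0 / max_n] * max_n  # uniform weights summing to 1
    precisions = [modified_precision(cand, refs, n) for n in range(1, max_n + 1)]
    if not cand or min(precisions) == 0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is taken as 0
    c = len(cand)
    # Best-match reference length: closest to c, ties go to the shorter one.
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(w * math.log(p) for w, p in zip(weights, precisions)))


references = ["maine abhi khana khaya", "maine abhi bhojan kiyaa"]
candidate1 = "maine ab khana khaya"

print(bleu(candidate1, references, max_n=2))
print(bleu(candidate1, references, max_n=4))
```

With N = 2, Candidate1 from the precision slide has P1 = 3/4, P2 = 1/3 and BP = 1, so its score is the square root of 0.25, about 0.5. With N = 4 the same candidate scores 0 because it shares no trigram with either reference, which is why sentence-level BLEU is usually smoothed or computed over a whole corpus.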

METEOR: Criticisms of BLEU
• Brevity penalty is not a good measure of recall
• Higher-order n-grams may not indicate grammatical correctness in a sentence
• Words with different meanings can get mismatched in BLEU.
• METEOR: uses a better unigram matching strategy.
