Module 1 Lecture 8-1
CSE 243: Natural Language Processing
Recap from the Previous Lecture
• Challenges in phrase-based statistical MT (PBSMT)
• Using transformers for MT
Contents
• Machine Translation Evaluation Metrics
Machine Translation Evaluation Metrics
• BLEU – Bilingual Evaluation Understudy
• METEOR – Metric for Evaluation of Translation with Explicit Ordering
• TER – Translation Error Rate
BLEU
• The most widely used MT evaluation metric
• Requires only reference translations
• No additional resources required (unlike METEOR)
• Precision-oriented measure
• Useful for comparing two systems
Precision
• Reference1: maine abhi khana khaya (I ate food now)
• Reference2: maine abhi bhojan kiyaa (I did meal now)
• Candidate1: maine ab khana khaya (I now food ate)
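• Candidate2: maine abhi lunch ate (I now lunch ate; "lunch" and "ate" are out-of-vocabulary English words)
• Unigram precision (fraction of candidate words found in a reference):
• Candidate1 = 3/4 = 0.75; Candidate2 = 2/4 = 0.5
• Bigram precision:
• Candidate1 = 1/3 ≈ 0.33 (only "khana khaya" matches); Candidate2 = 1/3 ≈ 0.33 (only "maine abhi" matches)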
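A minimal Python sketch of the plain (unclipped) precision above, assuming whitespace tokenization and counting a candidate n-gram as correct if it occurs in any reference. The helper names `ngrams` and `precision` are illustrative, not from any library.

```python
# Plain (unclipped) n-gram precision: the fraction of candidate n-grams
# that occur in at least one reference. Whitespace tokenization is assumed.

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def precision(candidate, references, n):
    cand = ngrams(candidate.split(), n)
    if not cand:
        return 0.0
    # Pool the n-grams of all references into one set.
    ref_ngrams = set()
    for ref in references:
        ref_ngrams.update(ngrams(ref.split(), n))
    hits = sum(1 for gram in cand if gram in ref_ngrams)
    return hits / len(cand)


refs = ["maine abhi khana khaya", "maine abhi bhojan kiyaa"]
for cand in ["maine ab khana khaya", "maine abhi lunch ate"]:
    print(cand, precision(cand, refs, 1), round(precision(cand, refs, 2), 2))
# maine ab khana khaya 0.75 0.33
# maine abhi lunch ate 0.5 0.33
```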
Precision: Not good enough
• Reference: mujh par tera suroor chhaaya
• Candidate1: mera tera suroor chhaaya
• Candidate2: tera tera tera suroor
• Unigram precision:
• Candidate1 = ¾ = 0.75
• Candidate2 = 4/4 = 1.0, even though it just repeats "tera"
Modified Precision
• Clip the count of each candidate word to the maximum number of times it occurs in any single reference.
• Reference: mujh par tera suroor chhaaya
• Candidate2: tera tera tera suroor
• Modified unigram precision:
• Candidate2 = 2/4 = 0.5 ("tera" is clipped from 3 to 1)
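The clipping rule can be sketched in Python as follows, again assuming whitespace tokenization; `modified_precision` is a hypothetical helper, and its `clip=False` mode reproduces the unclipped precision from the "Precision: Not good enough" example for contrast.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n, clip=True):
    """Clipped n-gram precision; clip=False gives the plain precision instead."""
    cand_counts = ngram_counts(candidate.split(), n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Maximum number of times each n-gram occurs in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    if clip:
        # Each candidate n-gram is credited at most max_ref[gram] times.
        matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    else:
        # Every occurrence of an n-gram found in some reference is credited.
        matched = sum(count for gram, count in cand_counts.items() if max_ref[gram] > 0)
    return matched / total


reference = ["mujh par tera suroor chhaaya"]
print(modified_precision("tera tera tera suroor", reference, 1, clip=False))  # 1.0
print(modified_precision("tera tera tera suroor", reference, 1))              # 0.5
```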
Candidates Shorter than Reference
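• Reference: Kya BLEU lambe vaakya ki guNvatta ko samajh paaega? (Will BLEU be able to understand the quality of long sentences?)
• Candidate: lambe vaakya (long sentences)
• Modified unigram precision (MUP) = 2/2 = 1
• Modified bigram precision (MBP) = 1/1 = 1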
Incorporating Recall
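• Recall is approximated through sentence length: compare the candidate length c with the “best match” reference length r (the reference length closest to c)
• Brevity Penalty (BP):
• $BP = 1$, if $c > r$
• $BP = e^{(1 - r/c)}$, if $c \le r$
• Candidate1: lambe vaakya (c < r, so BP < 1 and the candidate is penalized)
• Candidate2: Kya BLEU lambe vaakya ki guNvatta ko samajh paaega? (c = r, so BP = 1)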
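A sketch of the brevity penalty in Python, assuming whitespace tokenization (so the reference has r = 9 tokens). `best_match_length` is a hypothetical helper that picks the reference length closest to c, breaking ties toward the shorter reference; that tie-break and the handling of an empty candidate are assumptions, not part of the slide. The original BLEU computes BP over a whole corpus; it is applied per sentence here only for illustration.

```python
import math


def brevity_penalty(c, r):
    """BP = 1 if c > r, else exp(1 - r/c); c = candidate length, r = reference length."""
    if c > r:
        return 1.0
    if c == 0:
        return 0.0  # assumption: an empty candidate gets no credit (avoids dividing by zero)
    return math.exp(1 - r / c)


def best_match_length(c, ref_lengths):
    """Hypothetical helper: the reference length closest to c (ties go to the shorter one)."""
    return min(ref_lengths, key=lambda length: (abs(length - c), length))


reference = "Kya BLEU lambe vaakya ki guNvatta ko samajh paaega?"
for candidate in ["lambe vaakya", reference]:
    c = len(candidate.split())
    r = best_match_length(c, [len(reference.split())])
    print(c, r, round(brevity_penalty(c, r), 4))
# 2 9 0.0302
# 9 9 1.0
```

The short candidate keeps its perfect modified precisions (MUP = MBP = 1), but its BP drops to about 0.03, which is how BLEU indirectly penalizes missing content.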
BLEU Score
• $\mathrm{BLEU} = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$, where
• BLEU = BLEU score
• BP = Brevity Penalty
• N = Maximum length of n-grams to be considered (usually 4)
• $p_n$ = Modified n-gram precision
• $w_n$ = Weight given to n-grams of length n, such that $\sum_{n=1}^{N} w_n = 1$
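Putting the pieces together, a minimal sentence-level sketch of the BLEU formula in Python. Assumptions not on the slide: whitespace tokenization, uniform weights $w_n = 1/N$ by default, the closest reference length as r, and returning 0 when any $p_n$ is 0 (no smoothing). The original BLEU is computed over a whole test corpus rather than per sentence, so this is illustrative only; `sentence_bleu` here is a hypothetical function, not a library API.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(cand, refs, n):
    """Clipped n-gram precision p_n for tokenized candidate and references."""
    counts = ngram_counts(cand, n)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    max_ref = Counter()
    for ref in refs:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    return sum(min(count, max_ref[gram]) for gram, count in counts.items()) / total


def sentence_bleu(candidate, references, max_n=4, weights=None):
    cand = candidate.split()
    refs = [ref.split() for ref in references]
    if weights is None:
        weights = [1.0 / max_n] * max_n  # assumption: uniform weights w_n = 1/N
    precisions = [modified_precision(cand, refs, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is 0 here
    c = len(cand)
    # "Best match" reference length: closest to c, ties go to the shorter one.
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(w * math.log(p) for w, p in zip(weights, precisions)))


refs = ["maine abhi khana khaya", "maine abhi bhojan kiyaa"]
print(sentence_bleu("maine ab khana khaya", refs, max_n=2))  # approximately 0.5
```

With `max_n=2`, Candidate1 from the precision example scores about 0.5: the geometric mean of its unigram (3/4) and bigram (1/3) precisions, with BP = 1 because c = r = 4. Without smoothing, any candidate shorter than N tokens, such as "lambe vaakya", scores 0.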
METEOR: Criticisms of BLEU
• Brevity penalty is not a good measure of recall
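• Matching higher-order n-grams does not necessarily indicate that a sentence is grammatically correct
• BLEU does not explicitly align words, so a candidate word can be counted as a match even when it carries a different meaning in the reference
• METEOR's remedy: a better unigram matching strategy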