
BLEU (Bilingual Evaluation Understudy) Score:

BLEU score is a widely used metric for machine translation tasks, where the goal is to automatically translate text from one language to another. It was proposed as a way to assess the quality of machine-generated translations by comparing them to a set of reference translations provided by human translators.

How does BLEU score work?

BLEU score measures the similarity between the machine-translated text and the reference translations using n-grams, which are contiguous sequences of n words. The most common n-grams used are unigrams (single words), bigrams (two-word sequences), trigrams (three-word sequences), and so on.
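
For example, here is a minimal sketch of extracting n-grams from a tokenized sentence (the ngrams helper below is illustrative, not from any library):

def ngrams(tokens, n):
    # Slide a window of length n across the token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = 'the cat sat on the mat'.split()
print(ngrams(tokens, 1))  # unigrams: ('the',), ('cat',), ...
print(ngrams(tokens, 2))  # bigrams: ('the', 'cat'), ('cat', 'sat'), ...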

BLEU score calculates the precision of n-grams in the machine-generated translation by comparing them to the reference translations. The precision is then modified by a brevity penalty to account for translations that are shorter than the reference translations.

The formula for BLEU score is as follows:

BLEU = BP * exp(∑ wn * log(pn))

Where the sum runs over n = 1, ..., N (typically N = 4):

• BP (Brevity Penalty) is a penalty term that adjusts the score for translations that are shorter than the reference translations. It equals 1 when the machine-generated translation is at least as long as the reference, and exp(1 - reference_length / translated_length) otherwise; equivalently, BP = min(1, exp(1 - reference_length / translated_length)), where reference_length is the number of words in the reference translation and translated_length is the number of words in the machine-generated translation.
• pn is the modified n-gram precision: the number of n-grams that appear in both the machine-generated translation and the reference translations (with each n-gram's count clipped to its maximum count in any single reference), divided by the total number of n-grams in the machine-generated translation.
• wn are the n-gram weights, which sum to 1; uniform weights wn = 1/N are the standard choice. A from-scratch sketch of the full computation follows below.
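
To make these definitions concrete, here is a from-scratch sketch of the computation (clipped counts for the modified precision, uniform weights wn = 1/N; a single reference is assumed for simplicity):

import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(reference, candidate, n):
    # Clip each candidate n-gram count by its count in the reference
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

def bleu(reference, candidate, max_n=4):
    precisions = [modified_precision(reference, candidate, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # a zero precision sends the geometric mean to zero
    # Brevity penalty: min(1, exp(1 - r/c))
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = 'this is a dog'.split()
candidate = 'this is dog'.split()
print(bleu(reference, candidate, max_n=2))  # ~0.51: BP ≈ 0.72, p1 = 1.0, p2 = 0.5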

BLEU score ranges from 0 to 1, with higher values indicating better translation quality. A perfect translation would have a BLEU score of 1, while a translation that shares no n-grams with the references would have a BLEU score of 0.

Significance of BLEU score:

BLEU score is widely used in machine translation because it provides a simple and effective way to assess the quality of machine-generated translations against reference translations. It is easy to calculate and interpret, making it a popular choice for evaluating machine translation models. However, it has some limitations. BLEU relies heavily on n-gram overlap and may not capture the overall meaning or fluency of the translated text. It can also penalize translations that are longer than the references, since extra n-grams that do not appear in any reference lower the precision, which can be unfair to valid paraphrases.

Sample code
import openai
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Set your OpenAI API key (keep real keys out of source code)
openai.api_key = 'YOUR_API_KEY'

def generate_translation(prompt):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ]
    )
    return response['choices'][0]['message']['content'].strip()

def calculate_bleu_score(references, candidate, weights=(0.25, 0.25, 0.25, 0.25)):
    # sentence_bleu expects a list of tokenized references and a tokenized candidate
    return sentence_bleu(references, candidate, weights=weights,
                         smoothing_function=SmoothingFunction().method1)

def main():
    # Example reference translations (tokenized)
    reference = [
        'this is a dog'.split(),
        'it is dog'.split(),
        'dog it is'.split(),
        'a dog, it is'.split()
    ]

    # Example prompt for translation
    prompt = f"Translate the following sentences : {reference}"

    # Generate translation using GPT-3.5-turbo and tokenize it
    candidate_translation = generate_translation(prompt).split()

    # Print the generated translation
    print("Generated Translation:", candidate_translation)

    # BLEU Score calculation: compare the candidate tokens
    # against all tokenized references
    bleu_score = calculate_bleu_score(reference, candidate_translation)
    print(f"BLEU Score: {bleu_score}")

if __name__ == "__main__":
    main()

Output:

Generated Translation: ["[['ceci',", "'est',", "'un',", "'chien'],", "['c\\'est',", "'un',", "'chien'],", "['chien',", "'c\\'est',", "'il'],", "['un',", "'chien,',", "'c\\'est',", "'il']]"]

BLEU Score: 0.007427433865067654

The near-zero score is expected here: the prompt does not specify a target language, the model translated the sentences into French, and the references are in English, so almost no n-grams overlap.

