
HUMAN TRANSLATION - LINGUISTIC

INSTRUCTIONS AND GUIDELINES


Contents
PROJECT SCOPE AND CONTENT
  SCOPE
  CONTENT
GENERAL QUALITY GUIDELINES
  PROTOCOL FOR MANAGING PERFORMANCE
  QUALITY GROUPS
  ERRORS SPOTTED BY CUSTOMER
QUALITY CRITERIA
  ACCURACY
  LANGUAGE MECHANICS
  LANGUAGE NATURALNESS AND HUMAN FINAL QUALITY
GENERAL TRANSLATION GUIDELINES
  TRANSLATION SETS
  MEANING
  SECOND TRANSLATION
  CONSISTENT FORMAT
  SYNONYMS/DIFFERENT MEANINGS/POLYSEMIC WORDS
  UNCLEAR TEXT
  LACK OF CONTEXT/UNRELATED SENTENCES
  SENTENCES THAT MAKE NO SENSE
  INCOMPLETE STRINGS
  ERRORS IN SOURCE
  OTHER LANGUAGE IN SOURCE
  INCIDENTAL WORDS
STYLE AND FORMALITY
  GENERAL STYLE
  POLITE WORDS
  FORM OF ADDRESS (FORMAL OR INFORMAL)
  SWEAR WORDS/POLITICALLY INCORRECT EXPRESSIONS/VULGAR CONTENT
  GENDER
NAMES AND NUMBERS
  NUMBER/DATE/TIME/SYMBOLS/UNITS
  BUSINESS NAMES/PRODUCT NAMES
  PROPER NAMES
  APP NAMES/PODCASTS/SONG TITLES/OTHER MEDIA
  NAME OF LAWS, ACTS, ORGANIZATION NAMES, ETC.
PUNCTUATION AND CAPITALIZATION
  PUNCTUATION
  EMOJIS
  CAPITALIZATION
  MENTIONS AND TAGS
AUTOMATED QA CHECKS (ONEFORMA)
VERSION UPDATES

PROJECT SCOPE AND CONTENT
SCOPE

These translations are intended to populate the MT engine for ISAAC, that is, to produce
automatic translations of future texts. The ISAAC project entails the direct translation (as
opposed to adaptation, localization or transcreation) of content into multiple languages, with
HOs as per the agreed schedule. Final quality expected: HUMAN QUALITY

Translations will be provided by native speakers of the target language who are fluent in the
source language. They will be accurate and comply with the project translation guidelines
below.

No Machine Translation is allowed in the translation: Human translation is always
required. It is forbidden to use Google Machine Translation, Microsoft Machine Translation
or any other MT engine. This requirement is essential because the translated data will be
used for MT engine training; feeding it MT output will not provide the expected results.

Pactera will run an “edit distance” comparison of each string against the output of the main
MT engines. If there is an indication that MT might have been used, the translated string will
be failed. In case of repeated failed strings, we will apply the protocol for managing
underperformance, as described in the section below.
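
To illustrate what an edit-distance comparison could look like, here is a minimal Python sketch. The function names, the normalisation and the 0.1 threshold are assumptions made for this example only; these guidelines do not describe the actual tooling or thresholds used in the check.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(translation: str, mt_output: str) -> float:
    """Edit distance scaled to 0.0 (identical) .. 1.0 (entirely different)."""
    longest = max(len(translation), len(mt_output))
    if longest == 0:
        return 0.0
    return levenshtein(translation, mt_output) / longest


def looks_like_mt(translation: str, mt_outputs: Iterable[str],
                  threshold: float = 0.1) -> bool:
    """Flag a translation that is nearly identical to any MT engine output.

    The threshold is a hypothetical value chosen for illustration.
    """
    return any(normalized_distance(translation, mt) <= threshold
               for mt in mt_outputs)
```

In this sketch, a translation that differs from an MT output by only a character or two would be flagged, while a genuinely reformulated sentence would not.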

CONTENT

Content is general news, newspapers, IMs or comments in social networks, strings extracted
from news programs, medical publications, finance and business documents, or content
concerning sports, travel, shopping, etc. There is no further reference or context than the file
itself; translators must translate to the best of their knowledge.

GENERAL QUALITY GUIDELINES

Standard quality checks will be performed by the Pactera EDGE QA team. The goal is to
monitor the general supplier quality and to identify poor performers for retraining or removal.
Our client's goal is to achieve 95% Pass hits (no corrections required).

QA of QAers' work:
• Pactera EDGE will perform QA on top of QA'd strings. The same quality standards and
protocols will apply to QAers for their work.

PROTOCOL FOR MANAGING PERFORMANCE

For the first 100 reviewed hits per translator, no penalty will be applied in the PO. After that:

Quality Score    Discount in the PO
>=80%            None
60 – 79.99%      30% discount
<60%             60% discount

After 3 attempts with a repeated score of <60%, the supplier will be removed from the project.
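
The discount bands above can be expressed as a small lookup. The following Python sketch is only an illustration: the function name is hypothetical, scores are assumed to be percentages from 0 to 100, and the grace period is assumed to cover reviewed hits 1 to 100 inclusive.

```python
def po_discount(quality_score: float, reviewed_hits: int) -> float:
    """Return the PO discount as a fraction (0.30 means a 30% discount)."""
    if reviewed_hits <= 100:
        # Assumption: no penalty while within the first 100 reviewed hits.
        return 0.0
    if quality_score >= 80:
        return 0.0
    if quality_score >= 60:
        return 0.30
    return 0.60
```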

QUALITY GROUPS

Based on each supplier's average quality score, there will be 3 different quality groups:
• Gold: average quality score >=95%
• Silver: average quality score between 80% and 94.99%
• Bronze: average quality score <80%

Gold suppliers will have higher priority during job assignments over the rest.
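
As a quick illustration of the grouping, the thresholds can be written as a Python function. The function name is hypothetical and the score is assumed to be a percentage from 0 to 100.

```python
def quality_group(average_score: float) -> str:
    """Map an average quality score (in percent) to its quality group."""
    if average_score >= 95:
        return "Gold"
    if average_score >= 80:
        return "Silver"
    return "Bronze"
```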

ERRORS SPOTTED BY CUSTOMER

If the Pactera EDGE team receives feedback from the client including corrections to errors
made by the translator, the translator will implement those corrections at no extra cost.

If the Pactera EDGE team receives feedback from the client including corrections to errors not
spotted by the QA resource, the QA resource will implement those corrections at no extra cost.

QUALITY CRITERIA

Quality management will focus on the following aspects:


• Accuracy
• Language Mechanics
• Language naturalness and human final quality

ACCURACY

Translated text represents the meaning of the source text concepts and conveys them
correctly.

A 1-to-1 correspondence will exist between the source and target texts: every target string is
required to accurately convey the meaning of the corresponding source string.

Retain the meaning of the source string in translation. A literal word-by-word translation is
not acceptable if the meaning of the source string gets lost. In this context, idioms such as “he
was beating around the bush” or “I’m just pulling your leg” can be especially challenging to
translate, and a literal word-by-word translation is not acceptable.

Provide two correct but different translations: Translation1 should be the most common one.
Make every attempt to provide good diversity in the second translation; do not just change
the articles or make other minor changes.

LANGUAGE MECHANICS

Language based on standard grammar, syntax rules and conventions of the target language.
Correct spelling and punctuation. Absence of typing errors.

LANGUAGE NATURALNESS AND HUMAN FINAL QUALITY

Language will be as natural and fluent as if written by a native speaker.

MT translation is forbidden in all cases. This will be checked by Pactera against Google MT
and Microsoft MT and a penalty will be applied in case MT has been used.

GENERAL TRANSLATION GUIDELINES


TRANSLATION SETS

Some projects require 1 translation; others require 2. Where 2 are required, both
translations are mandatory for every source string in the task. Project managers will
inform the translation team when this is required, and in these cases the source
string will be paid twice. For projects that require 2 translations, please consider that:

• Both translations need to maintain the original intent by using different words,
structures, and expressions (you cannot just swap in a synonym; they should really be
2 different ways of expressing the same thing)
• Translation 1 needs to be the most common one. Translation 2 should not consist
of simply changing the order of the words or articles, or using numbers instead of
letters, or abbreviations, etc. You should express the same intent of the source
sentence in a different way than translation 1. Example:

o translation1: I'm Ralf
o translation2: my name is Ralf

Example of reformulation for Spanish:


o Source: I wanted to see if you would like to go see "American Beauty"
with us, sometime soon
o translation1: Quería saber si te gustaría ir a ver “American Beauty” con
nosotros un día de estos.
o translation2: ¿Te apetecería ir a ver con nosotros “American Beauty” algún
día?

• For those extreme/rare cases where a second translation is really not possible (for
example, when source string contains only a URL), it can be omitted but you
should provide a justifiable reason that we will have to submit to the customer.
Please check HERE for hints and tips on how to provide valid 2nd translations.

MEANING
SECOND TRANSLATION

Make every attempt to provide good diversity in the second translation; do not just
change the articles or make other minor changes (e.g., when translating into English,
you are → you're or The fox → A fox).

This diversity can come from using a different level of formality (see also section “Style
and formality” below), translating into two sentences with different meanings (if the source
text is ambiguous), using synonyms in the second translation, or rephrasing the sentence
(syntactic difference). Please take all these possibilities for providing diversity into
account and combine them. Do not always use the same criteria (for example, do not
always rephrase).

• translation1: “Marca is really dumb”, translation2: “Marca is really stupid”


• translation1: “I wish you all the best on your birthday”, translation2: “wishing
you a happy birthday”

Don't try to add a new translation from a different variant such as Swiss German for German.

• Christmas tree → 圣诞树 (good)


• Christmas tree → 耶诞树 (bad, this translation is only used in Taiwan or Hong
Kong)

Please check HERE for hints and tips on how to provide valid 2nd translations.

CONSISTENT FORMAT

The format of the text should be consistent across source and target languages unless the
target language has a pre-defined syntax. See also the section “Names and numbers”. Example:
“123” should be translated as “123” and not as “one hundred and twenty-three”.

SYNONYMS/DIFFERENT MEANINGS/POLYSEMIC WORDS

An English word may correspond to a verb or a noun in the target language and/or have
different meanings. Try to use the most common meaning(s) if a source word / phrase has
multiple meanings.

• Rat → 小人 / 叛徒: Both of these translations correspond to the less common
meaning 'traitor' instead of the more common 'animal' translation

If there is ambiguity, please choose the most common translation. Example: “America's Got
Talent” → translate as the TV show and not the phrase itself

The original source sentence is given to provide context of a given payload.

Keep the same meaning and use the original source sentence as context. Example: depending
on the context a bag in German can be “eine Tasche” or “eine Tüte” (a paper/plastic bag) →
please use the translation expected from the context

UNCLEAR TEXT

If the source meaning is clear (despite minor grammatical/spelling issues), we’d appreciate
a translation. If something like ‘yo-yo check ‘em out’ is not clear, please raise a query.

LACK OF CONTEXT/UNRELATED SENTENCES

The main goal of this client is to feed test data to speech-to-text engines, so the main goal of
these translations is accuracy in the meaning, rather than any specific style indications. You
should expect a lot of isolated strings. Again, as the main goal is to improve the
voice-to-text or image-to-text recognition features, the client extracts isolated strings from a
very broad scope of sources.

The client mixes very different sources. Therefore, please focus on each string individually,
and don’t try to find how the strings are related to one another; they are probably not
related at all. If you get stuck with any of them or a string is too ambiguous, in many cases a
search for the exact snippet in Google can lead you to the article or whole conversation,
which might be useful for context. If that’s not the case, try performing several searches to
find a possible context or meaning.

SENTENCES THAT MAKE NO SENSE

When the source text makes no sense just translate literally and to the best of your
knowledge. Word by word if needed.

INCOMPLETE STRINGS

If a string is incomplete in source, don’t try to complete it in target; translate the part that
appears in source with what would be the equivalent part in your language. Don’t try to
guess what the rest of the string would be.

ERRORS IN SOURCE

The source data will likely have misspellings and abbreviations or grammar mistakes. Leave
the source text as is, but fix the translation so that it doesn't contain any linguistic or grammar
issues. Also disambiguate abbreviations (e.g., 'how r u' → 'cómo estás').

Double check the translation for spelling errors / grammatical issues.

• Wilkommen → Willkommen ('welcome' in German)


• 길 건너 편 → 길 건너편 ('across the street' in Korean)

OTHER LANGUAGE IN SOURCE

If the input text is completely in another language (not the announced source
language) please skip this sentence (do not translate).

INCIDENTAL WORDS

When incidental words of cultural / linguistic courtesy or habit are generally used
in the source language but not the target language, or generally used in the target
language but not the source language, then the human professional translator may
add or delete such words in order to best convey the same meaning and tone. For
example, when translating a Source String that discusses a future event from
Arabic into English, it may sometimes more closely convey the meaning and tone
to delete the word “inshallah” rather than literally translating it.

STYLE AND FORMALITY
GENERAL STYLE

Translate as if you are writing a Tweet - informal by default.

Some languages (e.g., French, Spanish, German) have some levels of formality based on the
audience (politeness). If the source sentence contains a level of formality, please try to carry
over the formality/politeness level from source to target language. If the source sentence
does not provide a level of formality (e.g. English) please translate with an informal style by
default. Polite/Formal language can be used only in cases where it is more appropriate for
the given source sentence in the specific context.

Do not use a level of formality that would be unnatural in the context of the sentence.
For example, use informal language for vulgar or offensive content. Example: “I love
your ass” → translate it in an informal way, for example “deinen” instead of “Ihren” in
German. In some languages, formal is the more accepted form (e.g., Japanese); in that case,
go with the more accepted form of language.

Translate the meaning rather than providing a “word by word” style
translation.

For very common terms in English → Provide the best grammatical translation

POLITE WORDS

If using ‘polite’ words, e.g., 'please' is not common or expected, it’s fine to omit them.

FORM OF ADDRESS (FORMAL OR INFORMAL)

This material comes from very different sources; therefore, no single approach for the form of
address can be used. Remember that this material will be used to populate an MT engine, so
there is no need for consistency of that type as in a normal translation. You’ll have to decide
based on the type of sentence, and maybe on the name of the file.

SWEAR WORDS/POLITICALLY INCORRECT EXPRESSIONS/VULGAR CONTENT

Do not omit any parts of the source content and do not change the original tone of the
sentence. If source contains swear words, politically incorrect expressions, vulgar comments,
etc. please keep it in target as well by conveying the same meaning and tone.

GENDER

If the source sentence has a specific gender, it should be translated into the same gender in
the target language.

If the source language doesn't mark a specific gender and you need to choose one in the
target language (e.g., pronouns in English → German/Spanish), choose a random gender.
Don't always choose male or always choose female.

• I'm glad → “estoy contento”

NAMES AND NUMBERS


NUMBER/DATE/TIME/SYMBOLS/UNITS

Please keep numbers as is (keep the “same style”). That means written-out numbers in the
source sentence should be translated into written-out numbers in the target language and
Arabic numerals in the source sentence can be transferred as Arabic numerals to the target
sentence. ‘Two’ should be ‘two’ in the target language. Please stick to this rule even if it
might be unusual/rare to use written-out numbers instead of Arabic numerals in the target
language.

• Example: 123 → 123


• Example: two plus two equals four —> zwei plus zwei ist gleich vier

Any formatting of the numbers, e.g., thousand separators, can be modified to fit the target
language.

• English → German, “123,000” should be translated to “123.000”.
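
For readers who think in terms of string operations, the English → German separator change can be sketched as below. This is a hypothetical helper written for illustration; it assumes the input is a single English-formatted number and that the target follows the German convention of “.” for thousands and “,” for decimals.

```python
def english_to_german_number(number_text: str) -> str:
    """Swap English digit-group and decimal separators for German ones."""
    # Both replacements happen in a single pass, so the "," -> "." swap
    # cannot be undone by the "." -> "," swap.
    return number_text.translate(str.maketrans({",": ".", ".": ","}))
```

Only the separators change; the digits themselves are carried over untouched, in line with the rule to keep numbers as is.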

Dates can be formatted based on the target language while remaining as close to the source
format as possible.

• English → French, “02/10/2015” should be translated to “10/02/2015”.

Symbols can be converted.

• Example: $125 dinner → 125 $ Abendessen / 125 USD Abendessen

Currency names should be translated. Remember that the purpose is to populate ISAAC to
produce automatic translations, so they need the equivalents in your language for all these
currency names. Examples:

• EN = Eighteen DOLLARS, and that's my final offer.
Spanish = Dieciocho DÓLARES, esa es mi oferta final.
• EN = 8000 crowns
Spanish = 8000 coronas

Don't convert units even if the target language uses a different measurement system. There is
no need to convert units like inches → cm, temperature (F → C), etc.

• Example: 6 inches should not change to 15.24 cm.
• Example: 75 Fahrenheit should not change into 24 Celsius (75 F should be
translated to 75 F).

BUSINESS NAMES/PRODUCT NAMES

Business names and product names should not be translated unless the company or product
actually uses a different name in the target language. The capitalization should not be changed
from the company’s accepted capitalization either, even if that capitalization violates the
normal capitalization rules of the target language. For example, Skype remains
Skype and eBay remains eBay, but “Diet Coke” would be translated as “Coca-Cola Light”
for those countries where the soft drink product is marketed and sold under that name.

PROPER NAMES

Proper names of people should NOT be changed; otherwise, the engine might end up
understanding that “Lauren” always needs to be translated, for instance, into “María” in
Spanish and into “Sophie” in French, etc., which would be wrong. Proper names of people
are not expected to be adapted to local variants, except for historical names widely
recognized in each language (e.g. Columbus > Colón).

TRANSLITERATION can be used as long as the name stays the same as in English, as in
this example for Serbian: "Shakespeare" > "Šekspir".

Other names (e.g. names of places and languages) should be translated when the name is
rendered differently in the source and target languages. For example, Warsaw in Polish
should be Warszawa, Cologne in German should be Köln and Spain in Spanish should be
España.

APP NAMES/PODCASTS/SONG TITLES/OTHER MEDIA

When the Source String contains the title of a television show, movie, song, book, or other
title, an accurate Translation should retain the intent and meaning of the title as nearly as
possible in the target language, whether by using the title of the same work in the target
language (even if not correct as a word-for-word translation); by translating the words of the
title; or, when appropriate, by not translating the title at all. For example, “Harry Potter and
the Philosopher’s Stone” should be translated into French as “Harry Potter à l'école des
sorciers” because that was the title of the French version of the same book; and “twinkle,
twinkle, little star” might be translated into Korean using the corresponding Korean-language
song “반짝 반짝 작은 별”.

NAME OF LAWS, ACTS, ORGANIZATION NAMES, ETC.

Please keep in English (or in any other language the original name is presented) the laws,
acts, organization names, etc. that belong to a specific country.

Example: 'Defense of Marriage Act'

The current administration considers the Defense of Marriage Act to be unconstitutional and
discriminatory and it should not be enforced except when it comes to money.

For international organizations or international documents, e.g., 'Universal Declaration of
Human Rights', please use the standard and approved translations for your language.

PUNCTUATION AND CAPITALIZATION


PUNCTUATION

Preserve all punctuation as much as possible. If the source sentence includes several question
marks, exclamation marks, dots etc. (pretty common in this data set) please copy this
punctuation as is into the target. Example: “@BetoORourke Well bless her heart…..” (copy
all dots at the end of the sentence into the target).

PACTERA UPDATE Because the Isaac instruction is vague, we have received many queries
and doubts about this. After checking many cases and analysing different scenarios, we have
decided to add the below additional clarifications:

- Due to the nature of the instruction provided by the customer, if the source sentence is
missing a final dot at the end, it’s ok to omit the final dot in translation as well.
Reviewers can choose to correct it in the review stage in order to make it
grammatically correct if they wish, but this should not be marked as an error. You
should instead select the "Preferential change" option which does not add any penalty
and does not affect the translator QA score.

EMOJIS

Please copy over emojis directly into the translations.

• Example: ‘I love this game ’ -> ‘me encanta este juego ’


• Example: ‘pretty fun :)’ -> ‘muy divertido :)’

If the source sentence uses an emoji to replace a word that is essential to the sentence
meaning, then translate with the inferred word.

• Example: 'My 🏠 is red' should be translated as if the source were 'My
house is red'


• Example: ‘Wer Lust hat soll mir einfach schreiben :-‘ —> ‘If you feel like it,
just write me :-‘

CAPITALIZATION

Please preserve as appropriate.

PACTERA UPDATE Because the Isaac instruction is very vague, we have received many
queries and doubts about this. After checking many cases and analysing different scenarios,
we have decided to go with the below guidelines:

- If the source capitalization (for example the first word of the sentence being
lowercase instead of uppercase) does not have any impact on the meaning and the
sentence can be equally understood from the user perspective, it's ok to follow it
(leaving the initial word lowercase for example). Reviewers can choose to correct it
in the review stage in order to make it grammatically correct if they wish, but this
should not be marked as an error. You should instead select the "Preferential change"
option which does not add any penalty and does not affect the translator QA score.

MENTIONS AND TAGS

Preserve @mentions and #hashtags

• I don't like @PapaJohn pizza #cardboard #evil → no me gusta la pizza
de @PapaJohn #cardboard #evil

AUTOMATED QA CHECKS (ONEFORMA)


The following checks are run automatically on all strings. Some of them are done "on the
go", which means a pop-up message comes up before you can finally submit the string and
you will be allowed to make fixes accordingly. Some others will be performed after you have
submitted the translation and will determine if the string will be double checked by another
QAer or not. We may decide to add/remove some checks during the project according to
customer feedback and specific project needs.

• Spelling (this is browser based, so if you do not have the spell checker activated
in your browser, please activate it and/or make sure to install a specific add-in for
your language to check the spelling while you type)
• Missing or Extra numbers in target
• Trailing and Multiple Spaces

• MT check: automated check against Google MT and Microsoft MT (inactive for
strings of less than 10 words)
• Hashtags
• Mentions
• Emojis
• Translation1 same as Source
• Translation1 same as Translation2
• Translation1 empty
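
To give a feel for what some of these checks look for, here is a rough Python sketch of a few of them. It is not the OneForma implementation: the function name, the regular expressions and the returned labels are assumptions made for illustration, and the number check only compares runs of digits (written-out numbers are not covered).

```python
import re
from typing import List

# Hypothetical patterns; the real checks may define these differently.
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NUMBER_RE = re.compile(r"\d+")


def qa_issues(source: str, translation1: str, translation2: str = "") -> List[str]:
    """Return the names of the checks that a translation would trip."""
    if not translation1.strip():
        return ["Translation1 empty"]
    issues = []
    if translation1 == source:
        issues.append("Translation1 same as Source")
    if translation2 and translation1 == translation2:
        issues.append("Translation1 same as Translation2")
    if translation1 != translation1.rstrip() or "  " in translation1:
        issues.append("Trailing and Multiple Spaces")
    checks = (
        ("Mentions", MENTION_RE),
        ("Hashtags", HASHTAG_RE),
        ("Missing or Extra numbers in target", NUMBER_RE),
    )
    for label, pattern in checks:
        # Compare the same items regardless of their order in the sentence.
        if sorted(pattern.findall(source)) != sorted(pattern.findall(translation1)):
            issues.append(label)
    return issues
```

Because digit runs are compared without regard to order or separators, a reformatted date or thousand separator (for example “123,000” → “123.000”) would not be flagged in this sketch.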

VERSION UPDATES
Date        Version  Author               Comment
15/12/2020  1.7      Selvaggia Cerquetti  Removed the below sentence from the Accuracy paragraph: "Please provide translation 2 only if it is fairly common. If there is no fairly common translation, please omit the second translation."

