Style Guide Annotation AiData Guidelines - 20231030
Updated: 20231030
Context
A localization style guide is a set of rules that a translator is expected to follow when translating particular content into a particular language. Style guide instructions written in natural language can be diverse, very specific, and highly contextual (domain, brand, target audience). Translators exercise considerable judgment to decide which instructions, and how many, apply to a particular input in its context.
We are working on a feature that allows us to “read” and parse localization style guides that are written in a natural language.
Goal
We have created a list of “generic” (standardized) categories that are commonly covered in localization style guides (voice/tone, locale time/date/currency formats, etc.). With this annotation, we want to extract content from actual style guides and map it to these generic categories, together with some additional information. We are interested in the instructions that reflect the specific style requested by a client.
Verbatim_category: the title of a section in the style guide you are annotating, copied as is; these titles are not standardized and vary between style guides (e.g., Voice/Tone/Our voice/Voice and Tone/Writing style/etc.).
Note: we do NOT change the wording in this category; it must be copied as is.
Standardized_category: a generic category name provided by AiData that needs to be matched with the verbatim_category; you will determine the appropriate category for each section (e.g., our standardized category “Voice” will match the “Brand’s Voice and Tone” section in a style guide).
Attribute: a sub-category that you need to identify and annotate; the list of attributes to look for will be provided to you (e.g., capitalization, acronyms, date format, time format, etc.). There are often multiple attributes under a larger category/section.
We are only interested in attributes that can vary from client to client and from style guide to style guide (e.g., voice and tone rules, content-specific capitalization rules, handling of date/time formats, translation of acronyms, etc.).
o Voice category always varies and should always be annotated.
o The Localizing Formats category will most likely be annotated as well.
o Whenever there is an option/choice, annotate it.
E.g., localizing a date format: even if you always localize dates for your locales, there can still be more than one format to choose from (all digits, month spelled out, hyphen or slash as separator, etc.), so annotate it.
Attributes that follow the standard linguistic conventions of a given locale and apply to all content/all clients do not need to be annotated (standard spelling/syntax rules, etc.). This is your judgment call; please check with us if in doubt.
o We expect such universal rules to appear mainly under Linguistic Conventions (typically spelling and punctuation rules).
Value: the value of an attribute that you will look for and annotate; the list will be provided to you (e.g., the attribute “capitalization” has two values: lower case and upper case).
! This list may grow; if you think a suitable value is missing from our template, annotate it under Value as usual AND under the designated column “Proposed_Value”.
Rule/Condition: text from the SG phrased as a rule, or a condition under a rule, that describes the correct use of a value. E.g., “Use
upper case with all model names”; “Use uppercase except for the models that start with the word <auto>”.
! We do NOT rephrase/edit the text; it must be copied through as is.
Examples:
1. all examples illustrating the Rule/Condition that you extract from the SG (if available);
2. two additional examples for each Rule/Condition that you generate from scratch.
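If annotations are collected in a script or a spreadsheet export, the fields above map naturally onto a record. The following Python sketch is illustrative only: the class names, field names, and sample texts are hypothetical and simply mirror the field definitions in this guide, and the category mapping used in the sample is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Example:
    """One example attached to a Rule/Condition (hypothetical structure)."""
    correct_target: str
    source: str = "N/A"                   # source-language text, if the SG provides one
    incorrect_target: str = "N/A"         # only when incorrect usage is available
    generated: bool = False               # True for examples written from scratch


@dataclass
class AnnotationRow:
    """One annotated rule, mirroring the fields described in this guide."""
    verbatim_category: str                # SG section title, copied as is
    standardized_category: str            # generic category from the provided list
    attribute: str                        # e.g., "capitalization"
    value: Optional[str]                  # e.g., "upper case"; None for glossary lists
    rule_condition: str                   # SG text, copied as is
    proposed_value: Optional[str] = None  # only if the value is missing from the template
    examples: List[Example] = field(default_factory=list)

    def generated_examples(self) -> int:
        """Count the examples the annotator wrote from scratch."""
        return sum(1 for ex in self.examples if ex.generated)

    def has_required_generated_examples(self) -> bool:
        """The guide asks for two additional examples per Rule/Condition."""
        return self.generated_examples() >= 2


row = AnnotationRow(
    verbatim_category="Product names",               # hypothetical SG section title
    standardized_category="Linguistic Conventions",  # assumed mapping
    attribute="capitalization",
    value="upper case",
    rule_condition="Use upper case with all model names",
    examples=[
        # Invented model names, for illustration only.
        Example(correct_target="ZX-200", incorrect_target="zx-200", generated=True),
        Example(correct_target="ZX-300 PRO", generated=True),
    ],
)
print(row.has_required_generated_examples())  # True
```

The final check only counts generated examples; it does not judge whether they correctly illustrate the rule, which remains the annotator's call.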
Workflow
1. Familiarize yourself with the provided list of attributes and standardized categories.
2. Start reading the Style Guide (SG).
3. For each mention of, or reference to, an attribute or value from our list, follow the annotation steps below.
4. Skip the sections to which the list does not apply.
Annotation
Step 1.
Start reading the SG from the top. Style guides usually start with Voice and Tone sections, but they can start with any section.
The main focus of Step 1 is looking for the attributes and/or values from our list.
We want to capture every instance of our attributes and values getting referenced in a style guide.
Use our list to identify the values to look for under each attribute.
Do not search for exact matches in wording. Instead, look for references to the attributes and values from our list.
These values may be mentioned directly as they appear in our list, phrased differently, or sometimes only implied. Look for text that conveys the meaning of the attribute or value in this particular style guide.
o E.g., it may be straightforward, like “Use uppercase with titles”, or phrased as an exception, like “Use lowercase everywhere except for titles”.
o E.g., you may not find the words “friendly” or “informal” mentioned literally anywhere; instead, you will find “Use a less formal tone”/“We like to sound casual”/“Our customers should be addressed in a warm way”/etc. The idea behind all of these is the same: the register is informal and the general voice is friendly.
Note that there are several aspects of tone and voice: here are some examples of different ways to refer to those.
Note that these values can come in any order; the order in which they appear in the SG will not necessarily match the order of our list.
If a sentence does not read like a rule/instruction but rather like a general statement or a non-specific/non-limiting instruction, ignore it. E.g.,
o “Present participle can be used as an adjective or an adjectival sentence.” (general statement)
o “Punctuation can be used as means of communicating tone.” (general statement)
o “be mindful of using exclamation points in your translation as the content is directed at customers.” (non-specific
instruction: not clear how to proceed)
Once you have identified the first reference to a value from our list, move on to Step 2.
Step 2.
Each annotation step should be applied at the sentence level, and a sentence can be further split per value if needed.
For Bilingual Style Guides: preserve all text in the source/target language as it appears in the SG.
Tip: we find it easier to enter the Rule/Condition first (after you have identified the value) and then annotate the other fields around it.
The following example is taken from a Style Guide. The sentence was annotated because it contains references to capitalization:
“Use lowercase after a colon within a sentence, except if the colon introduces a direct quotation or if the first word after the colon is a proper noun.” We split it into two parts based on the values contained in the sentence: lowercase (explicit) and uppercase (implied).
* This field should always be completed; the other fields are completed depending on whether the information is present in the SG.
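The capitalization example above could be recorded as two rows, one per Value. In this minimal Python sketch, the section title and the category mapping are hypothetical, and the split boundaries are one reasonable choice rather than an official answer key; the helper `is_verbatim` (also hypothetical) checks that each Rule/Condition is copied from the SG sentence without rephrasing.

```python
# The SG sentence, copied as is.
SENTENCE = (
    "Use lowercase after a colon within a sentence, except if the colon "
    "introduces a direct quotation or if the first word after the colon is "
    "a proper noun."
)

rows = [
    {
        "verbatim_category": "Capitalization",              # hypothetical SG section title
        "standardized_category": "Linguistic Conventions",  # assumed mapping
        "attribute": "capitalization",
        "value": "lower case",                              # stated explicitly
        "rule_condition": "Use lowercase after a colon within a sentence",
    },
    {
        "verbatim_category": "Capitalization",
        "standardized_category": "Linguistic Conventions",
        "attribute": "capitalization",
        "value": "upper case",                              # implied by the exception
        "rule_condition": (
            "except if the colon introduces a direct quotation or if the "
            "first word after the colon is a proper noun."
        ),
    },
]


def is_verbatim(sentence: str, rows: list) -> bool:
    """Return True if every Rule/Condition is an unedited substring of the sentence."""
    return all(row["rule_condition"] in sentence for row in rows)


print(is_verbatim(SENTENCE, rows))     # True
print([row["value"] for row in rows])  # ['lower case', 'upper case']
```

A substring check like this catches accidental rephrasing, but it cannot tell whether the split itself is sensible; that still depends on which values the sentence conveys.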
An Attribute from one category may also serve a purpose of another category in the actual SG; in this case, still map it to its “original” category from our list.
o E.g., “punctuation” is listed under Linguistic Conventions, but in a Style Guide it may carry an implied message about Voice: “Never use an exclamation point when translating comments directed at customers; we do not want to sound as if we were yelling.” This is still a rule about Punctuation and belongs to Linguistic Conventions.
Some Values will appear in the form of a chart, list, or glossary. Such lists usually give exact instructions on which words to use or avoid. In such cases:
o there will be no Value to annotate
o place each indicated word/term in a separate example
o these examples may have “correct” usage, “incorrect” usage, or both available:
E.g., there can be a “use this/not this” chart, in which case you will have both correct and incorrect examples.
Alternatively, you may only see “preferred word choices” or “approved terminology” lists, in which case annotate: Source: N/A; Correct target: the word to use; Incorrect target: N/A.
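Converting a glossary-style list into example rows is mechanical. The Python sketch below assumes a monolingual list (so Source is always N/A), uses a hypothetical helper `glossary_to_examples`, and shows invented terms for illustration only. No Value is produced, in line with the rule above.

```python
from typing import Iterable, List, Optional, Tuple

NA = "N/A"


def glossary_to_examples(
    entries: Iterable[Tuple[str, Optional[str]]],
) -> List[dict]:
    """Turn glossary entries into example rows (no Value is annotated for these).

    Each entry is (word_to_use, word_to_avoid); word_to_avoid is None when the
    SG only lists preferred or approved terms.
    """
    examples = []
    for correct, incorrect in entries:
        examples.append({
            "source": NA,  # assumption: monolingual list, so no source text
            "correct_target": correct,
            "incorrect_target": incorrect if incorrect is not None else NA,
        })
    return examples


# "Use this / not this" chart (hypothetical terms)
chart = glossary_to_examples([("sign in", "log on"), ("select", "click on")])
# "Preferred word choices" list (hypothetical term)
preferred = glossary_to_examples([("app", None)])

print(preferred[0])
# {'source': 'N/A', 'correct_target': 'app', 'incorrect_target': 'N/A'}
```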
Linguistic Conventions: *linguistic conventions should only be annotated if they deviate from the standard conventions of the language and are specific to this particular SG or client. This note is added to attributes that may be standard across the entire locale but are sometimes used to express the Client’s voice/tone or specific style.
o Only annotate such Attributes if they serve the Client’s style.
o This is your judgment call; rely on your expert knowledge of the language to distinguish “standard” linguistic instructions from deviations. E.g.,
“Capitalize proper nouns” - standard convention in English; do not annotate
“Capitalize proper nouns related to our product but lowercase all models and makes of a direct competitor” -
this is specific to a Client and deviates from the standard convention, which is to capitalize all proper nouns.
Annotate this attribute and rule.
Localizing Formats: this category covers localizing formats such as dates, times, and currencies. Note that most of its Attributes have only two values: localize or preserve. Once you identify which of the two is required, annotate all other aspects of the attribute (such as spacing, placement of the currency symbol, abbreviating the month name, etc.) as a Rule/Condition under that value.
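To show why a “localize” Value still needs a Rule/Condition, the Python sketch below renders the same date in several styles. The function names, style labels, sample source string, and the day-month-year order of the target locale are hypothetical; the `style` argument stands in for whatever Rule/Condition the SG actually specifies.

```python
from datetime import date

# Hard-coded English month names so the output does not depend on the system locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def render_date(d: date, style: str) -> str:
    """Render a date in one of several day-month-year styles (assumed target locale)."""
    if style == "digits_slash":
        return f"{d.day:02d}/{d.month:02d}/{d.year}"
    if style == "digits_hyphen":
        return f"{d.day:02d}-{d.month:02d}-{d.year}"
    if style == "month_spelled_out":
        return f"{d.day} {MONTHS[d.month - 1]} {d.year}"
    if style == "month_abbreviated":
        return f"{d.day} {MONTHS[d.month - 1][:3]} {d.year}"
    raise ValueError(f"unknown style: {style}")


def apply_format_rule(source_text: str, d: date, value: str, style: str = "") -> str:
    """Value is 'localize' or 'preserve'; style stands in for the Rule/Condition."""
    if value == "preserve":
        return source_text  # keep the source format unchanged
    if value == "localize":
        return render_date(d, style)
    raise ValueError(f"unknown value: {value}")


d = date(2023, 10, 30)
source = "10/30/2023"  # hypothetical US-style source string
print(apply_format_rule(source, d, "preserve"))                       # 10/30/2023
print(apply_format_rule(source, d, "localize", "digits_slash"))       # 30/10/2023
print(apply_format_rule(source, d, "localize", "month_spelled_out"))  # 30 October 2023
```

Under “localize”, each of the four styles is a different, valid outcome, which is why the chosen style belongs in the Rule/Condition rather than in the Value.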