
Style Guide Annotation AiData guidelines

Upd:20231030

Context
A localization style guide is a set of rules that a translator is expected to follow when translating particular content into a particular
language. Style guide instructions written in a natural language can be diverse, very specific and highly contextual (domain, brand,
target audience). The translator exercises judgment to decide which and how many instructions apply to a particular input given the
context.

We are working on a feature that allows us to “read” and parse localization style guides that are written in a natural language.

In this task, we need to create a dataset consisting of:

• style guide instructions
  o extracted from original document
  o annotated/categorized
• corresponding examples of correct and incorrect translations
  o extracted from original document, if available
  o generated from scratch

Goal

We have created a list of “generic” (standardized) categories that are usually covered in localization style guides (voice/tone, locale
time/date/currency formats, etc.). With this annotation, we are looking to get the content extracted from actual style guides and
mapped to these generic categories along with some additional information. We are interested in the instructions that reflect the
specific style requested by a client.

Format: JSON/CSV, filename includes the LP
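
As a rough illustration of the delivery format, the sketch below writes a list of annotation rows to both a JSON and a CSV file whose names contain the LP. It rests on assumptions: "LP" is read here as the language pair (for example "en-de"), and the function name `save_annotations`, the file-name pattern, and the choice to write both formats are invented for this example. Please follow the naming agreed with AiData rather than this pattern.

```python
import csv
import json
from pathlib import Path


def save_annotations(rows, out_dir, lp):
    """Write annotation rows (a list of dicts) to JSON and CSV files.

    The file-name pattern is hypothetical; the only requirement taken
    from the guidelines is that the filename includes the LP.
    """
    out_dir = Path(out_dir)
    json_path = out_dir / f"sg_annotation_{lp}.json"
    csv_path = out_dir / f"sg_annotation_{lp}.csv"

    # ensure_ascii=False keeps target-language text readable in the file.
    json_path.write_text(
        json.dumps(rows, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    # Column order follows the keys of the first row.
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    return json_path, csv_path
```

Calling `save_annotations(rows, "output", "en-de")` would produce `sg_annotation_en-de.json` and `sg_annotation_en-de.csv` in an existing `output` folder.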


Terminology used

Style guide (SG): a document that you are annotating

Verbatim_category: a title of a section in the actual style guide that you are annotating, copied through; these titles are not
standardized and vary throughout style guides (e.g., Voice/Tone/Our voice/Voice and Tone/Writing style/etc.).
Note: we do NOT change the wording in this category; it must be copied through as is.

Standardized_category: a generic category name provided by AiData that needs to be matched with the verbatim_category; you will
determine the appropriate category for each section (e.g., our standardized category “Voice” will match with “Brand’s Voice and
Tone” section in a style guide)

Attribute: a sub-category that you need to identify and annotate; the list of attributes to look for will be provided to you (e.g.,
capitalization, acronyms, date format, time format, etc.). There are often multiple attributes under a larger category/section.
• We are only interested in the attributes that can vary from client to client, from style guide to style guide (e.g., voice and tone rules, capitalization rules specific to content, handling date/time formats, translating acronyms, etc.).
  o Voice category always varies and should always be annotated.
  o Localizing Formats category would most likely be annotated as well.
  o Whenever there is an option/choice, annotate it.
    ▪ E.g., localizing a date format: even if you always localize into your locales, there can still be more than one format to choose from: all digits, months spelled out, using a hyphen or a slash, etc. This means you should annotate it.
• The attributes that follow standard linguistic conventions of a given locale and are applicable to all content/all Clients need not be annotated (standard spelling/syntax rules, etc.). This will be your judgement. Please check with us if in doubt.
  o We expect that the universal rules will be mainly under Linguistic Conventions (typically spelling and punctuation rules, etc.).

Value: the value of an attribute that you will be looking for and annotating; the list will be provided to you (e.g., attribute
“capitalization” has two values: lower case and upper case).
! This list may grow; if you think there is a suitable value that is missing from our template, please annotate it under Value, as usual,
AND under a designated column “Proposed_Value”.

Rule/Condition: text from the SG phrased as a rule, or a condition under a rule, that describes the correct use of a value. E.g., “Use
upper case with all model names”; “Use uppercase except for the models that start with the word <auto>”.
! We do NOT rephrase/edit the text; it must be copied through as is.

Examples:
1. All examples illustrating the Rule/Condition that you extract from the SG (if available);
2. Two additional examples for each Rule/Condition that you generate from scratch.

Workflow

1. Familiarize yourself with the provided list of attributes and standardized categories.
2. Start reading the Style Guide (SG).
3. For each mention or reference to the attribute or value from our list, follow the annotation steps below.
4. Skip the sections to which the list does not apply.

Annotation

Step 1.
Start reading the SG from the top. Style guides usually start with Voice and Tone sections, but they can start with any section.

The main focus of Step 1 is looking for the attributes and/or values from our list.

We want to capture every instance of our attributes and values being referenced in a style guide.
• Use our list to identify the values to look for under each attribute.
• Do not search for an exact match in wording. Instead, you should be looking for a reference to the attributes and values from our list.
• These values may be mentioned directly as they are in our list, or phrased differently, or sometimes only implied. You should be looking for text that conveys the meaning of the attribute or value in this particular style guide.
  o e.g., it may be straightforward like “Use uppercase with titles” or be phrased as an exception, like “Use lowercase everywhere except for titles”.
  o e.g., you may not find the words “friendly” or “informal” literally mentioned anywhere; instead, you will find “Use a less formal tone”/“We like to sound casual”/“Our customers should be addressed in a warm way”/etc. The idea behind these is the same: register - informal, general voice - friendly.

Note that there are several aspects of tone and voice; here are some examples of different ways to refer to them.

Attribute: general voice
  friendly:  warm, conversational, relaxed
  clear:     crisp, plain, concise
  techy:     detailed, detail-oriented
  inclusive: gender-neutral, focused on people, not disability

• Note that these values can come in any order. The actual order of the values appearing in the SG will not match the order in which they appear in our list.
• If the sentence does not sound like a rule/instruction but rather like a general statement or a non-specific/non-limiting instruction, ignore it. E.g.,
  o “Present participle can be used as an adjective or an adjectival sentence.” (general statement)
  o “Punctuation can be used as means of communicating tone.” (general statement)
  o “be mindful of using exclamation points in your translation as the content is directed at customers.” (non-specific instruction: not clear how to proceed)

Once you have identified the first reference to a value from our list, move on to Step 2.

Step 2.

Enter the following information into the annotation fields.

• Each annotation step should be applied at the sentence level, and the sentence can be further split per value if needed.
• For Bilingual Style Guides: preserve all text in the source/target language as it appears in the SG.
• Tip: we find it easier to start by entering the Rule/Condition first (after you have identified the value) and then annotate around it.

The example used below is taken from a Style Guide. The sentence was annotated because it contains references to capitalization:
“Use lowercase after a colon within a sentence, except if the colon introduces a direct quotation or if the first word after the colon is
a proper noun.” We split it into two parts based on the values that are contained in the sentence: lowercase (explicit) and
uppercase (implied).

* - this field should always be completed; others are based on the availability/presence in the SG.

Annotation fields (for each field: definition, example, always available?, notes)

Field: *Attribute
  Definition: name of the attribute as it appears in our list
  Example: capitalization
  Always available?: yes

Field: Value
  Definition: name of the value as it appears in our list
  Example: lowercase
  Always available?: no
  Notes: If multiple values are referenced in one sentence, annotate each value-Rule on a separate row. In this example, the next row would have the value "uppercase". If not available, add N/A.

Field: *Verbatim_category_name
  Definition: the category name/section title under which this text appears in the actual style guide
  Example: Language-specific standards
  Always available?: yes
  Notes: Preserve the exact SG spelling.

Field: *Standardized_category_name
  Definition: the category name from our list that can be mapped to the verbatim_category
  Example: Linguistic_conventions
  Always available?: yes
  Notes: Preserve the exact spelling from our list.

Field: *Rule/Condition1
  Definition: the sentence or part of the sentence in which the (first) value is referenced
  Example: Use lowercase after a colon within a sentence,
  Always available?: yes
  Notes: If multiple values are referenced in one sentence, only annotate the relevant part of the sentence under Rule/Condition. In this example, two values were referenced in one sentence, one in each clause, which is why we need to annotate each part on a separate row. List the first part of the sentence under Rules/Conditions to match the value of the same row. We can only list one value reference under one Rule/Condition. In this example, the next row would have the other half of the sentence under Rule/Condition: "except if the colon introduces a direct quotation or if the first word after the colon is a proper noun."

Field: *Example - SG
  Definition: if available in the SG, add all applicable examples for this particular value/rule
  Example: n/a
  Always available?: no
  Notes: 1. Make sure this example is illustrating this rule in particular. 2. Examples can be presented as a chart, as a sentence, etc. Extract and annotate them the same way.

Field: *Example1 - Gen
  Definition: after exporting all available examples from the SG, create two examples of your own for each rule.
  Always available?: yes
  Notes: Example1 can be generic, domain-agnostic.

Field: *Example2 - Gen
  Definition: after exporting all available examples from the SG, create two examples of your own for each rule.
  Always available?: yes
  Notes: Example2 must be in the same domain as the SG; a wider domain is acceptable. E.g., if annotating a SG for Microsoft, create an example in a tech domain; not limited to Microsoft product.

Field: example structure
  Definition: Source; Correct target; Incorrect target
  Example:
    Source: N/A
    Correct target: Note: this is the last recall attempt.
    Incorrect target: Note: This is the last recall attempt.
  Always available?: yes
  Notes: "Source" will not be available in monolingual style guides. See below notes on examples for charts and lists.

Field: proposed_attribute
  Definition: if you believe an attribute is missing from our list, please do the following:
    1. use this field to add it.
    2. use this proposed attribute in your usual annotation as you would with an attribute from our list.
    3. reach out to us to discuss right away. This is very important: please communicate the proposed change before proceeding.

Field: proposed_value
  Definition: if you believe a value is missing from our list, please do the following:
    1. use this field to add it.
    2. use this proposed value in your usual annotation as you would with a value from our list.
    3. reach out to us to discuss right away. This is very important: please communicate the proposed change before proceeding.

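
The split of the colon sentence into two rows can also be pictured as data. The Python below is a minimal sketch, not a prescribed format: the field values come from the example in Step 2, but the class name `AnnotationRow`, the snake_case field names, and the `missing_required` helper are assumptions made for illustration. The example fields are left out for brevity.

```python
from dataclasses import dataclass, asdict

# Text fields that must always be completed (hypothetical snake_case names).
REQUIRED = (
    "attribute",
    "verbatim_category_name",
    "standardized_category_name",
    "rule_condition",
)


@dataclass
class AnnotationRow:
    attribute: str
    value: str  # "N/A" when no value is available
    verbatim_category_name: str
    standardized_category_name: str
    rule_condition: str


def missing_required(row):
    """Return the names of required fields that were left empty."""
    data = asdict(row)
    return [name for name in REQUIRED if not data[name].strip()]


# One sentence, two values: each (value, clause) pair gets its own row,
# and the clause text is copied through without rephrasing.
parts = [
    ("lowercase", "Use lowercase after a colon within a sentence,"),
    ("uppercase", "except if the colon introduces a direct quotation or "
                  "if the first word after the colon is a proper noun."),
]

rows = [
    AnnotationRow(
        attribute="capitalization",
        value=value,
        verbatim_category_name="Language-specific standards",
        standardized_category_name="Linguistic_conventions",
        rule_condition=clause,
    )
    for value, clause in parts
]
```

Both rows share the same attribute and category names; only the value and the Rule/Condition clause differ, which is the one-value-per-Rule/Condition constraint described in Step 2.
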
How to read the attributes/value list

The list contains the attributes and values of interest to us.

• The Attributes are grouped under 3 Categories: Voice, Linguistic Conventions, and Localizing Formats.

• It is possible that an Attribute from one category may serve an additional purpose of another category in the actual SG; in this case, please still map it to the “original” category from our list.
  o e.g., “punctuation” is listed under Linguistic Conventions, but you may notice that in a Style Guide it has an implied message for Voice: “Never use an exclamation point when translating comments directed at customers; we do not want to sound as if we were yelling.” ← still, this is a rule about Punctuation and belongs to Linguistic Conventions.

• Some Values will appear in the form of a chart, list or glossary. Usually, such lists provide exact instructions on words to use or words to avoid. In such cases,
  o there will be no Value to annotate
  o place all indicated words/terminology under separate examples
  o these examples may have both “correct” and “incorrect” usage available, or only one of them:
    ▪ e.g., there can be a “use this/not this” chart, in which case you will have both correct and incorrect examples.
    ▪ Or, you may only see “preferred word choices” or “approved terminology” lists, in which case you will annotate: Source - N/A, Correct target: word to use; Incorrect target: N/A.

• Linguistic Conventions: the note “*linguistic conventions that should only be annotated if they deviate from the standard language conventions and are specific to this particular SG or client” is added to the attributes that may be standard across the entire locale, but sometimes may be used to express Client’s voice/tone or specific style.
  o you should only annotate such Attributes if they serve the Client’s style.
  o this will be your judgement; you will rely on your expert knowledge of the language to distinguish “standard” linguistic instructions from a deviation. E.g.,
    ▪ “Capitalize proper nouns” - standard convention in English; do not annotate
    ▪ “Capitalize proper nouns related to our product but lowercase all models and makes of a direct competitor” - this is specific to a Client and deviates from the standard convention, which is to capitalize all proper nouns. Annotate this attribute and rule.
• Localizing Formats: this section is for localizing content. Note that most of the Attributes have only 2 values: localize or preserve. Once you identify which of the two is required, all other aspects of these attributes (such as spacing, placement of the currency symbol, truncating the name of the month, etc.) will be annotated as rule/condition under that value.
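
Two of the cases above, word lists that carry no Value and Localizing Formats attributes whose details go under Rule/Condition, can be sketched in Python. This is only an illustration under stated assumptions: the helper name `examples_from_chart`, the dictionary keys, the chart entries, and the sample rule sentence are all invented here and do not come from any style guide or from the AiData template. Using "N/A" as the Source for a “use this/not this” chart is also an assumption (it matches a monolingual style guide).

```python
def examples_from_chart(entries):
    """Convert (word_to_use, word_to_avoid) pairs into example entries.

    word_to_avoid is None for "preferred word choices" lists that only
    give the approved term; the incorrect target is then "N/A".
    """
    examples = []
    for use, avoid in entries:
        examples.append({
            "Source": "N/A",  # assumption: monolingual style guide
            "Correct target": use,
            "Incorrect target": avoid if avoid is not None else "N/A",
        })
    return examples


# Hypothetical chart content, invented for illustration only.
chart = [
    ("sign in", "log in"),  # "use this / not this" row
    ("select", None),       # approved-terminology row
]
chart_examples = examples_from_chart(chart)

# Hypothetical Localizing Formats row: the Value is only "localize"
# (or "preserve"); separators, ordering, etc. stay in the rule text.
date_row = {
    "Attribute": "date format",
    "Value": "localize",
    "Rule/Condition": "Write dates as DD.MM.YYYY.",  # invented SG sentence
}
```

Each chart entry becomes its own example, and the row that holds these examples would carry N/A under Value, since such lists have no Value to annotate.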
