
Transcription Guidelines 3.3.1.1
(For TX_Meeting HitApps)

July 25, 2019


Last updated: 7/29/2019 1:30 PM

Contents
Transcription Guidelines 3.3.1.1
July 25, 2019
1. Summary
2. General Guidelines
  Numbers
  Special Characters
  Casing
  Named Entities
  Punctuation
  Math symbols
  Abbreviations
  Restarts/Repetitions
  Ungrammatical words
  Ambiguity
  Mispronunciations
  Informal Words
  Foreign Words
  Spelling Reform
  Cortana Pronunciation (This section is for Cortana “Windows/Xbox/… Assistance” data only)
  In-box Assistance
3. Language-specific Guidelines
  English
    O vs Oh vs Zero
    Zee vs. Zed
    K vs. OK vs. Okay
    Faithfulness
    En-US vs. en-GB spelling
    Accent or diacritics in English:
  Arabic
    General
    Ar-BH
    Often Mispronounced Letters
    Often Confusing Letters in Written Arabic
    Incorrectly Omitted Letters
    Attaching words, pronouns, and prepositions
  Chinese
    Numbers
    Special Characters
    Spacing
    English Brandnames
    Pinying in an utterance
    Actual word as fill
  Cantonese (zh-HK)
    Traditional vs. Simplified Chinese
    English or Mandarin in audio
  French
    Hyphens
    Ouais vs Oui
  German
    Numbers
    Faithfulness
    “ß” vs. “ss”
  Hindi
  Japanese
    Numbers
    Kanji vs. Hiragana
    Katakana vs. English
    Transcribe as it is spoken
    The lengthening line “ー” (stretch mark)
  Korean:
  Russian:
4. Tags
  Format
  Spacing (high-level summary; for details, see each tag)
  Transcription Modes
  Active Tags
  (for details of transcription of “cortana” see section “Cortana Pronunciation”)
5. Exceptions
6. Guideline Codes

1. Summary
This document includes guidelines from the science and development teams to the transcription team,
as to how transcription is to be performed.

2. General Guidelines

Numbers
[0101] All numbers will be spelled out. E.g. “one” “two” “three”.

Special Characters
[0201] Special characters, sometimes used for short-hand notation, should not be used. The
following are examples of bad transcriptions:

• “coffee & tea”


• “i will come @ five”
• “you are #1”
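
Guidelines [0101] and [0201] lend themselves to a quick automated check before a transcription is submitted. The following is a minimal sketch in Python and is not part of the HitApp; the function name `find_forbidden` is hypothetical, and the character set is an assumption limited to digits and the three shorthand characters shown in the examples above. It does not account for the explicit-punctuation format described under Punctuation.

```python
import re

# Assumption: only digits and the shorthand characters named in the
# examples of [0101] and [0201] are checked; this is not a complete list.
FORBIDDEN = re.compile(r"[0-9&@#]")

def find_forbidden(transcription: str) -> list[str]:
    """Return every digit or shorthand character found, in order of appearance."""
    return FORBIDDEN.findall(transcription)

# An empty list means the transcription passes this particular check.
print(find_forbidden("coffee & tea"))         # flags "&"
print(find_forbidden("i will come at five"))  # nothing flagged
```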

Casing
[0301] Words and names must be transcribed in lowercase at all times. Note that this also
applies to the personal pronoun “I”, which should be “i”. This also applies to brand names.
Examples:

• “i need directions to montreal”


• “maria goes to the mall in seattle”
• “do you know google”

[0302] Only acronyms and spelled-out letters should be capitalized. When a letter is spelled-
out, it must be capitalized, followed by a period, then by a space. Example:

• “directions to IBM” → “directions to I. B. M.”


• “CDs” → “C. D. s”

[0304] However, some acronyms were created as initialisms and are conventionally written in all caps, but are
pronounced like regular words. These should be transcribed like regular words; some examples are:

• “NAFTA” → “nafta”
• “NATO” → “nato”
• “MCAT” → “mcat”
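
The casing rules [0302] and [0304] can be illustrated with a short Python sketch. The function names and the `PRONOUNCED_AS_WORDS` set are hypothetical; the set holds only the three examples above, and in practice the decision depends on how the speaker actually pronounced the term.

```python
# Assumption: this set contains only the examples listed under [0304].
PRONOUNCED_AS_WORDS = {"NAFTA", "NATO", "MCAT"}

def spell_letters(token: str) -> str:
    """Format spelled-out letters per [0302]: each capital letter is
    followed by a period and separated by a space."""
    suffix = ""
    # A lowercase plural "s" after capitals stays a separate, unpunctuated token.
    if len(token) > 1 and token.endswith("s") and token[:-1].isupper():
        token, suffix = token[:-1], " s"
    return " ".join(f"{ch}." for ch in token) + suffix

def transcribe_acronym(token: str) -> str:
    """Lowercase acronyms spoken as words [0304]; spell out the rest [0302]."""
    if token in PRONOUNCED_AS_WORDS:
        return token.lower()
    return spell_letters(token)

print(transcribe_acronym("IBM"))   # I. B. M.
print(transcribe_acronym("CDs"))   # C. D. s
print(transcribe_acronym("NATO"))  # nato
```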

Named Entities
[0701] Names of people, companies, songs, apps, etc. should all be transcribed in accordance
with previous rules (e.g., use lower case). Copyrighted terms (e.g., trademarks, brand names,
registered names) are also to be treated the same way.

[0702] Do not use special characters that are not typically part of a word. Examples:

• “Yahoo!” → “yahoo”
• “Ke$ha” → “kesha”
• “P!nk” → “pink”

In many cases, the correct spelling of named entities may be unclear. The following four items
are to help determine correct spelling, and are provided in order of priority. For example, if (1)
applies, then (2), (3), and (4) can be ignored.

[0703] Spelling (1) -- If a name, song title, or app title is present either in the client recognition
box (see In-box Assistance section) or server recognition box, then use that spelling if that’s what
was said in the audio. Borrow only the spelling from the recognition box, however: if the box
shows names with capital letters, for example, you still need to use lowercase in the
transcription. Example:

• Client recognition box contains “Jon Wayne” → transcribe as “jon wayne”
(ignoring that the famous actor’s name is John Wayne).

[0704] Spelling (2) -- Well-known people, companies, songs, apps, etc. should all be transcribed
with the trademarks or official spellings that they are known by (if it has special characters that
are not typically part of a word, see [0702] above; if it contains Arabic numbers, follow [0101]).
If you are unsure how to accurately spell one of these named entities, then perform a web search
to identify the correct spelling.

This applies to named entities that are intentionally misspelled, as well as initials in names. They
should be spelt in accordance with expected spellings. Examples:

• “Inglourious Basterds” → “inglorious basterds”
• “Boyz n the Hood” → “boyz n the hood”
• “Pet Sematary” → “pet sematary”
• “Salt-n-Pepa” → “salt n pepa”
• “O Canada” → “O. canada”
• “Ice-T” → “ice T.”

For proper names (trademarks) with hyphen, please see [0606] below.

[0705] Spelling (3) -- If the name of a person who is not well known can be interpreted as initials,
transcribe it as initials if there is no In-box Assistance. Example:
• “T J is here” → “T. J. is here”

[0706] Spelling (4) -- If the correct spelling is still not clear, then make your best judgement based
on the context that you have.
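
The priority order in [0703] to [0706] can be sketched as a simple fallback chain. This Python example is illustrative only: the function `choose_spelling` and its parameters are hypothetical, it assumes the recognition-box text matches what was said in the audio, and it does not handle special characters ([0702]) or initials ([0705]).

```python
from typing import Optional

def choose_spelling(best_guess: str,
                    recognition_box: Optional[str] = None,
                    official: Optional[str] = None) -> str:
    """Pick a named-entity spelling in priority order.

    (1) recognition-box spelling, if present           [0703]
    (2) official or trademark spelling, if known       [0704]
    (4) the transcriber's best judgement otherwise     [0706]
    """
    chosen = recognition_box or official or best_guess
    # Only the spelling is borrowed; the lowercase rule [0301] still applies.
    return chosen.lower()

# The recognition box shows "Jon Wayne", so that spelling wins over "John Wayne".
print(choose_spelling("john wayne", recognition_box="Jon Wayne",
                      official="John Wayne"))
```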

Punctuation

Explicit Punctuation
[0601] If a speaker is dictating and says a punctuation mark's name, such as "exclamation point", use
the standard punctuation format from Appendix A (in the future, we will enable right-click insertion
from a dropdown menu in the HITApp). If what the speaker said is not in the list provided in Appendix A,
transcribe it in the following format: “SYMBOL\NAME”, that is, the symbol itself, a backslash, and the
name of the punctuation mark transcribed in capital letters. If the punctuation mark's name is more than
one word, such as "exclamation point" or "question mark", use an underscore instead of a space to
separate the capitalized words.

The transcription should reflect exactly what was said, and there should be space at the
beginning and end of explicit punctuation. Examples:

• !\EXCLAMATION_MARK
• !\EXCLAMATION_POINT
• .\PERIOD
• “go !\EXCLAMATION_MARK james said”
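
As an illustration of the fallback “SYMBOL\NAME” format in [0601], here is a minimal Python sketch. The function name `explicit_punctuation` is hypothetical, and the sketch only builds the fallback token; it does not know the contents of Appendix A, which take priority when the spoken name is listed there.

```python
def explicit_punctuation(symbol: str, spoken_name: str) -> str:
    """Build the fallback token: symbol, backslash, then the spoken name
    in capitals with underscores between words."""
    name = "_".join(spoken_name.upper().split())
    return symbol + "\\" + name

# The token is surrounded by spaces when placed inside an utterance.
token = explicit_punctuation("!", "exclamation mark")
print("go " + token + " james said")
```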

But in the following situations, use words rather than the punctuation mark or sign:

• “five point six” → “five point six”
• “go to amazon dot com” → “go to amazon dot com”
• “@microsoft.com” → “at microsoft dot com”
• “#” → “hashtag”

Apostrophes
[0603] Apostrophes should be included as part of the transcription. If the apostrophe
follows a famous person’s name that ends in an initial, add a space and then the apostrophe.
Examples:
• can’t
• he’s
• john’s
• mary jones's husband (singular possessive)
• the joneses' new car (plural possessive)
• jesus' (singular possessive that is exception to the rule)
• jay Z. ’s
[0607] “Mind your P’s and Q’s” → “P. ’s and Q. ’s”

Hyphens
[0604] Special attention should be paid to correct transcription of hyphens (-) for words where
the hyphen provides semantic value (such as for French numbers). In such cases, hyphens
must be included in the transcription.

• For example, quatre-vingt-neuf is the correct way of transcribing “eighty nine” in French. If this
was transcribed without hyphens (i.e., quatre vingt neuf), it could be interpreted as “four twenty
nine”.

[0605] Written-together forms like “openfaced”, which are not recognized by the MS-Word speller,
should not be used. In this case, a hyphen should be used. Examples:

• an open-faced sandwich
• a drive-by shooting
• a shut-down computer (but “a computer shutdown”)

[0606] Proper names (trademarks) with hyphen shall be transcribed with hyphens, e.g.
• “Chick-fil-a”→ “chick-fil-a”
• “Coca-Cola” → “coca-cola”

Math symbols
[1801] Do not use math symbols. If a person says “6 divided by 3 equals 2”, transcribe it as
“six divided by three equals two”.

Abbreviations
[1901] Abbreviations, including common metric abbreviations, must be transcribed as spoken, e.g. for
en-US:
• kilometers, centimeter, etc., not km, kms, cm, cms etc.
• “mister” instead of “mr.” or “M. R.”, etc.

[1902] But “missus” is an exception. Transcribe “missus” → “mrs” (note: no period is allowed
here; “mrs.” is wrong!).

Restarts/Repetitions
[0901] Restarts/repetitions need not be tagged if the words are not damaged; simply transcribe
the words as uttered by the speaker. Example:

• “i i just wanted to say” (since ‘i’ is undamaged and complete)


(For a false start, partially pronounced word, or stumbled-over speech in the utterance, see <FILL/> in the
Tags section.)
(For a word truncated by audio, see <UNKNOWN/> in the Tags section. Note: a truncated word is defined as a
word with a portion cut off at its beginning or end by the audio.)

Ungrammatical words
[1001] For words which are ungrammatical in the given context, but are clearly articulated,
transcribe them as spoken. For example, if a speaker utters the plural form “bonds” in the
sample sentence below, transcribe it exactly as “bonds”.

• E.g. “find a bonds with a ten year maturity date”

Ambiguity
Language, by its very nature, can be very ambiguous – especially for very short utterances.
Ambiguity can come in many forms:

• Homophones (words with same sound but different spelling)
    • E.g., clothes/close, no/know
• Variant spellings (words with more than one acceptable spelling)
    • E.g., ax/axe, donut/doughnut, barbecue/barbeque
• Inflections
    • E.g., in French many verb conjugation forms have the same
      pronunciation, such as achète/achètes/achètent
• Proper Nouns
    • E.g., John/Jon, Catherine/Katherine, Main St vs. Maine St

[1701] For homophones, variant spellings, and inflections, simply choose the form that you
feel is most likely what the speaker intended. Make sure you factor in whatever context you
have in making these decisions.

• Audio that only contains a single word, either “no” or “know” → “no” (because this is more
likely what was intended)
• Audio contains “call John” or “call Jon” → “call John” (if that’s the form that you feel is
more common and there is no In-box Assistance; see Guideline [1302])
• If dictionaries offer variants for the word (e.g. “T-shirt”, “tee shirt”, “tee-shirt”, etc.), use
the default spelling (generally the head entry in dictionaries) and transcribe it
according to the guidelines:
    • E.g., “T-shirt” → “T. shirt”

NOTE: To avoid unfairly penalizing transcribers in this area, we will try to avoid highly-
ambiguous examples in our Gold utterances and/or develop relaxed scoring rules. If you do
encounter ambiguous Golds, please alert the Microsoft team.

For proper nouns, see the section above on Named Entities – Spelling.

Mispronunciations
A mispronounced word is a word that we know the speaker intended to say, but in which
certain sound(s) are said incorrectly, dropped, or switched around.

[1501] For mispronunciations where the intent is clear, the standard word should be used. If
the intent is not clear, use <UNKNOWN/>. This includes cases like metathesis (transposition of
sounds, like “misocroft” for “Microsoft”) and dialectal pronunciations that are either not a word
or not a word of the same meaning in the dictionary. Examples:

• “californa” (doesn’t pronounce the ‘i’) → “california”
• “misocroft” (if in context, we are sure what the speaker meant) → “microsoft”
• In certain dialects of Italian, the word “mangia”, meaning “eat”, is pronounced as
“mancia”. Even though “mancia” is a word in Italian, it does not mean “eat”:
“mancia” → “mangia”

[1502] For mispronunciations where the intent is not clear, the <UNKNOWN/> tag should be
used. [see <UNKNOWN/> section in Tags]

• “directions to al-“ → “directions to <UNKNOWN/>”.

[1504] If a word is only partially pronounced because the audio is cut off (either beginning or
end of audio), then the <UNKNOWN/> tag should be used. [see <UNKNOWN/> section in Tags]

Informal Words

[1601] If a speaker clearly uses an informal pronunciation that is formally recognized as a word,
then use the informal word. Examples of informal words in en-US that are recognized as words
include: wanna, gonna, kinda, sorta, betcha, doc, etc. If in doubt, or if the informal form is just
the result of fast speech, use the formal word.

• “you betcha” → “you betcha” since “betcha” is in the en-US dictionaries.

[1602] Do not create new words (e.g., fulla for full of, lika for like a). If in doubt, the following
dictionaries can be used to determine if a word is officially recognized or not:

Locale | Dictionary Name | Dictionary URL
ar-EG | | https://www.almaany.com/
ca-ES | Institut d'Estudis Catalans | http://mdlc.iec.cat
ca-ES | Diccionari de la llengua catalana de l'Institut d'Estudis Catalans | https://dlc.iec.cat
da-DK | | https://ordnet.dk/ddo
da-DK | | ordbogen.com
da-DK | | www.denstoredanske.dk
de-DE | Duden | http://www.duden.de/
en-AU | Oxford Dictionary (Australian English) | https://en.oxforddictionaries.com/
en-AU | Macquarie Dictionary | https://www.macquariedictionary.com.au/
en-CA | Oxford Dictionary (Canadian English) | https://en.oxforddictionaries.com/
en-IN | Oxford Dictionary (Indian English) | https://en.oxforddictionaries.com/
en-GB | Oxford Dictionary (British English) | https://en.oxforddictionaries.com/
en-GB | Cambridge Dictionary (British English) | http://dictionary.cambridge.org/dictionary/essential-british-english/
en-NZ | The New Zealand Oxford Dictionary | https://www.oxfordreference.com/view/10.1093/acref/9780195584516.001.0001/acref-9780195584516
en-US | Merriam-Webster Dictionary | https://www.merriam-webster.com/
en-US | Dictionary.com | http://www.dictionary.com/
en-US | Oxford Dictionary (US English) | https://en.oxforddictionaries.com/
en-US | Cambridge Dictionary (American English) | http://dictionary.cambridge.org/dictionary/essential-american-english/
es-ES | Real Academia Española | http://dle.rae.es/?w=diccionario
es-ES | Lleva Tilde | http://llevatilde.es/
es-MX | Colmex | http://dem.colmex.mx/
fi-FI | Kielitoimiston sanakirja | https://www.kielitoimistonsanakirja.fi
fr-CA | Office québécois de la langue française | https://www.oqlf.gouv.qc.ca/accueil.aspx
fr-CA | Bescherelle L'art de conjuguer | http://bescherelle.com/
fr-CA | Petit Larousse | http://www.larousse.fr/
fr-FR | Petit Larousse | http://www.larousse.fr/
hi-IN | Shabdkosh | www.shabdkosh.com
hi-IN | Oxford Dictionary (Hindi) | https://hi.oxforddictionaries.com/
hi-IN | | https://www.collinsdictionary.com/dictionary/english-hindi
hi-IN | | https://www.lexilogos.com/english/hindi_dictionary.htm
it-IT | Treccani | http://www.treccani.it/
it-IT | Zanichelli | https://www.zanichelli.it/
it-IT | Accademia della Crusca | http://www.accademiadellacrusca.it/it/pagina-d-entrata
ja-JP | Sanseido's Japanese Dictionary | http://www.weblio.jp/
ja-JP | Goo | https://dictionary.goo.ne.jp/
ko-KR | | http://stdweb2.korean.go.kr/main.jsp
ko-KR | | https://ko.dict.naver.com/#/main
ko-KR | | https://dic.daum.net/index.do?dic=all&q
nb-NO | | https://ordbok.uib.no
nl-NL | Van Dale | https://www.vandale.nl/
nl-NL | Woordenlijst Nederlandse Taal | https://woordenlijst.org/
pl-PL | Słownik języka polskiego | https://sjp.pwn.pl/
pl-PL | Wielki słownik języka polskiego | https://wsjp.pl/
pt-BR | Priberam | https://www.priberam.pt/dlpo/
pt-BR | Michaelis | www.michaelis.uol.com.br
pt-PT | Dicionário da Língua Portuguesa | https://www.portoeditora.pt/app-dlp
sv-SE | Saol (So) | https://svenska.se/
sv-SE | Saob | https://www.saob.se
sv-SE | | http://folkets-lexikon.csc.kth.se/folkets/#lookup&Information
ru-RU | Современный толковый словарь русского языка Ефремовой | https://dic.academic.ru/contents.nsf/efremova/
th-TH | Thai Royal Society Dictionary | http://www.royin.go.th/dictionary/
th-TH | Longdo Dict | https://dict.longdo.com/
zh-CN | 新华字典 | http://zd.diyifanwen.com/
zh-CN | 新华字典 | http://zidian.cibiao.com/
zh-CN | 在线新华词典 (Online Xinhua Dictionary) | http://xh.5156edu.com/
zh-CN | 百度词典 (Baidu Dictionary) | http://dict.baidu.com/
zh-CN | 百度输入法 (Baidu Input) | https://shurufa.baidu.com/?pz-srf-bt
zh-HK | | http://kaifangcidian.com/han/yue/%E4%BF%82
zh-HK | | http://www.cantonese.sheik.co.uk/scripts/masterlist.htm
zh-HK | | https://humanum.arts.cuhk.edu.hk/Lexis/Canton2/
zh-TW | 教育部國語字典 | http://dict.revised.moe.edu.tw/cbdic

[1604] If the speaker explicitly says "ha ha ha" (as opposed to genuine laughter, which the
guidelines treat as noise), "bowwow", "bang", etc., transcribe the word if, and only if, the
dictionary recognizes it as a word.

Foreign Words

Borrowed Words
[1101] If the foreign word/language is part of the language’s regular lexicon (i.e., would be
understood by most speakers), write it in the foreign language using foreign language script.

For how to transcribe English words in Hindi, please see the Language-specific Guidelines for
hi-IN.

[1104] The English word “okay” is one of the most frequently borrowed words in non-English
languages. When it is used as a borrowed word in another language, transcribe it as “O. K.”,
following Guideline [en0301].

Partial-utterance Foreign Words


[1102] If only part of the utterance is in a foreign language, mark that part of the
utterance as <UNKNOWN/> and transcribe the remainder. However, if the speaker is searching for a
foreign celebrity name (such as a singer), a song title, or a game title, transcribe it rather than
using <UNKNOWN/>; conduct a side search if necessary.

Full-utterance Foreign Words

[1103] If the entire utterance is in a foreign language, use the <UNKNOWN/> tag to represent
the entire utterance.

Spelling Reform
[0501] Use post-Reform spelling rules for languages with spelling reform (German, French, Portuguese),
e.g. German: write “dass” rather than “daß”.

Cortana Pronunciation (This section is for Cortana “Windows/Xbox/… Assistance” data only)
Note: we no longer have prescribed spelling for mispronunciations of Cortana. Please follow
the new guidelines below for transcription of the word.

The word ‘cortana’ requires special attention and should be treated differently than other
words. This applies to all languages (including ja-JP “コルタナ”).

[1401] If pronunciation of ‘cortana’ is correct, then it should be treated the same as other words – i.e., it
should be transcribed as ‘cortana’. The table below contains some sample acceptable ‘cortana’
pronunciations:

Note the following:


1. All borderline pronunciations will be considered correct (i.e., we just want to identify outliers).
2. Multiple correct pronunciations may exist for some locales (i.e., due to regional
accents). Some audio samples are listed below; they are playable only in the Word version of the guideline.

Locale | Sample 1 | Sample 2 | Sample 3 | Sample 4
De-DE | De-DE Hey Cortana Example 1.m4a | De-DE Hey Cortana Example 2.m4a | |
En-AU | en-AU_cortana.m4a | en-AU_cortanna.m4a | |
En-CA | en-CA Hey Cortana Example 1.m4a | en-CA Hey Cortana Example 2.m4a | en-CA Hey Cortana Example 3.m4a |
En-GB | en-GB_cortana.m4a | en-GB_cortanna.m4a | |
En-IN | en-IN_corta'na.m4a | en-IN_cor'tana.m4a | |
En-US | en-US Hey Cortana Example 1.m4a | en-US Hey Cortana Example 2.m4a | en-US Hey Cortana Example 3.m4a |
Es-ES | es-ES_Cortana_example1_E0E744596D044EB7A7BFA743798D45A5.m4a | es-ES_Cortana_example2_DABE5AEE0631483080706B550B437A59.m4a | es-ES_Cortana_example3_303C0A558559434F9FFBCB04B27377EF.m4a |
Es-MX | es-MX_Cortana_example1_4D61D505D7C64529BE3BFC7917B3E72B.m4a | es-MX_Cortana_example2_A2C9231BE6D04C4DA1F975F6399E070F.m4a | es-MX_Cortana_example3_A9CFA397392A4AE682C3FA6867C1BB9A.m4a |
Fr-CA | fr-CA_Cortana_example1_B49E6700FFE2421BA6BE5689D1D0F3D8.m4a | fr-CA_Cortana_example1_033431654B9E45C4BCF441DCE0DA5720.m4a | fr-CA_Cortana_example3_F8B3FB2055844951BF114319C28A1442.m4a | fr-CA_Cortana_example4_B6B3D300181948FB8D96C79AEA2596A
Fr-FR | fr-FR_Cortana_example1_827AAAF6D96142E28268D689A7B46CA5.m4a | fr-FR_Cortana_example2_0BE11A08A757443C9487C3FA1F1EE54B.m4a | fr-FR_Cortana_example3_B26DE4FDABA745828FDAE0E66B7671C9.m4a |
It-IT | it-IT_Cortana_example1_95C04571EE57406D926503BB1BEF8D60.m4a | it-IT_Cortana_example2_D6E1F59B869747DA9ABC20794D7F9117.m4a | it-IT_Cortana_example3_153DEF254BB044299078D41D503D0B56.m4a | it-IT_Cortana_example4_05A20D529FEB4D8DA7323DE2C3CF4A1
Ja-JP | ja-JP Kortana-san (Hey Cortana) Example 1.m4a | ja-JP Kortana-san (Hey Cortana) Example 2.m4a | |
Pt-BR | pt-BR_coqtana.m4a | pt-BR_cortana.m4a | pt-BR_coxtana.m4a |
Zh-CN | zh-CN_2444.m4a | zh-CN_3334.m4a | |

[1402] If pronunciation of ‘cortana’ is incorrect (e.g., ‘cortina’, ‘cortona’), then it should be transcribed
as ‘cortana’ (since that was the intent), but a <MP/> tag should be added after the word ‘cortana’ to
indicate that it was mispronounced (e.g., “hey cortana<MP/> tell me a joke”). MP = MisPronounced.
This also applies to a partially pronounced “cortana”, where the speaker did not pronounce the full word
but we can clearly tell that “cortana” is the intended word.

• ‘hey cortan-’ (where the final ‘a’ is not spoken by the speaker) → ‘hey cortana<MP/>’

If we cannot be sure “cortana” is the intended word, follow [1405] below.

[1403] Use of <MP/> tag should only occur with mispronunciations of ‘cortana’. It should not be used
for any other word, even for words that are related to keywords (e.g., ‘hey’, ‘select’, ‘hola’ (es-ES), ‘san’
(ja-JP)).

[1404] If another word is spoken instead of ‘cortana’, then write that word instead. E.g., “hey siri”, “hey
google”, or “hey there”.
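
Rules [1401], [1402], and [1404], together with the cut-off rule [1405], can be read as a small decision procedure. The Python sketch below is one possible reading, not an official implementation; the function name and its boolean inputs are hypothetical stand-ins for judgements the transcriber makes while listening.

```python
from typing import Optional

def transcribe_keyword(audio_cut_off: bool,
                       intent_clear: bool,
                       pronounced_correctly: bool,
                       other_word: Optional[str] = None) -> str:
    """Return the text to write where 'cortana' is expected."""
    if other_word is not None:
        return other_word.lower()   # [1404] a different word was spoken
    if audio_cut_off or not intent_clear:
        return "<UNKNOWN/>"         # [1405] cut off, or intent not certain
    if pronounced_correctly:
        return "cortana"            # [1401] includes borderline pronunciations
    return "cortana<MP/>"           # [1402] mispronounced or partially spoken

# A speaker says "hey cortina tell me a joke" with clear intent.
print("hey " + transcribe_keyword(False, True, False) + " tell me a joke")
```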

[1405] As with other words, if the audio is cut off during ‘cortana’, it should be transcribed as
<UNKNOWN/>.

For reference, here is the list of current keywords.

Locale | Keyphrase
en-US | Select
en-US | Hey Cortana
en-GB | Hey Cortana
fr-FR | Hey Cortana
it-IT | Ehi Cortana
de-DE | Hey Cortana
es-ES | Hola Cortana
zh-CN | 你好小娜 (Nǐ hǎo xiǎo nà)
ja-JP | コルタナさん (Korutana-san)
en-IN | Hey Cortana
en-CA | Hey Cortana
en-AU | Hey Cortana
pt-BR | Ei Cortana
fr-CA | Hé Cortana
es-MX | Hola Cortana
hi-IN | हे कोर्टाना (Hey Cortana)

In-box Assistance

[1301] If available, you will be provided one or two text boxes containing text (see sample
screenshot below). Both versions of the text are output from Microsoft speech recognizers, so
they may not be reliable. The text is provided to transcribers only as assistance. Because it is
not reliable, it is extremely important that transcribers do not simply accept this text as the
final transcription. The final text submitted MUST match the audio as closely as possible.
Please don’t be biased by this information. Generally speaking, the top text box tends to be
more accurate. However, the second text box has access to a person’s personal contact list,
list of apps installed on their devices, and list of songs in their library. (Note that the first
box also sometimes reflects a speaker’s contact list, list of apps, etc.; see [0703] Spelling (1)
above for guidance when either box displays proper names.)

[1302] If different spellings of a name, app, or song title are provided, then use the spelling
provided in the second box. For example, if the audio contains “call john”, the first text box
contains “call john”, but the second box contains “call jon”, then the correct transcription will
contain “jon” (because this is how the name was spelled in the user’s contact list). Same for
app names and song titles.

[1303] If what you hear is different than what you see in the text provided, then write what
you hear. Remember, the provided text is only provided as assistance, and is often not
correct.

3. Language-specific Guidelines
[2001] Russian as well as some Romance languages have grammatical categories of gender, conjugation,
case, and declension. Sometimes when the speech is out of context, it is difficult to determine which
specific category a word belongs to. In this case, use your native-speaker intuition and discretion.

English

O vs Oh vs Zero
• [en0101] The letter ‘o’ should be transcribed as ‘O.’ (see also “O. candana” in
Guideline [0704]).
• [en0102] If used as an exclamation, ‘oh’ should be transcribed as ‘ohh’.
• [en0103] The number zero when spoken as ‘o’ should be transcribed as ‘oh’.

Zee vs. Zed
• [en0201] Both pronunciations should be transcribed as ‘Z.’.

K vs. OK vs. Okay


• [en0301] Full pronunciation should be transcribed as ‘O. K.’, not ‘okay’.
• [en0302] The abbreviated, less formal pronunciation should be transcribed as ‘K.’
if clearly spoken that way.
• [en0303] Determining if it’s K or OK can be difficult, particularly in fast speech, so
in borderline cases (where pronunciation is not clear), transcribe in the formal
version (i.e., ‘O. K.’).

Faithfulness
• [en0401] Use yup and yeah if this is what the speaker says. [The transcription yea
(rhymes with day) is only used for the exclamation meaning woohoo]. Note: Do
not use shortened forms like gettin' or gettin to transcribe -ing words like getting.

en-US vs. en-GB spelling
• [en0501] For en-US words, do not use British spellings, such as favourite and
colour, unless spelled that way as part of a proper noun (space shuttle Endeavour,
song Colour my World).

Accent or diacritics in English:


• [en0601] Do not use accents or diacritics in English unless the word is trademarked
or recognized by a spell-checker, e.g.
o “Slavoj Žižek” → “slavoj zizek”
o “soufflé” → “soufflé”
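
As a rough illustration of [en0601], the sketch below strips combining marks from English tokens unless the token is on an allowlist. The allowlist `ALLOWED_WITH_DIACRITICS` and the function names are hypothetical stand-ins for "trademarked or recognized by a spell-checker"; the lowercasing simply mirrors the example above.

```python
import unicodedata

# Hypothetical allowlist: words whose accented spelling is trademarked or
# accepted by a spell-checker. It is not part of the guidelines themselves.
ALLOWED_WITH_DIACRITICS = {"soufflé"}


def normalize_english_token(token: str) -> str:
    """Lowercase a token and drop diacritics unless it is allowlisted."""
    lowered = unicodedata.normalize("NFC", token).lower()
    if lowered in ALLOWED_WITH_DIACRITICS:
        return lowered
    # NFD splits a letter such as "ž" into "z" plus a combining caron,
    # and the combining mark is then filtered out.
    decomposed = unicodedata.normalize("NFD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_english_text(text: str) -> str:
    """Apply the token rule to every whitespace-separated token."""
    return " ".join(normalize_english_token(t) for t in text.split())
```

A real workflow would replace the allowlist with whatever spell-checker or trademark list the project actually uses.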

Arabic

General
• [ar0101] Numbers are to be spelled out in letters.
• [ar0102] Accurate spelling is always required, especially for the letters “ث ذ ظ ط ض”,
which are often transcribed incorrectly.
• [ar0103] We can be a bit lenient with the letters “أ ا إ آ”, which are often confused (but
the gold HITs should not include them as they are often not transcribed consistently).
• [ar0104] Diacritics do not need to be written.
Ar-BH
• [ar0201] For ar-BH there is a standard spelling that must be adhered to, even when the
speaker is influenced by his/her local dialect and pronounces incorrectly.
Often Mispronounced Letters
• [ar0301] The following letters are often mispronounced but should be transcribed
according to the proper Fusha transcription.

Letter   Example    Pronounced as             Transcribed as
ق        قبل        قبل / أبل / غبل / جبل      قبل
ذ        ذرة        ذرة / زرة                  ذرة
هـ       كله        كله / كلو                  كله
ض        ضرب        ضرب / ظرب                  ضرب
ل        إسماعيل    إسماعيل / إسماعير          إسماعيل
ث        ثم         ثم / سم                    ثم
Often Confusing Letters in Written Arabic
• [ar0401] Follow the guidance below to decide between letters that are often confused in
written Arabic.

Letter: ة   Confused with: هـ
How to decide: Try to pronounce the word in the middle of a sentence. If a “t” sound is
produced use “ة”; if an “h” sound is produced use “هـ”.
Example: حضارة / حضره. Try the sentences “الحفل الذي حضره المشاهير” and
“كانت حضارة القدماء عظيمة”.

Letter: ي   Confused with: ى
How to decide: If the word has a soft “a” sound at the end use “ى”, but if the sound is an
“i/y” sound use “ي”.
Example: “ذلك شأن القوى العظمى” / “هو القوي”

Letter: ا   Confused with: أ، إ، آ
How to decide: Use “ا” if the letter becomes silent in the middle of a sentence or if it is a
long vowel (حرف مد أو همزة وصل). Use “أ” if a glottal stop sound is produced in the middle of
a sentence (همزة قطع). Use “إ” if the glottal stop sound is followed by an “i” vowel. Use “آ”
for a long “a” vowel after the glottal stop.
Example: الناس / أنا / إيمان / آمن

Letter: ء   Confused with: ئ، ؤ، أ
How to decide: The general rule here is to choose the form that is compatible with the
surrounding context.
Example: سماء / شيء / رئيس / مؤسسة / قرأ

Incorrectly Omitted Letters


• [ar0501] Some letters become silent in some contexts, but they should still be
transcribed.

Letter   Silent if                                            Example
ا        Third person plural verb marker                      شبوا
ا        Nunation marker                                      جميلا
ا        As in the above table (همزة وصل)                     باحب
ل        Definite article followed by one of the following    الشمس
         letters: { ت، ث، د، ذ، ر، ز، س، ش، ص، ض، ط، ظ، ل، ن }

Attaching words, pronouns, and prepositions


• [ar0601] Always attach
o Single-letter prepositional prefixes such as: { ك , ب , و , ف , ل }
o Object pronouns starting with { ـه , ك }
• [ar0501] Never attach
o { هذا , ها , على , ع }
o Object pronouns starting with { ل } like { له , لنا , لها , لكم , لهم , لي , ... }
o Negation words { لا , ما }
o الحمد لله , إن شاء الله , عبد الرحمن , عبد الله , أبو علي
o Numbers and words (e.g. خمسة أيام)

Chinese

Numbers
• [zh0101] Numbers should be transcribed with Chinese characters not Arabic numerals.
• [zh0102] Transcribe both pronunciations “yi” and “yao” of the Chinese digit 1 as 一.
• [zh0103] Likewise, 7 pronounced “guai” → 七 and 0 pronounced “dong” → 零.
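
The digit rules above amount to a lookup from spoken reading to character. The following is a minimal sketch under that reading; the table name and function are hypothetical, and "yi", "qi" and "ling" are included only as the standard readings of the same three digits.

```python
# Spoken readings mapped to the character to transcribe. "yao", "guai" and
# "dong" come from [zh0102] and [zh0103]; "yi", "qi" and "ling" are the
# standard readings of the same digits. Other digits are left out on purpose.
SPOKEN_DIGIT_TO_HANZI = {
    "yi": "一", "yao": "一",
    "qi": "七", "guai": "七",
    "ling": "零", "dong": "零",
}


def transcribe_spoken_digits(readings: list[str]) -> str:
    """Join the characters for a sequence of spoken digit readings."""
    return "".join(SPOKEN_DIGIT_TO_HANZI[reading] for reading in readings)
```

Because [zh0301] asks for no spaces between characters, the characters are joined without a separator.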

Special Characters
• [zh0201] When someone says “plus” (“加”), it does not mean that it needs to be
entered as a plus sign (“+\加”). Unless the speaker specifies that it should be the plus
sign (e.g. by using the word 号 “sign” or “punctuation”), it should be spelled out as
+\加号.
• [zh0202] Regarding retroflex articulations, transcriptions should include “儿” if it was
articulated. e.g., 打开照片 vs 打开照片儿

Spacing
• [zh0301] Generally speaking, there should be no spaces between characters.
• [zh0302] Do not insert spaces when a speaker corrects himself. E.g., 打开打打开音

• [zh0303] Spaces should be present before/after English words
• [zh0304] Spaces should be present before/after tags, if appropriate (see guidelines
on tags).

English Brand Names
• [zh0401] Transcribe brand names in English when that is how the speaker pronounced
them. Transcribe a foreign word with Chinese characters if the speaker says it in the Chinese
way (e.g., ‘pizza’ vs ‘披萨’).
Pinyin in an utterance
• [zh0402] If a person says the Pinyin first and then the word, transcribe the word only.
Example:
• “w ang 王” → “王”.

Actual word as fill


• [zh0402] If the filler in Chinese is an actual word such as “嗯、啊、唉”, the character
should be used instead of <FILL/>. E.g., “嗯我们就这样把”

Cantonese (zh-HK)
Traditional vs. Simplified Chinese
• [zh-HK0101] Use traditional Chinese for transcribing Cantonese.
• [zh-HK0102] When both traditional and simplified variations are acceptable, the
traditional version should be used. e.g., Taiwan should be 臺灣 not 台灣.
English or Mandarin in audio
• [zh-HK0201] For English in zh-HK data, if you understand the English, transcribe it; if you
do not understand the English, use <UNKNOWN/>.
• [zh-HK0202] For Mandarin (国语/普通话) in zh-HK data, use <UNKNOWN/>.

French
Hyphens
• [fr0101] We are standardizing on NO hyphens for cases where the number is “cent” or
“mille”, or when it is connected with “et”. This is the most common and preferred
spelling in France nowadays. Examples:
o huit cent cinquante et un
o deux mille quatorze

Ouais vs Oui
• [fr0201] You may transcribe with ouais rather than oui, if this is what the speaker says.

German
Numbers
• [de0102] If, as is the case in most software names, the number is normally written with
Arabic numerals, then do it just the same as in English.

Faithfulness
• [de0201] If a speaker speaks colloquially and says “is” meaning “ist”, then transcribe as
“ist” (this is an example of Guideline [1501])

• [de0202] You may use 1st-person singular forms without the final ‘-e’, if this is what the
user says (ich geh ins kino), and conversely imperative forms with final –e (denke nicht
dran). When in doubt, use the version with the final “-e”.

“ß” vs. “ss”


• [de0203] For correct transcription of words with “ß” or “ss”, look it up in the Duden, or
duden.de.

Hindi
• [hi0101] Most English borrowed words in Hindi are not part of any formal Hindi
dictionary, yet most people speaking Hindi, especially in urban areas, use these
words freely while speaking colloquially. Transcribe the English words with English
spelling.
• [hi0102] If words from languages other than English are mixed with Hindi, follow
these principles:
o Transcribe in Hindi all known Hindi words and entities,
o else treat the word as foreign (follow guidelines from the “Foreign Words” section).
Japanese

Numbers

[ja0201] To differentiate cases like “72 (seven two)” and “72 (seventy-two)”, numbers should
be transcribed with Kanji characters as they are pronounced. In other words, the Kanji number
text would represent the actual pronunciation without any ambiguity.

[Examples]
“seven two” -> “Nana Ni” -> 七二
“seventy two” -> “Nana-jyu Ni” -> 七十二
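
To make the distinction in [ja0201] concrete, here is a small sketch that renders the same digits either one by one or as a quantity. The function names are hypothetical, the quantity reading is limited to 1 through 9999, and zero is deliberately left out because it is covered separately by [ja0202].

```python
KANJI = {1: "一", 2: "二", 3: "三", 4: "四", 5: "五",
         6: "六", 7: "七", 8: "八", 9: "九"}
UNITS = [(1000, "千"), (100, "百"), (10, "十")]


def digit_by_digit(digits: str) -> str:
    """'72' spoken as "nana ni": one Kanji per non-zero digit."""
    return "".join(KANJI[int(d)] for d in digits)


def as_quantity(n: int) -> str:
    """72 spoken as "nana-jyu ni": positional reading for 1..9999."""
    if not 1 <= n <= 9999:
        raise ValueError("this sketch only covers 1 through 9999")
    parts = []
    for value, unit in UNITS:
        count, n = divmod(n, value)
        if count == 0:
            continue
        # A count of one is normally left implicit: 十, 百, 千.
        parts.append(("" if count == 1 else KANJI[count]) + unit)
    if n:
        parts.append(KANJI[n])
    return "".join(parts)
```

The transcriber still has to decide from the audio which of the two readings was spoken; the code only shows that the two outputs differ.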

Special cases of numbers:

1. Zero

[ja0202] Zero could be レイ (rei), マル (maru), or ゼロ (zero) depending on how it’s pronounced.

“rei” and “zero” are both typically transcribed as 零. Therefore, instead of using Kanji
characters, use katakana as below:

[Examples]

“702” could be “七レイ二”, “七マル二” or “七ゼロ二” depending on how it’s pronounced.

2. Decimal point

[ja0203] A decimal point is transcribed as 点.

[Examples]
“7.2” is 七点二.
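
The zero and decimal-point rules can be sketched as a per-character substitution. This is only an illustration: the function name and the `zero_reading` parameter are assumptions, and the sketch handles digit-by-digit readings only.

```python
KANJI_DIGITS = {"1": "一", "2": "二", "3": "三", "4": "四", "5": "五",
                "6": "六", "7": "七", "8": "八", "9": "九"}
# Katakana for each way zero may be pronounced ([ja0202]).
ZERO_READINGS = {"rei": "レイ", "maru": "マル", "zero": "ゼロ"}
DECIMAL_POINT = "点"  # [ja0203]


def transcribe_digit_string(digits: str, zero_reading: str = "zero") -> str:
    """Render digits one by one, using the zero reading that was heard."""
    zero = ZERO_READINGS[zero_reading]
    out = []
    for ch in digits:
        if ch == "0":
            out.append(zero)
        elif ch == ".":
            out.append(DECIMAL_POINT)
        else:
            out.append(KANJI_DIGITS[ch])
    return "".join(out)
```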

3. Foreign pronunciation

[ja0204] If a number is “described” in a foreign language use katakana (e.g. “Lucky Seven” is
“ラッキーセブン”, not “ラッキー 七”).

[Examples]

1. “Windows 10”:
If pronounced “Uindouzu Ten”, “10” should be transcribed in katakana such as “テン”.
If pronounced “Uindouzu Jyu”, then it should be “十”.
2. “Boeing 747”: If pronounced as “Boingu Sebun Fo Sebun”, then it’s
“ボーイング セブンフォーセブン”, instead of “ボーイング 七四七”.

Kanji vs. Hiragana

[ja0301] General rule is to use the Hiragana expression.

[Examples]

1. Hiragana is preferable (i.e., Kanji words are too formal)

<Kanji> <Hiragana>
× 但し ○ただし
×暫くして ○しばらくして
× 改めて 〇あらためて

2. But Kanji is preferable in the following (i.e., Hiragana seems more appropriate for
children)

<Kanji> <Hiragana>
〇閉じる ×とじる
〇皮膚 ×ひふ

3. Both are acceptable

<Kanji> <Hiragana>
〇明日は 〇あすは、あしたは
〇健やか 〇すこやか

Katakana vs. English
• Transcription of English words will depend on the pronunciation:
o [ja0101] Use English if the pronunciation sounds like native English.
o [ja0102] Use Katakana if the pronunciation sounds like Japanese (i.e., like a
word that has been borrowed into the Japanese language).
o [ja0103] When it’s not clear, the default should be katakana.

Transcribe as it is spoken
[ja0401] When the speaker says an address and uses “の” for any hyphen between
numbers, the transcriber should write “の” in hiragana and not katakana.
e.g. “文京区本郷 7-3-1” → “文京区本郷七の三の一”

The lengthening line “ー” (stretch mark)


[ja0501] The Japanese lengthening line “ー” should be used to transcribe a word only in two
cases:
a. The official entry of the word in the dictionary contains the lengthening line
e.g.ラーメン、サーモン
b. The word is a proper name which officially contains the lengthening line
e.g. ユーチューブ、ベートーヴェン

If the dictionary entry of the word or the proper name does not officially contain the
lengthening line “ー”, do not use the lengthening line when transcribing the word, even if the
speaker is lengthening it in the audio. Transcriptions should contain only words that are in
official dictionaries, or proper names that can be confirmed on the web.

Korean:
• [ko0101] Word breaking space is required. Correct the spacing if it is wrong.
• [ko0102] Word breaking should separate the sentence into the smallest units. For
example, both of the following are acceptable, but the transcription should use b. (the
smallest unit):
a. ‘보여주다’
b. ‘보여 주다’
• [ko0201] If the speaker made a mistake (including foreigners speaking Korean) in
pronouncing the word and the intention is clear, use the standard form. (Reference to
Guideline [1501] in main section.)
• [ko0202] If the speaker made a mistake (including foreigners speaking Korean) in
pronouncing the word and the intention is NOT clear, use <UNKNOWN/>. (Reference to
Guideline [1502] in main section.)
• [ko0203] An exception to Guideline [ko0201]: Originally 너의 (nuh-ui; your) is
abbreviated to 네 (ne), and it should sound 'ne'. In recent usage, however, 네 is often
pronounced 니 (ni). This violates the rule, but most people use this form. If it sounds
'ni', transcribe it as 니.
• [ko0204] If the audio is partially spoken or truncated, use <UNKNOWN/>. (Reference to
Guideline [1504] in main section.)
• [ko0205] If the speaker used a casual form or dialectal form of a word which can be
found in authoritative dictionaries with the same meaning, use the casual form;
otherwise, use the formal form of the word. (Reference to Guideline [1601] [1602] in
main section.)
• [ko0301] If an audio has more than one possible transcription like 일하다 (working) and
일 하다 (doing that stuff), use context to determine which one should be used. If the
context cannot help you determine which one should be used, the transcriber should
pick the one that is more common in life.

Russian:
• [ru0101] Colloquially pronounced words should be transcribed according to the
standard form that appears in the dictionary if the colloquially pronounced words are
not recognized as new words, e.g.
o “чё”, “шо” should be transcribed as “что”
o “щас” should be transcribed as “сейчас”

• [ru0102] Letter “ё” must be used where needed.

• [ru0103] App/product names such as Skype and Viber have been fully
adapted into Russian and have Russian spellings like “Скайп” and “Вайбер”; for cases
like this, use the Russian words unless the pronunciation is clearly English, in which
case, use the English words.

• [ru0104] Some English words are widely used in Russian and their common written
form is in Cyrillic. Use the Russian form of the words, e.g. “окей” (Russian version of
“okay”).

• [ru0105] But for the word “email”, use the English word.

• [ru0106] Hyphen should be used for words according to the Russian grammar rules such
as: “что-то”, “по-русски”.

4. Tags

Format
• [tags0101] All tags should be in XML format (e.g., <UNKNOWN/>).
• [tags0102] Use methods provided in the UI for inserting tags (function key, right-click)
rather than spelling out the tags since spelling them out is error-prone and it is
mandatory that the tags are correctly spelled and in the correct syntax.

Spacing (high-level summary; for details, see each tag)
• [tags0201] For tags that associate with a word, the tag should be added after the word,
and with no space between the tag and the word. There must be a space separating the
tag from the other neighboring words. Examples:
i. “hey cortana<MP/> tell me a joke” (“cortana” is mispronounced but it is clear
that the speaker meant “cortana”; for details see [1402] and [1403])
• [tags0202] Tags that denote events by themselves (e.g., <UNKNOWN/>) should be
surrounded by spaces as if they were regular words.
• [tags0203] Sentence-level tags (e.g., CNOISE, NPS) should have no space between them
if they appear together, but there should be a space between sentence level tags and
other transcription. E.g., “<CNOISE/><NPS/> hi mom”
• For space requirements related to <NIS></NIS>, see [tags0806]
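
The spacing rules in [tags0201] to [tags0203] lend themselves to an automated check. The sketch below is one possible way to do it, not a description of the HitApp itself: the function name and the grouping of tags into three tuples are assumptions, and `<SN/>` is left out because it may be either attached or standalone.

```python
import re

ATTACHED_TAGS = ("<MP/>",)                     # follow a word, no space before
STANDALONE_TAGS = ("<UNKNOWN/>", "<FILL/>")    # behave like regular words
SENTENCE_PREFIX = re.compile(r"(?:<CNOISE/>|<NPS/>|\s)*")


def _neighbors(text: str, start: int, end: int) -> tuple[str, str]:
    """Characters just before and after a span; edges count as spaces."""
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "
    return before, after


def spacing_errors(transcript: str) -> list[str]:
    errors = []
    for tag in ATTACHED_TAGS:
        for m in re.finditer(re.escape(tag), transcript):
            before, after = _neighbors(transcript, m.start(), m.end())
            if before.isspace():
                errors.append(f"{tag} must directly follow its word")
            if not after.isspace():
                errors.append(f"{tag} must be followed by a space")
    for tag in STANDALONE_TAGS:
        for m in re.finditer(re.escape(tag), transcript):
            before, after = _neighbors(transcript, m.start(), m.end())
            if not (before.isspace() and after.isspace()):
                errors.append(f"{tag} must be surrounded by spaces")
    # Sentence-level tags: adjacent to each other, then one space.
    prefix = SENTENCE_PREFIX.match(transcript).group(0)
    body = transcript[len(prefix):]
    if " " in prefix.rstrip():
        errors.append("sentence-level tags must not be separated by spaces")
    if prefix.strip() and body and not prefix.endswith(" "):
        errors.append("sentence-level tags must be followed by a space")
    return errors
```

An empty list means none of the three rules was violated; it does not check tag spelling or which tags are allowed in the current mode.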
Transcription Modes

• [tags0301] Tags to be used will depend on the transcription mode requested. Please
refer to the following table to see which tags apply to the different transcription modes.

Tag           Orthographic only   Orthographic + Noise
<PII/>        X                   X
<UNKNOWN/>    X                   X
<BA/>         X                   X
<FILL/>       X                   X
<MP/>         X                   X
<SN/>                             X
<CNOISE/>                         X

The HITApp informs you whether to use (O) or (OT) tag sets (on the top right
corner of the HITApp UX). Examples below.
Orthographic only (O) Orthographic + Noise (OT)
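
The mode table can also be expressed as a lookup, which makes it easy to flag tags that do not belong to the requested mode. This sketch assumes that the single X shown for `<SN/>` and `<CNOISE/>` belongs to the Orthographic + Noise (OT) column; the names `TAGS_BY_MODE` and `disallowed_tags` are hypothetical.

```python
import re

COMMON_TAGS = {"PII", "UNKNOWN", "BA", "FILL", "MP"}
# Assumption: the noise tags are only active in Orthographic + Noise (OT).
TAGS_BY_MODE = {
    "O": COMMON_TAGS,
    "OT": COMMON_TAGS | {"SN", "CNOISE"},
}


def disallowed_tags(transcript: str, mode: str) -> set[str]:
    """Return tag names used in the transcript that the mode does not list."""
    used = set(re.findall(r"<([A-Z]+)/>", transcript))
    return used - TAGS_BY_MODE[mode]
```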

Active Tags

Tag Definition
<PII/> Personal Identifiable Information.

Tag utterances with <PII/>


If you find the following information in an utterance, please tag it as <PII/> and do
not transcribe it.
Note: judged on its own, the audio you are transcribing may not be PII (for example,
it contains only a first name). But if the content becomes PII once it is combined
with the audio in context (the audio before or after the one being transcribed), for
example when the previous audio contains the first name and the neighboring audio
contains the last name, then the audio should be marked as PII.

1. A long run of consecutive digits


If you find 5 or more consecutive digits, please tag the utterance as <PII/>. “Consecutive”
means digits may be uttered with special characters (e.g. dot, dash, slash, etc.) or a
single word (e.g. “and”, “hundred”, “double”, etc.) inserted. Please count those as
consecutive digits. Examples below:
• Tag: “I will try three three four one two eight”
• Tag: “That’s seven eight two dot one one five dot ….”
• Tag: “That’s four hundred and nineteen dash eight eight double zeros”.
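
The counting rule above can be approximated with a simple token scan. The following is a heuristic sketch only: the word lists, the threshold constant, and the function names are assumptions, each number word counts as one digit (so "nineteen" is undercounted), and any number of connector words is tolerated between digits.

```python
NUMBER_WORDS = {
    "zero", "oh", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
    "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
}
# Words that may sit between digits without breaking the run.
CONNECTORS = {"dot", "dash", "slash", "and", "hundred", "double"}
PII_DIGIT_THRESHOLD = 5


def longest_number_run(utterance: str) -> int:
    """Length of the longest run of number words, ignoring connectors."""
    best = run = 0
    for raw in utterance.lower().split():
        token = raw.strip(".,…?!\"“”")
        if token.endswith("s") and token[:-1] in NUMBER_WORDS:
            token = token[:-1]  # "zeros" -> "zero"
        if token in NUMBER_WORDS:
            run += 1
            best = max(best, run)
        elif token and token not in CONNECTORS:
            run = 0  # any other word ends the run
    return best


def needs_pii_tag_for_digits(utterance: str) -> bool:
    return longest_number_run(utterance) >= PII_DIGIT_THRESHOLD
```

A check like this could at most pre-flag utterances for review; the transcriber's judgment on the audio remains the deciding factor.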

2. A full name with a first-person singular pronoun, etc.


If you find the speaker’s full name (such as “Adam Mattson”) with a first-person singular
pronoun or possessive determiner (“I, my, me, mine, myself”) in the same
utterance (or right before the present audio), please tag it as <PII/>. If it’s only a first
or last name, you don’t need to tag it. Examples below:
• Tag: “My name is Adam Mattson”
• No-Tag: “Adam Mattson” (We don’t know if Adam Mattson is the name of
the speaker just by this).
• No-Tag: “I’m Adam”

3. Email address
If you find an email address (e.g. “myname123@gmail.com”), please tag it as
<PII/>.

4. Physical address
If you find a physical address with detailed street numbers and street address,
please tag it as <PII/>. If it’s only a town, city or state name, you don’t need to tag it.
Examples are below:
• Tag: “123 North East 98th court”
• No-Tag: “It’s Redmond, Washington”
• No-Tag: “My address number is 123”.

5. Contextual PII
If you listen to a whole utterance and think the “content” of the utterance may
identify the speaker, please tag it as <PII/>. One example of this case is when a
combination of multiple non-PII items adds up to PII. For example:
• “My last name is Dale, and I live in Redmond, Washington”
In this case, “Dale” is only a last name, and “Redmond” is not a full address. But if you
think “there may be only one ‘Dale’ in Redmond”, then please go ahead and tag it
as PII.

This “contextual PII” is very subjective, but if you think such information may identify
the speaker, then tag it.

<UNKNOWN/> Unknown

[tags0402] Use this tag when there is obvious human speech, but one or more of the
actual words cannot be determined. Difficulty in understanding may be a result of
strong accent, low volume, speech too fast, poor pronunciation, or any other reason.

[tags0404] Use this tag when either the first word or the last word in an utterance is
cut off (truncated) in the audio.

[tags0405] Do not use this tag when the speech is clear but ambiguous due to various
reasons, such as homophones, variant spelling, etc., for which cases, consult
respective sections: [1701] for ambiguity, [0703] for spelling, or [1301] for in-box
assistance for name spelling etc.

[tags0406] For cases where the entire utterance is unknown (either unintelligible or
in a foreign language), don’t add additional tags. Just mark the entire utterance as
<UNKNOWN/>.

[tags0407] For cases where part of the audio is intelligible and part is not, use
<UNKNOWN/> for the unintelligible portion and continue to transcribe the intelligible
portions as usual, including tags.
<BA/> [tags1101] Tag an utterance with <BA/> if it has no contents in the audio-it is a blank
audio.
<SN/> Sudden Noise
[tags0601] Any sudden or short noise which is clearly audible (at comparable volume
as the main speaker or louder). i.e., if you can hear a sudden noise, then it should be
tagged. This includes human-generated noises (e.g., cough) as well as non-human
noises. Sudden noises can be considered in two cases:

Isolated noise
[tags0602] If a sudden isolated noise is clearly audible, then it should be tagged. If you
have to strain to hear it, then tagging is not required.
e.g., “call mom” followed by a door slam would be: “call mom <SN/>”
During speech
[tags0603] If the sudden noise occurs during a word, append the tag (with no spaces)
to the end of the word.
e.g., “call mom” with a door slam during “mom” would be: “call mom<SN/>”

<FILL/> Filler
[tags0701] Use if user is producing a filled pause that is not an actual word with
semantic meaning (such as umm, er, ah, etc.).
[tags0702] If the speaker is using a real word as a filler (such as “like”), transcribe that
word. Note, speech like “uh-huh”, “uh-uh” should be treated and transcribed as actual
words.
[tags0703] Use if there is a false start, a partial word (not truncated) or stumbled over
speech in the utterance.
e.g., “gu- going to the store” → “<FILL/> going to the store” (since
gu- is an incomplete word)

[tags0704] Filled pauses are like words and should be separated from neighboring
words by spaces.
[tags0705] Consecutive fillers should be transcribed with a single <FILL/> tag.
Example:
“yeah john <FILL/> I think we should <FILL/> definitely meet this weekend
and <FILL/> figure something out”
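
As an illustration of [tags0704] and [tags0705], the sketch below collapses runs of filler tags into one and normalizes the spacing around them. The function name is hypothetical, and the sketch assumes the fillers have already been replaced by `<FILL/>` tags.

```python
import re


def collapse_fillers(transcript: str) -> str:
    """Replace consecutive <FILL/> tags with a single tag and tidy spaces."""
    collapsed = re.sub(r"<FILL/>(?:\s*<FILL/>)+", "<FILL/>", transcript)
    return re.sub(r"\s+", " ", collapsed).strip()
```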
<CNOISE/> Continuous Noise (sentence level tag)

[tags0901] Use this tag for any utterance that contains continuous noise (e.g., crying,
music, singing, humming, whistling, laughter, traffic, or other white continuous noise)
throughout the recording. If there are multiple instances of distinct noises, use <SN/>
tag.

[tags0902] Place the <CNOISE/> tag at the beginning of the utterance, followed by a
space. Words from the primary speaker should be transcribed as normal.

[tags0903] If both sentence-level tags apply (i.e., <CNOISE/> and <NPS/>), then list
them in alphabetical order, with no space between them.

Examples:

“<CNOISE/> call mom”


“<CNOISE/><NPS/> call mom”
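
The placement rules in [tags0902] and [tags0903] can be sketched as a small helper that builds the sentence-level prefix. The function name and the use of bare tag names such as "CNOISE" as input are assumptions made for the example.

```python
def add_sentence_tags(transcript: str, tag_names: set[str]) -> str:
    """Prefix sentence-level tags in alphabetical order, with no space
    between the tags and a single space before the transcription."""
    prefix = "".join(f"<{name}/>" for name in sorted(tag_names))
    return f"{prefix} {transcript}" if prefix else transcript
```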
<MP/> [tags1001] This tag is used for transcription of ‘cortana’ only.

If the pronunciation of ‘cortana’ is incorrect (e.g., ‘cortina’, ‘cortona’ or a partially
pronounced “cortana”) and you are sure “cortana” is the intended word, transcribe it
as ‘cortana’ and add the <MP/> tag right after the word (no space between “cortana”
and the <MP/> tag, but there should be a space between the <MP/> tag and the following
word if there is one).

Examples:

“hey cortana<MP/> tell me a joke”


‘hey cortan-’ (where the final ‘a’ is not spoken by the speaker) → ‘hey
cortana<MP/>’

(For details of transcription of “cortana”, see section “Cortana Pronunciation”.)

5. Exceptions
This section outlines exceptions to the above guidelines, for specific HitApps.

Conversational Speech
• Do not use <NPS/> tag for this HitApp. All speech shall be transcribed
• Do not use <NIS></NIS> tags for this HitApp

6. Guideline Codes
General Format: [Category + GuidelineID + SubGuidelineID]

Category
• tags – for guidelines related to use of tags
• languagecode – 2-letter ISO 639-1 code depending on language (en, ja, fr…)
• none – for the core guidelines that apply to all languages

GuidelineID
• 2-digit code used to identify a single guideline or a group of guidelines

SubGuidelineID
• 2-digit code used to identify a sub-guideline

Examples
• [0401]
• [1303]
• [ja0202]
• [tags0201]
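
The code format described above can be checked with a regular expression. This is a sketch under stated assumptions: the names `GuidelineCode` and `parse_guideline_code` are hypothetical, and the optional region suffix is included only because codes such as [zh-HK0101] appear in the language-specific section.

```python
import re
from typing import NamedTuple, Optional


class GuidelineCode(NamedTuple):
    category: Optional[str]   # "tags", a language code, or None for core
    guideline_id: str         # 2-digit guideline or group
    sub_guideline_id: str     # 2-digit sub-guideline


# Category is optional; a language code may carry a region suffix.
CODE_PATTERN = re.compile(r"\[(tags|[a-z]{2}(?:-[A-Z]{2})?)?(\d{2})(\d{2})\]")


def parse_guideline_code(text: str) -> GuidelineCode:
    match = CODE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a guideline code: {text!r}")
    category, guideline_id, sub_id = match.groups()
    return GuidelineCode(category, guideline_id, sub_id)
```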

Maintenance
• As guidelines are added, new unique codes need to be created. Do not reuse retired
codes.

<OVERLAP> <OVERLAP/>
• Use this tag pair for overlapped speech, i.e., when speakers are talking at the same
time.
• You only need to transcribe one speaker’s speech in the overlapped portion; choose
whichever speaker’s speech you can understand best.

|------------------------------------TIME------------------------------------|
Speaker1: word1 word2 word3 word4 word5 word6 word7
Speaker2:                         word1 word2 word3 word4 word5 word6

Scenario 1: If you can understand either Speaker 1 or Speaker 2 in the overlapping
words, the transcription can be either of the following:

1. Word1 word2 word3 word4 <OVERLAP> word5 word6 word7 <OVERLAP/> word4 word5 word6
2. Word1 word2 word3 word4 <OVERLAP> word1 word2 word3 <OVERLAP/> word4 word5 word6

Scenario 2: If you can’t understand any word said by either speaker in the
overlapping words, the transcription is:

Word1 word2 word3 word4 <OVERLAP> <UNKNOWN/> <OVERLAP/> word4 word5 word6

Scenario 3: If you can only understand word5 from Speaker 1, the transcription is:

Word1 word2 word3 word4 <OVERLAP> word5 <UNKNOWN/> <OVERLAP/> word4 word5 word6

Scenario 4: If you can only understand word2 from Speaker 2, the transcription is:

Word1 word2 word3 word4 <OVERLAP> <UNKNOWN/> word2 <UNKNOWN/> <OVERLAP/> word4 word5 word6

For example:
Speaker 1: hello my name is ying luo what is your name
Speaker 2: hello my name is anita.

If Speaker 1 said “your name” and Speaker 2 said “hello” at the same time, then
the transcription is

“hello my name is ying luo what is <OVERLAP> your name <OVERLAP/> my name is anita”

or

“hello my name is ying luo what is <OVERLAP> hello <OVERLAP/> my name is anita”
Appendix A Punctuation list

"en-US" and “en-CA”


o .\PERIOD
o \n\NEW_LINE
o \n\NEW_PARAGRAPH
o ,\COMMA
o ?\QUESTION_MARK
o !\EXCLAMATION_MARK
o !\EXCLAMATION_POINT
o :\COLON
o ;\SEMI_COLON
o "\QUOTE
o "\UNQUOTE
o "\QUOTATION_MARK
o “\OPEN_QUOTE
o ”\CLOSE_QUOTE
o “\OPEN_QUOTATION_MARK
o ”\CLOSE_QUOTATION_MARK

"en-GB", "en-IN", and "en-AU"


o .\PERIOD
o .\FULL_STOP
o \n\NEW_LINE
o \n\NEW_PARAGRAPH
o ,\COMMA
o ?\QUESTION_MARK
o !\EXCLAMATION_MARK
o !\EXCLAMATION_POINT
o :\COLON
o ;\SEMI_COLON
o "\QUOTE
o "\UNQUOTE
o "\QUOTATION_MARK
o “\OPEN_QUOTE
o ”\CLOSE_QUOTE
o “\OPEN_QUOTATION_MARK
o ”\CLOSE_QUOTATION_MARK

"fr-FR" and "fr-CA"
o .\POINT
o \n\NEW_LINE
o \n\NOUVELLE_LIGNE
o \n\SAUT_DE_LIGNE
o ,\VIRGULE
o ?\POINT_D'INTERROGATION
o !\POINT_D'EXCLAMATION
o :\DEUX_POINTS
o ;\POINT_VIRGULE
o «\GUILLEMET_OUVRANT
o »\GUILLEMET_FERMANT
o «\OUVRIR_LES_GUILLEMETS
o »\FERMER_LES_GUILLEMETS

"it-IT"
o .\PUNTO
o \n\NEW_LINE
o \n\A_CAPO
o \n\NUOVA_RIGA
o \n\NUOVA_LINEA
o ,\VIRGOLA
o ?\PUNTO_INTERROGATIVO
o ?\PUNTO_DI_DOMANDA
o !\PUNTO_ESCLAMATIVO
o :\DUE_PUNTI
o ;\PUNTO_E_VIRGOLA
o “\VIRGOLETTE_APERTE
o ”\VIRGOLETTE_CHIUSE
o “\APRI_VIRGOLETTE
o “\APRI_LE_VIRGOLETTE
o ”\CHIUDI_VIRGOLETTE
o ”\CHIUDI_LE_VIRGOLETTE

"de-DE"
o .\PUNKT
o .\SATZENDE
o \n\NEUE_ZEILE
o \n\ZEILENUMBRUCH
o \n\NEW_LINE
o \n\ABSATZ
o ,\KOMMA
o ,\BEISTRICH
o ?\FRAGEZEICHEN
o !\AUSRUFEZEICHEN
o !\RUFZEICHEN
o !\AUSRUFZEICHEN
o :\DOPPELPUNKT
o :\KOLON
o ;\STRICHPUNKT
o ;\SEMIKOLON
o "\ANFÜHRUNGSZEICHEN
o „\ÖFFNENDES_ANFÜHRUNGSZEICHEN
o “\SCHLIESSENDES_ANFÜHRUNGSZEICHEN
o „\ÖFFNENDES_GÄNSEFÜSSCHEN
o “\SCHLIESSENDES_GÄNSEFÜSSCHEN
o „\ANFÜHRUNGSZEICHEN_UNTEN
o “\ANFÜHRUNGSZEICHEN_OBEN
o „\GÄNSEFÜSSCHEN_UNTEN
o “\GÄNSEFÜSSCHEN_OBEN

"es-ES"
o .\PUNTO
o .\PUNTO_FINAL
o \n\SALTO_DE_LÍNEA
o \n\NUEVA_LÍNEA
o \n\NEW_LINE
o ,\COMA
o ?\SIGNO_DE_INTERROGACIÓN
o ?\SIGNOS_DE_INTERROGACIÓN
o !\SIGNO_DE_EXCLAMACIÓN
o !\SIGNOS_DE_EXCLAMACIÓN
o !\CERRAR_SIGNO_DE_EXCLAMACIÓN
o ¡\ABRIR_SIGNO_DE_EXCLAMACIÓN
o :\DOS_PUNTOS
o ;\PUNTO_Y_COMA
o «\COMILLAS_IZQUIERDAS
o »\COMILLAS_DERECHAS
o ¿\ABRIR_SIGNO_DE_INTERROGACIÓN
o ?\CERRAR_SIGNO_DE_INTERROGACIÓN

"es-MX"

o .\PUNTO
o .\PUNTO_FINAL
o \n\SALTO_DE_LÍNEA
o \n\NUEVA_LÍNEA
o \n\NEW_LINE
o ,\COMA
o ?\SIGNO_DE_INTERROGACIÓN
o ?\SIGNOS_DE_INTERROGACIÓN
o !\SIGNO_DE_EXCLAMACIÓN
o !\SIGNOS_DE_EXCLAMACIÓN
o !\CERRAR_SIGNO_DE_EXCLAMACIÓN
o ¡\ABRIR_SIGNO_DE_EXCLAMACIÓN
o :\DOS_PUNTOS
o ;\PUNTO_Y_COMA
o “\COMILLAS_IZQUIERDAS
o ”\COMILLAS_DERECHAS
o ¿\ABRIR_SIGNO_DE_INTERROGACIÓN
o ?\CERRAR_SIGNO_DE_INTERROGACIÓN

"PT-PT"
o .\PONTO_FINAL
o \n\NOVA_LINHA
o \n\MUDAR_DE_LINHA
o \n\NOVO_PARÁGRAFO
o ,\VÍRGULA
o ?\PONTO_DE_INTERROGAÇÃO
o !\PONTO_DE_EXCLAMAÇÃO
o :\DOIS_PONTOS
o ;\PONTO_E_VÍRGULA
o “\ABRIR_ASPAS
o ”\FECHAR_ASPAS

"pt-BR"
o .\PONTO_FINAL
o ,\VÍRGULA
o \n\NOVA_LINHA
o \n\MUDAR_DE_LINHA
o \n\NOVO_PARÁGRAFO
o \\BARRA_INVERTIDA
o \\CONTRABARRA
o /\BARRA
o :\DOIS_PONTOS
o ;\PONTO_E_VÍRGULA
o !\EXCLAMAÇÃO
o !\PONTO_DE_EXCLAMAÇÃO
o ?\INTERROGAÇÃO
o ?\PONTO_DE_INTERROGAÇÃO
o @\ARROBA

"ru-RU"
o .\ТОЧКА
o \n\НОВАЯ_СТРОКА
o «\КАВЫЧКА
o »\КАВЫЧКА
o "\КАВЫЧКИ
o ?\ВОПРОСИТЕЛЬНЫЙ_ЗНАК
o !\ВОСКЛИЦАТЕЛЬНЫЙ_ЗНАК
o :\ДВОЕТОЧИЕ
o ;\ТОЧКА_С_ЗАПЯТОЙ
o -\ТИРЕ
o -\ДЕФИС
o ,\ЗАПЯТАЯ
o \\КОСАЯ_ЧЕРТА
o \\ЗНАК_ДРОБИ
o /\ОБРАТНАЯ_КОСАЯ_ЧЕРТА

"ja-JP"
o 。\句点
o \n\改行
o 、\読点
o 、\コンマ
o ?\疑問符
o ?\クエスチョンマーク
o !\感嘆符
o !\エクスクラメーションマーク
o :\コロン
o ;\セミコロン

"zh-CN"
o 。\句号
o \n\换行
o \n\新建一行
o \n\换一行
o ,\逗号
o ?\问号
o !\感叹号
o :\冒号
o ;\分号
o “\上引号
o ”\下引号
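
Each appendix entry appears to follow the shape "symbol\SPOKEN_FORM", with the symbol written before the last backslash. Assuming that reading is correct, a minimal parser could look like the sketch below; the function name is hypothetical, and treating a literal `\n` as a newline character is an assumption made for illustration.

```python
def parse_punctuation_entry(entry: str) -> tuple[str, str]:
    """Split one list entry into (symbol, spoken form)."""
    text = entry.strip()
    if text[:2].lower() == "o ":      # drop the list bullet, if present
        text = text[2:]
    # The spoken form follows the last backslash, so a symbol that is itself
    # a backslash (e.g. "\\CONTRABARRA") still splits correctly.
    symbol, separator, spoken = text.rpartition("\\")
    if not separator or not spoken:
        raise ValueError(f"not a punctuation entry: {entry!r}")
    if symbol == "\\n":               # literal backslash-n means a line break
        symbol = "\n"
    return symbol, spoken
```

Splitting on the last backslash rather than the first is what keeps the backslash and newline entries from being misread.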
