Update: 8/30/2021
3.3.t If the voice is from TV, Siri, Google translation etc. and you can hear it clearly, transcribe it. If it is
unclear, please discard it.
Table of Contents
2. Workflow
3. Annotation Guidelines
3.1 Discard:
a) Do not think about the completeness of the sentence, while cutting the audio
d) Part of gray area is music, melodies, songs, animal, or natural sounds:
a) Spaces are needed between words. Never wrap text
b) The final intercepted audio must contain at least two words (≥2). Must contain at least 1 meaningful word.
c) Arabic numbers should be transcribed into the word in English. E.g. 1 --> one
e) No punctuation in text except hyphen (-) or apostrophe (') that is appropriately used for word spelling
g) Abbreviation
h) Capitalization
i) Informal words (Trending words that are not found in dictionary)
l) homophone:
t) If the voice is from TV, Siri, Google translation etc., transcribe it if you can hear it clearly; if it is unclear, discard it
Explanation
• gray part: the piece of audio intercepted by default. You can ONLY revise the gray part.
• blue part: your segmented result, which you must also transcribe into text.
• white part: the audio before and after the gray part. You do not need to transcribe or segment it, but you can
listen to it for reference.
• Audio classes:
Keyboard shortcut:
1 - continue to play where you left off.
2 - pause.
• Step 3-1. If you choose the ‘discard’ classification, submit the task directly. There is no need to change the text
below.
• Step 3-2. If you choose the ‘speech’ classification, decide whether the audio needs to be intercepted,
and then transcribe the audio.
3. Annotation Guidelines
3.1 Discard:
• the entire audio is not in English.
• the entire audio is a song with lyrics in the background or non-human sound, which includes melodies,
animal sounds, and natural sounds.
• the audio contains only one English word. (A compound word such as ‘fifty-five’ counts as
one word.)
• the entire audio is modal words.
Note: If you select “discard”, there is no need to transcribe. Just click “submit” and go to the next audio.
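The discard conditions above can be read as a simple checklist. The sketch below is only an illustration: the `ClipJudgement` structure and its field names are hypothetical and stand for what the annotator decided after listening. They are not part of the annotation tool.

```python
from dataclasses import dataclass


@dataclass
class ClipJudgement:
    """What the annotator decided after listening (hypothetical structure)."""
    in_english: bool              # False if the entire audio is not in English
    only_song_or_non_human: bool  # entire audio is a song with lyrics, melody, animal or natural sound
    only_modal_words: bool        # entire audio is modal words such as "ha"
    english_text: str = ""        # the English words heard, if any


def count_words(text: str) -> int:
    # Splitting on whitespace keeps a compound such as "fifty-five" as one word.
    return len(text.split())


def should_discard(clip: ClipJudgement) -> bool:
    if not clip.in_english:
        return True
    if clip.only_song_or_non_human or clip.only_modal_words:
        return True
    # A clip with a single English word (or none) is discarded.
    return count_words(clip.english_text) < 2
```

For instance, a clip in which only "fifty-five" is heard would be discarded under this sketch, while "fifty-five dollars" would be kept for segmentation and transcription.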
3.2 Segment:
a) Do not worry about whether the sentence is complete when cutting the audio.
b) Part of gray area is unclear.
• A segment should always start with clear words. If the front part or the back part is unclear,
you have to intercept it.
But note: If the noise (unclear part) affects the content, intercept it, keep the rest and transcribe. If the
noise does not affect the content, ignore it and transcribe the entire audio.
***BUT, if the background sound does not affect the clarity of the speaker's speech, transcribe
the speaker's speech, and ignore the background sound.
• Keep and Transcribe:
If the background sound is a melody without lyrics and the human speech is clear, keep it
and transcribe the entire audio. If the speaker is singing a song without background
melodies, transcribe it.
e) Pause/noise at the beginning, middle and end of the audio clip.
• If the noise affects the content, intercept it, keep the English speech, and transcribe. If the noise does not
affect the content, ignore it, and transcribe the entire audio.
※For example: “speech1 + pause/noise (does not affect the content) + speech2”. Transcribe
speech1 + speech2.
※For example: “speech1 + pause/noise (affects the content) + speech2”. Either
“speech1” or “speech2” is accepted for a segment. Do not transcribe both.
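The two examples above can be sketched as a small function that lists the acceptable segments for a clip. This is a minimal illustration under assumptions: the `(kind, value)` representation of the clip and the function name `candidate_segments` are hypothetical, and whether a noise "affects the content" remains the annotator's judgement.

```python
def candidate_segments(parts):
    """Return the transcriptions that are acceptable as ONE segment.

    parts is a list of (kind, value) pairs (hypothetical representation):
      ("speech", "some words")  a stretch of clear speech
      ("noise", True)           a pause/noise that affects the content
      ("noise", False)          a pause/noise that does not affect the content
    """
    candidates, current = [], []
    for kind, value in parts:
        if kind == "speech":
            current.append(value)
        elif kind == "noise" and value:
            # Noise that affects the content splits the clip here.
            if current:
                candidates.append(" ".join(current))
            current = []
        # Noise that does not affect the content is simply ignored.
    if current:
        candidates.append(" ".join(current))
    return candidates
```

When the function returns more than one candidate, the annotator picks exactly one of them as the segment. The candidates are never joined into a single transcription.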
f) Modal words
• At the beginning or end of the intercepted audio:
The selected speech should start with (and end with) at most 2 modal words.
※Example: If there is a long stretch of laughter (around 10 "ha") at the beginning of the speech, it is enough
to keep a fraction of it in the audio (around 2, i.e. "ha ha").
Uncountable modal words ---- cut them out, only transcribe the English part.
• In the middle of the intercepted audio:
If you can clearly count the number of modal words, you should transcribe them.
※For repeated modal words, write down the same number of modal words as in the audio. E.g.
3 "ha" in the audio, you need to write "ha ha ha" in the text.
Uncountable modal words ---- do not transcribe them, and intercept the audio there.
※For example: speech1 + uncountable modal words + speech2
Either “speech1” or “speech2” is accepted for a segment. Do not transcribe both.
For modal words that are not in the list, as long as it is not a breathing sound, you can still transcribe them if their
number can be counted.
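The edge rule for countable modal words can be illustrated with a short sketch. It assumes the words have already been counted and written down. `MODAL_WORDS` is a placeholder containing only "ha" (the real list of modal words is not reproduced here), and the function name is hypothetical.

```python
# Placeholder set: the actual list of modal words is defined elsewhere in the guidelines.
MODAL_WORDS = {"ha"}


def trim_edge_modals(words, modal_words=MODAL_WORDS, max_edge=2):
    """Keep at most max_edge modal words at the start and at the end.

    Modal words in the middle are left untouched, because countable
    modal words in the middle are transcribed as heard.
    """
    n = len(words)
    lead = 0
    while lead < n and words[lead] in modal_words:
        lead += 1
    if lead == n:
        # Only modal words: nothing to keep (such a clip is discarded).
        return []
    trail = 0
    while words[n - 1 - trail] in modal_words:
        trail += 1
    start = lead - min(lead, max_edge)       # keep the modal words closest to the speech
    end = n - trail + min(trail, max_edge)
    return words[start:end]
```

With five "ha" before "i like it" and three after, the sketch keeps two on each side, while a run of "ha" in the middle of the speech is left as counted.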
• For commonly used informal words, we advise you to transcribe them as you hear them,
but standard forms are also correct and acceptable. If you are not sure whether
an informal form is correct or acceptable, you can write the standard word.
j) Words with non-standard pronunciation:
• If the pronunciation is not standard but you can tell the correct word, transcribe the correct word.
• If a half-pronounced word is itself an individual word, we can keep and transcribe it.
E.g. the speaker says "I want to go to the super". The word ‘supermarket’ is only half spoken,
but we can hear “super”, and “super” is an individual word.
The right transcription of this case is "I want to go to the super".
l) homophone:
• Listen to the following default cut to confirm what the whole sentence is, and write down the
correct word according to the context.
※Eg1. The current cut is "The hole (or another word that sounds like /həʊl/ but that you
cannot confirm)", but from the following default cut you know the sentence is "The whole town
disagreed with the mayor." So the right transcription of this case is "the
whole".
• If there are multiple homophones whose meaning fits the default cut
sentence, you can write any of them.
※Eg2. The default cut is "where is my deer/dear." Both words match the meaning of the sentence,
so you can write either one.
m) Simplified form/ spoken language
• Transcribe the form that the speaker actually says. Transcribe what you hear, and do not correct
grammar mistakes.
※Eg1. “I'm gonna do some sports”. According to the audio, it must be written as "gonna" and cannot
be written as "going to".
n) Bad language, abusive words, accelerated audio:
• Both abusive words and bad language need to be transcribed.
• If the accelerated audio can be heard clearly, just transcribe it.
o) Poems should be transcribed normally.
p) Double spaces between words are ok.
q) Spelled word
For a word that is spelled out letter by letter, put a space between each letter, and use lower case.
Eg: “a w e s o m e”
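As a small illustration of the spelled-word format, the sketch below turns a word into the expected lower-case, space-separated form. The function name `spell_out` is hypothetical and only demonstrates the target format.

```python
def spell_out(word: str) -> str:
    """Format a word that the speaker spelled out letter by letter."""
    # Lower-case every letter and separate the letters with single spaces.
    return " ".join(ch for ch in word.lower() if ch.isalpha())
```

For example, `spell_out("Awesome")` gives the same form as the example above.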
r) Words with non-standard pronunciation
If the pronunciation is not standard but you can tell the correct word, transcribe the correct word. If you are unable
to tell, treat it as an unclear word and intercept it.
s) If the accelerated audio can be heard clearly, just transcribe it.
t) If the voice is from TV, Siri, Google translation etc. and you can hear it clearly, transcribe it. If it is
unclear, please discard it.