Update 11/4
Update 09/28
3.3 Text transcribes.
t) If the voice (must be speech rather than music) is from TV, Siri and Google translation etc., if
you can hear clearly, transcribe it. If it is unclear, please discard it.
Catalogue
1. Introduction of Platform Manual
explanation
2. Workflow
a) Do not think about the completeness of the sentence while cutting the audio.
e) Pause/noise at the beginning, middle and end of the audio clip.
b) The final intercepted audio must contain at least two words (≥2).
d) Special characters
e) Punctuation
g) Abbreviation
h) Capitalization
l) homophone
o) Poem
explanation
• gray part: the piece of audio intercepted by default; you can ONLY revise the gray part.
• blue part: your segmented result, which you must also transcribe into text.
• white part: the audio before and after the gray part; no need to transcribe or segment it, but you can also listen to it.
• Audio classes:
Keyboard shortcut:
1 - continue to play where you left off.
2 - pause.
s - start cut.
e - end cut.
2. Workflow
• Step 1. Listen to the default audio. (gray part)
• Step 3-1. If you choose the ‘discard’ classification, submit the task directly. No need to change the text below.
• Step 3-2. If you choose the ‘speech’ classification, you need to determine whether to intercept the audio or not.
3. Annotation Guidelines
3.1 Discard:
• the entire audio is not in English.
• the entire audio is songs or non-human speech, which includes melodies, animals' sounds, and natural
sounds.
• audio that contains only one English word should be discarded. (A compound word such as
‘fifty-five’ is considered one word.)
• the entire audio contains only modal words.
Note: If you select “discard”, there is no need to transcribe; just click “submit” and go to the next audio.
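The one-word rule above can be sketched as a small check. This is only an illustration, not part of the platform: the names `count_words` and `is_too_short` are hypothetical, and the sketch assumes words are separated by whitespace, so a hyphenated compound such as ‘fifty-five’ counts as one word.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated words; a hyphenated compound counts as one."""
    # str.split() with no arguments splits on any run of whitespace,
    # so 'fifty-five' stays a single token.
    return len(text.split())


def is_too_short(text: str) -> bool:
    """Return True when the text holds fewer than two words (discard case)."""
    return count_words(text) < 2
```

For instance, `is_too_short("fifty-five")` is true, while `is_too_short("good morning")` is false.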
3.2 Segment:
a) Do not think about the completeness of the sentence while cutting the audio.
• A segment should always start with clear words. If the front part or the back part is unclear, you
have to cut it off.
• If the speech is unclear in the middle, please keep only one side.
For example: “Clear speech1 + unclear + Clear speech2” -- either “Clear speech1” or “Clear speech2”
is accepted for a segment. Do not transcribe both.
But note: If the noise (unclear part) affects the content, cut it out, keep the rest and transcribe. If the noise
does not affect the content, ignore it and transcribe the entire audio.
• Discard:
If the entire audio is a song or a non-human voice, such as music, melodies, or the sounds of animals and
nature – discard this audio.
• Cut Out:
If the background sound is a song with lyrics, cut this part out and keep the clear human speech
part, or discard the entire audio if it is hard to cut.
***BUT, if the background sound does not affect the clarity of the speaker's speech, transcribe the
speaker's speech and ignore the background sound.
• Keep and Transcribe:
If the background sound is a melody without lyrics and the human speech is clear, keep it and
transcribe the entire audio.
If the speaker is singing a song without background melodies – transcribe it.
If the main voice is singing clearly and the BGM has no lyrics – transcribe.
If the main voice is singing clearly and there is BGM with lyrics – discard, or cut out the overlapped part.
***BUT, if the audio contains only songs played from a player, discard it.
• If the noise affects the content, cut it out, keep the English speech, and transcribe. If the noise does not affect
the content, ignore it and transcribe the entire audio.
※For example: “speech1 + pause/noise (does not affect the content) + speech2”.
Transcribe speech1 + speech2.
※For example: “speech1 + pause/noise (affects the content) + speech2”
--- either “speech1” or “speech2” is accepted for a segment. Do not transcribe both.
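The two pause/noise examples can be written down as a tiny decision function. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, and whether the noise affects the content remains the annotator's judgement, passed in here as a boolean.

```python
def transcribe_around_noise(speech1: str, speech2: str,
                            noise_affects_content: bool,
                            keep_first: bool = True) -> str:
    """Return the text to transcribe for 'speech1 + pause/noise + speech2'."""
    if not noise_affects_content:
        # Noise is ignored: transcribe both parts as one segment.
        return f"{speech1} {speech2}"
    # Noise affects the content: only one side is accepted, never both.
    return speech1 if keep_first else speech2
```

With `noise_affects_content=False` the result joins both parts; with `True` it returns only the side the annotator chose to keep.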
f) Modal words
For modal words that are in the words collection, transcribe them if the number of occurrences can be counted.
b) The final intercepted audio must contain at least two words (≥2).
c) Arabic numerals should be transcribed as English words. E.g. 1 -> one.
f) Repeated words and sentences must be transcribed strictly according to the number of times they get repeated.
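As an illustration of rule c), the conversion from digits to English words could be sketched as follows. The helper names are hypothetical, and the sketch deliberately covers only 0 to 99; larger numbers are left untouched for the annotator to write out by hand. Compounds are hyphenated, following the ‘fifty-five’ example in the discard rules.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99 in English."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0 to 99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def spell_out_numbers(text: str) -> str:
    """Replace standalone digit runs (0 to 99) in text with English words."""
    def repl(match: re.Match) -> str:
        value = int(match.group())
        # Numbers outside the supported range are kept as they are.
        return number_to_words(value) if value <= 99 else match.group()
    return re.sub(r"\b\d+\b", repl, text)
```

For example, `spell_out_numbers("i have 1 dog and 55 cats")` gives `"i have one dog and fifty-five cats"`.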
g) Abbreviation
• Abbreviations for special terms, names, etc. (e.g. ANTV, SCTV, I-LAND) need to be CAPITALIZED.
• Abbreviations for phrases, such as gws, otw, btw, imo, lol (get well soon, on the way, by the way, in my
opinion, laugh out loud), should be written in lowercase.
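The two abbreviation rules amount to a lookup in two lists. The sketch below is an assumption-laden illustration: the sets contain only the examples given in this guideline, the names are hypothetical, and a real list would be much longer.

```python
# Example entries taken from this guideline; both sets are illustrative only.
SPECIAL_TERMS = {"antv", "sctv", "i-land"}                 # written in capitals
PHRASE_ABBREVIATIONS = {"gws", "otw", "btw", "imo", "lol"}  # written in lowercase


def normalize_abbreviation(token: str) -> str:
    """Apply the casing rule for a single abbreviation token."""
    key = token.lower()
    if key in SPECIAL_TERMS:
        return key.upper()
    if key in PHRASE_ABBREVIATIONS:
        return key
    # Unknown tokens are returned unchanged.
    return token
```

For example, `normalize_abbreviation("sctv")` returns `"SCTV"` and `normalize_abbreviation("LOL")` returns `"lol"`.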
h) Capitalization
• Do not capitalize the first letter of the text, except for proper nouns.
• Proper nouns should be capitalized accordingly: locations (city name, street name), personal names, brands,
zodiac signs, etc.
e.g. New York, Istanbul, Turkey, KFC, NBA
• FB and IG are abbreviations of some proper nouns, just type them as they are.
• If the voice is clear but not standard (for example: baby voice, stammering voice), you should
transcribe it in standard words.
• Non-existing words are not acceptable. Only standard words and commonly used informal words
are accepted in transcription.
• For commonly used informal words, we advise you to transcribe the informal form as you
hear it, but the standard form is also correct and acceptable. If you are not sure whether an
informal form is correct or acceptable, write the standard word.
j) Words with non-standard pronunciation:
• If the pronunciation is not standard but you can tell the correct word, transcribe the correct word.
• If a half-pronounced word is not an individual word, cut it off.
Eg: “I wanted you to be the Ame”: cut “Ame” and transcribe “I wanted you to be the”.
• If a half-pronounced word is an individual word, keep and transcribe it.
Eg: "I want to go to the super": the word ‘supermarket’ is half spoken, but we can hear “super”,
and “super” is an individual word. The right transcription of this case is
"I want to go to the super".
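The half-pronounced word rule can be illustrated with a short sketch. It assumes the last token of the heard text is the half-pronounced fragment, and it uses a hypothetical `vocabulary` set to stand in for the annotator's knowledge of which fragments are individual words.

```python
def handle_trailing_fragment(heard: str, vocabulary: set) -> str:
    """Drop the final fragment unless it is an individual word on its own."""
    words = heard.split()
    if words and words[-1].lower() not in vocabulary:
        # The fragment is not an individual word: cut it off.
        words = words[:-1]
    return " ".join(words)


# Hypothetical vocabulary, just large enough for the two examples above.
VOCAB = {"i", "wanted", "want", "you", "to", "be", "the", "go", "super"}
```

With this vocabulary, `handle_trailing_fragment("I wanted you to be the Ame", VOCAB)` drops “Ame”, while `"I want to go to the super"` is returned unchanged.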
l) homophone:
• Listen to the following default cut to confirm what the whole sentence is, and write down the correct word
according to the context.
※Eg1. The current cut is "The hole" (or another word that sounds like /həʊl/, which you cannot
confirm), but from the following default cut you know the sentence is "The whole town disagreed with the
mayor." So the right transcription of this case is "the
whole".
• If multiple homophones fit the meaning of the default cut sentence,
you can write any of them.
※Eg2. The default cut is "where is my deer/dear." Both words match the meaning of the sentence, so you
can write either one.
m) Simplified form/ spoken language
• Transcribe the form that the speaker actually says. Transcribe what you hear; do not correct grammar
mistakes.
※Eg1. “I'm gonna do some sports”. According to the audio, it must be written as "gonna" and cannot be
written as "going to".
n) Bad language, abusing words, accelerated audio:
q) Spelled word
For a spelled-out word, put a space between each letter, and use lower case.
Eg: “a w e s o m e”
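The spelled-word format is simple enough to express in one line of Python. The function name is hypothetical; the sketch lowercases every letter and joins the letters with single spaces, ignoring any whitespace already present.

```python
def format_spelled_word(letters: str) -> str:
    """Format a spelled-out word as lowercase letters separated by spaces."""
    return " ".join(ch.lower() for ch in letters if not ch.isspace())
```

For example, `format_spelled_word("AWESOME")` returns `"a w e s o m e"`.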
r) Words with non-standard pronunciation
If the pronunciation is not standard but you can tell the correct word, transcribe the correct word. If you are unable to
tell, treat it as an unclear word and cut it out.
s) If the accelerated audio can be heard clearly, just transcribe it.
t) If the voice (must be speech rather than music) is from TV, Siri and Google translation etc., if you can hear clearly,
transcribe it.
If it is unclear, please discard it.