ASR Transcription Rule-Long Audio
1. Brief introduction
This is a long-audio transcription audit project. Each audio is about 3 minutes long. The work consists of the following steps:
1. Listen to the audio and judge whether it is valid.
2. Correct the topic if it is not correct.
3. Adjust the time stamps and segment every audio into several short clips, each no longer than 10 seconds.
4. Judge whether each clip is valid and add tags accordingly.
5. For valid clips, click the pre-transcribe button, check the pre-transcription text, and correct any errors. The transcription text must strictly follow the audio content.
Workflow chart:
Listen to the audio and judge whether it is valid audio.
   Invalid: If the whole audio is invalid, choose the invalid reason and submit it.
   Valid: Choose an appropriate topic for it. Correct the topic if it is not correct.
Adjust the time stamps and divide every audio into several short clips. Each clip should be shorter than 10s.
Judge whether each clip is valid.
   If a clip is invalid: an invalid duration longer than 1s needs to be segmented alone and labeled with tags accordingly. Invalid clips do not need to be transcribed.
   If a clip is valid: click the pre-transcribe button, check the pre-transcription text, and modify the text if it has any error. The transcription text must strictly follow the audio content.
Appen Confidential (Appen internal document; external distribution strictly prohibited)
2. Invalid audio reasons
If the whole audio has any of the following problems, choose the corresponding invalid reason and submit it.
a. Non-target language: The whole audio is not in the target language.
b. Illegal content: The audio involves religion, anti-political content, pornography, violence, racial discrimination, etc.
c. Unidentifiable & low quality: speaking too fast, fuzzy pronunciation, background noise too loud, etc.
3. Segmentation Rules
1. Use the mouse to move the spectrum to the desired time stamp and press S; the audio is divided at the spectrum's position. Every long audio should be divided into several short clips. Every short clip should be less than 10s long and contain fewer than 120 characters.
2. A single valid clip may contain the speech of only one person. Speech from a different person must be placed in a separate clip.
3. Each short clip should keep the sentence meaning relatively complete. Try NOT to break a sentence when segmenting. If you must cut a sentence because of the time limit, use a <continue> tag for the valid clip.
4. An invalid duration longer than 1s must be segmented and labeled accordingly. If the invalid duration is shorter than 1s, ignore it and transcribe normally.
5. A short silent part (less than 0.2s) must be kept before and after each valid fragment.
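The numeric limits in rules 1 and 4 can be checked mechanically. The following is a minimal sketch, not part of the annotation tool: the `Clip` class, `check_clip`, and `needs_own_clip` are hypothetical names, and the sketch assumes the 10s limit applies to every clip while the 120-character limit applies to the transcription of valid clips.

```python
from dataclasses import dataclass

MAX_CLIP_SECONDS = 10.0     # rule 1: every clip is less than 10s
MAX_CLIP_CHARS = 120        # rule 1: every clip is less than 120 characters
MIN_INVALID_SECONDS = 1.0   # rule 4: invalid spans longer than 1s get their own clip


@dataclass
class Clip:
    """Hypothetical clip record; times are in seconds."""
    start: float
    end: float
    text: str = ""      # transcription, empty for invalid clips
    valid: bool = True

    @property
    def duration(self) -> float:
        return self.end - self.start


def check_clip(clip: Clip) -> list[str]:
    """Return a list of rule violations for one clip (empty if none)."""
    problems = []
    if clip.duration <= 0:
        problems.append("end must be after start")
    if clip.duration >= MAX_CLIP_SECONDS:
        problems.append(f"duration {clip.duration:.2f}s is not under 10s")
    if clip.valid and len(clip.text) >= MAX_CLIP_CHARS:
        problems.append(f"text has {len(clip.text)} characters, limit is under 120")
    return problems


def needs_own_clip(invalid_seconds: float) -> bool:
    """Rule 4: cut out an invalid span only when it lasts longer than 1s."""
    return invalid_seconds > MIN_INVALID_SECONDS
```

For example, `check_clip(Clip(0.0, 12.0, "hello"))` reports one problem (the duration), and `needs_own_clip(0.4)` is false, so a 0.4s noise burst stays inside the surrounding valid clip.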
4. Tag Rules
1. <oov>: Non-target language. A whole sentence or a part of the audio is entirely in another language.
2. <noise>: Non-human-voice noise, including slight noise, applause, laughter, background music only, crying, pure music, other silent noise, etc.
3. <ad>: Advertisement. Obvious advertisements, such as the appearance of trade names or company names.
4. <deaf>: Unidentifiable speech, including speaking too fast, fuzzy pronunciation, background noise too loud, etc.
5. <overlap>: Several voices overlap, whether or not the speech can be recognized.
6. <continue>: If the sentence meaning is incomplete due to segmentation, transcribe the clip and add this tag to the beginning of the next sentence, connecting the following clips until the sentence meaning is complete. Invalid-duration clips in between can be skipped.
7. <BGM>: Speech and songs with lyrics occur at the same time. Transcribe the speech and add this tag.
NOTE: Invalid clips must be assigned tags and DO NOT need to be transcribed. If a valid clip contains an invalid duration shorter than 1s, transcribe it normally; there is NO need to segment it.
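The relationship between tags and transcription can be sketched as a small consistency check. This is an illustration only: the function name `check_tagged_clip` is hypothetical, and the grouping of tags into "marks the clip invalid" versus "used on valid clips" is an assumption inferred from the rules above (<continue> and <BGM> both require a transcription, so they are treated as valid-clip tags).

```python
# Assumed grouping, inferred from the tag rules; not stated as a table in the guideline.
INVALID_TAGS = {"<oov>", "<noise>", "<ad>", "<deaf>", "<overlap>"}
VALID_CLIP_TAGS = {"<continue>", "<BGM>"}


def check_tagged_clip(tags: set[str], text: str) -> list[str]:
    """Return problems with the combination of tags and transcription text."""
    problems = []
    unknown = tags - INVALID_TAGS - VALID_CLIP_TAGS
    if unknown:
        problems.append(f"unknown tags: {sorted(unknown)}")
    if tags & INVALID_TAGS:
        # Invalid clips are tagged but not transcribed.
        if text.strip():
            problems.append("invalid clip must not be transcribed")
    elif not text.strip():
        # No invalid tag means the clip is valid and needs text.
        problems.append("valid clip needs a transcription")
    return problems
```

With this grouping, `check_tagged_clip({"<noise>"}, "hello")` flags the stray transcription, while `check_tagged_clip({"<BGM>"}, "thank you.")` passes.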
5. Transcription Rules
5.1 Strictly follow the principle of RECORDING EXACTLY WHAT YOU HEAR. DO NOT ADD OR OMIT ANY CONTENT. Incorrect spelling is not allowed.
Example 1: repeated words
Audio: where where are we going?
Transcription: where where are we going?
Example 2: stutters
If the speaker stutters, as in “w-what do you say?”, transcribe exactly what you hear: “w-what do you say?”
5.2 Transcribe English words as they are pronounced. Sentences containing 1-3 English words can be transcribed normally if you recognize the words. If a whole sentence or a part of the audio is entirely in English, that part must be segmented and marked as an invalid clip, and it does not need to be transcribed. The transcription must strictly follow the audio: if English is spoken, transcribe it as English words. Add a space between two English words, for example: thank you
5.3 Proper nouns
a) English person names. The name of a well-known person must be transcribed in its officially recognized form, for example Barack Obama, Donald Trump. General names should be written with the most common characters.
b) English brand names. Brand names must follow the officially published form, such as iPhone, Samsung.
c) Homonyms. When two words are pronounced the same, choose the one that makes the sentence grammatically and semantically correct. For example, "He took some lights on a peace of paper" -> "He took some lights on a piece of paper." ("Peace" clearly does not fit the meaning or the grammar.)
d) Abbreviations. Abbreviations must be capitalized. Whether an abbreviation is pronounced as one word or letter by letter, no spaces are allowed between the letters. For example, CAP and VIP.
5.4 Numbers
Numbers must be written out in full as words of the target language, according to how they are pronounced. Arabic numerals are NOT allowed in the transcription.
Examples:
"5256" -> "five thousand two hundred and fifty-six"
"19%" -> "nineteen percent"
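As an illustration of the number rule for English, the sketch below spells out integers in the style of the examples above ("two hundred and fifty-six"). The function `number_to_words` is a hypothetical helper, limited to 0 through 999,999, and it produces only one possible reading. The audio always decides: if the speaker says "fifty-two fifty-six", that is what must be transcribed.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def _below_hundred(n: int) -> str:
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def _below_thousand(n: int) -> str:
    hundreds, rest = divmod(n, 100)
    if hundreds == 0:
        return _below_hundred(rest)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} and {_below_hundred(rest)}"


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999,999 in English words."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only covers 0 to 999,999")
    thousands, rest = divmod(n, 1000)
    if thousands == 0:
        return _below_thousand(rest)
    head = f"{_below_thousand(thousands)} thousand"
    return head if rest == 0 else f"{head} {_below_thousand(rest)}"


print(number_to_words(5256))           # five thousand two hundred and fifty-six
print(number_to_words(19), "percent")  # nineteen percent
```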
5.5 Punctuation
1. Only the standard punctuation marks [,] [.] [?] [!] may be used in a transcription. Emoticons such as ":-)", ">:-(" and ":-|" are not allowed. If the target language normally uses a different punctuation mark, it is acceptable; for example, "।" in Hindi is equivalent to ".".
2. The special symbols #, @, *, &, % must be transcribed as words according to their pronunciation, for example: "%" is transcribed as "percent".
3. A period or question mark MUST be added at the end of a sentence.
4. Punctuation marks cannot be used consecutively. For example, ",..." is not allowed.
5. When a sentence is cut into two clips and a <continue> tag is assigned, end the first clip with the punctuation that grammar calls for at that point.
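Several of the rules in 5.4 and 5.5 lend themselves to an automatic pre-check. The sketch below is a rough illustration for English text only, not a complete validator: `check_transcription` is a hypothetical name, apostrophes and hyphens inside words are assumed to be acceptable (as in "fifty-six" and "w-what"), and the end-of-sentence check does not account for language-specific marks such as "।" or for the first half of a <continue> pair.

```python
import re


def check_transcription(text: str) -> list[str]:
    """Return a list of punctuation and number problems found in the text."""
    problems = []
    # 5.4: Arabic numerals are not allowed.
    if re.search(r"\d", text):
        problems.append("digits must be written as words")
    # 5.5 rules 1 and 2: only , . ? ! are allowed; symbols such as % are spelled out.
    # Assumption: apostrophes and hyphens inside words are fine.
    bad = sorted(set(re.findall(r"[^\w\s,.?!'’-]", text)))
    if bad:
        problems.append("characters not allowed: " + " ".join(bad))
    # 5.5 rule 4: no consecutive punctuation.
    if re.search(r"[,.?!]{2,}", text):
        problems.append("punctuation used consecutively")
    # 5.5 rule 3: a sentence ends with a period or question mark.
    if not text.rstrip().endswith((".", "?")):
        problems.append("sentence must end with a period or question mark")
    return problems
```

For instance, `check_transcription("19% of them.")` reports both the digits and the "%" symbol, while `check_transcription("where where are we going?")` returns an empty list.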
6. Acceptance criteria
The average text accuracy must be higher than 96%.
Text accuracy = Number of correct words / Total number of standard words
Average text accuracy = (Sum of text accuracy / Total number of spot-checked texts) * 100%
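The two formulas can be worked through with a short sketch. The function names and the sample counts are hypothetical, and how "correct words" are counted against the standard text is not defined here, so the sketch takes the counts as given.

```python
ACCEPTANCE_THRESHOLD = 96.0  # percent; the average must be higher than this


def text_accuracy(correct_words: int, total_standard_words: int) -> float:
    """Accuracy of one spot-checked text, as a fraction between 0 and 1."""
    if total_standard_words <= 0:
        raise ValueError("total_standard_words must be positive")
    return correct_words / total_standard_words


def average_text_accuracy(samples: list[tuple[int, int]]) -> float:
    """Average accuracy in percent over (correct, total) pairs."""
    if not samples:
        raise ValueError("at least one spot-checked text is required")
    return sum(text_accuracy(c, t) for c, t in samples) / len(samples) * 100


def is_accepted(samples: list[tuple[int, int]]) -> bool:
    return average_text_accuracy(samples) > ACCEPTANCE_THRESHOLD


# Made-up spot check of three texts: 48/50, 99/100 and 20/20 words correct.
spot_check = [(48, 50), (99, 100), (20, 20)]
print(round(average_text_accuracy(spot_check), 2))  # 98.33
print(is_accepted(spot_check))                      # True
```

Note that each text carries equal weight in the average regardless of its length, which follows from the formula as written.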