English Transcription Guidelines
1. Principle of Transcription
Transcribe exactly what you hear in the audio. The original text is for reference only.
The annotation task is to make the text content consistent with the audio by modifying,
adding and deleting text!
2. Overview of Transcription
The original text and segments already exist, but they are not accurate. Our task
is to correct both the text and the segments. The text content in each segment
must be consistent with what the audio says:
no wrong words, no missing words and no extra words.
3. Points of Transcription
(1) Vad line: The white line in the interface is the vad line, which can be moved, added
and deleted. It is used to separate different speakers, quiet noise, other labels, etc.
Ⅰ. A vad line cannot be placed inside a word; that is, a word cannot be split across two
segments so that its pronunciation is cut off. The pronunciation of the first and
last words in each segment must be complete.
Ⅱ. Each segment cannot exceed 200 characters!
Ⅲ. Put different speakers in different segments!
Ⅳ. After a vad line is adjusted, adjust the text content accordingly.
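The length limit in rule Ⅱ is easy to check mechanically. The following is only a minimal sketch: the `Segment` record and the `check_segment` function are hypothetical names, not part of the annotation tool, and it assumes that every character, including blank spaces and punctuation, counts toward the 200-character limit.

```python
from dataclasses import dataclass

MAX_SEGMENT_CHARS = 200  # limit stated in rule II


@dataclass
class Segment:
    # Hypothetical record for the audio between two vad lines.
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # transcribed text of the segment


def check_segment(seg: Segment) -> list[str]:
    """Return a list of problems found in one segment (empty if none)."""
    problems = []
    if seg.end <= seg.start:
        problems.append("segment has no duration")
    # Assumption: blank spaces and punctuation count as characters.
    if len(seg.text) > MAX_SEGMENT_CHARS:
        problems.append(
            f"text has {len(seg.text)} characters, limit is {MAX_SEGMENT_CHARS}"
        )
    return problems
```

A check like this cannot tell whether a vad line cuts through a word or whether two speakers share a segment; those rules still have to be verified by listening.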
(2) Attribute:
Ⅰ. Do not select any attribute for normal speech segments.
Ⅱ. Select <NOISE> for all abnormal speech segments.
These include: silence, noise, music, singing, coughing, laughter, non-English speech,
words that cannot be understood, words that cannot be heard clearly, overlapping speech
in which neither side can be heard clearly (if the words of one side can be heard
clearly, transcribe only the words of the clear side), voice-over added in
post-production, audio and video playback, automated customer service voices on
phone calls (real phone calls must be transcribed if they are clear),
advertisements, recorded credits, etc.
Ⅲ. Be careful not to omit or mis-select the attribute; that is, the attribute column can
only be blank or <NOISE>. If there are other tags, change them according to the
audio of that segment.
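Rule Ⅲ amounts to a two-value check on the attribute column. A minimal sketch follows, in which `normalize_attribute` is a hypothetical helper and the attribute is assumed to be available as a plain string:

```python
NOISE_TAG = "<NOISE>"


def normalize_attribute(value: str) -> str:
    """Return "" or "<NOISE>"; raise ValueError for any other tag."""
    cleaned = value.strip()
    if cleaned == "":
        return ""  # normal speech: no attribute selected
    if cleaned.upper() == NOISE_TAG:
        return NOISE_TAG  # accept <noise> and <NOISE> alike
    # Any other tag has to be re-checked against the audio by the annotator.
    raise ValueError(
        f"unexpected attribute {value!r}: use blank or {NOISE_TAG}"
    )
```

The function deliberately refuses to guess a replacement for an unknown tag, because the correct value depends on what the segment actually contains.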
A. Text Content
1. There must be no unnecessary blank space at the beginning or end of
each segment, as the following picture shows:
However, if there is punctuation between two sentences within one segment, put a
blank space after the punctuation, as the following picture shows:
2. Set the case and punctuation according to the meaning of the sentence (the
first letter of the sentence, proper nouns, personal names, place names, etc.)
rather than the segment boundaries. The content layers, linked together, should read
as one English article (because of <NOISE> segments, some sentences may not read
smoothly).
3. The content layer may contain English words only. Arabic numerals (such as mobile
phone number, ID number, month, year, etc.), Roman numerals (Ⅰ, Ⅱ, etc.) and special
symbols (@, $, %, &, etc.) should be transcribed as the corresponding English words
(one, at, dollar, percent, etc.) according to the pronunciation.
The year 2017, for example, might be read as ‘two thousand and seventeen’,
‘twenty seventeen’ or ‘two zero one seven’. Just transcribe the
corresponding words according to the actual pronunciation.
4. Abbreviations read letter by letter must be written in uppercase letters separated
by blank spaces, such as O K, M S N, Q Q, etc. If the abbreviation is pronounced as a
word rather than letter by letter, such as NASA pronounced as ['næsə] or SARS
pronounced as [sa:z], write it as NASA, SARS. In this case the letters are not
separated by blank spaces.
5. If a word is pronounced completely, transcribe it as many times as you
hear it. If it is incomplete, the incomplete part should be cut out separately and
marked as <NOISE>.
6. If a drawn-out sound is very long, then, while keeping the pronunciation of the
word complete, mark the drawn-out sound and the silent part as <NOISE>. If
the sound is only slightly drawn out, cut where the sound stops.
7. Modal words that are pronounced completely should be transcribed as many times as
you hear them. The unified spellings are as follows (if the audio contains other modal
words beyond these, refer to the dictionary; do not make up your own
spellings!):
[ʌm; əm]--um, [ɑ:]--ah, [ə'hɑ:; ɑ:'hɑ:]--aha, [əm'hm]--mhm, [waʊ; wou]--wow,
[ʌh; ʌ; ən]--uh, [o; əʊ]--oh
8. Only transcribe English. Common Chinese personal names and place names should be
transcribed normally, such as Beijing, Shanghai, Li Lei, Han Meimei, etc. Other
Chinese speech, or speech in any other language, should be marked as <NOISE>.
9. If the whole audio cannot be transcribed normally and contains only silence, noise,
non-English speech, dubbing, recorded sound, etc., then the whole data item is invalid,
and you can directly click the button "Annotate as invalid data".
If the data does not have original vad lines or text, divide the audio into 3
segments by adding vad lines, mark each segment as <NOISE>, save, annotate as
invalid data and submit.
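Some of the text-content rules above (items 1, 3 and 4) are mechanical enough to be checked automatically before submission. The sketch below is only an illustration: `lint_content` and `spell_letters` are hypothetical helper names, and the checker can only flag digits and symbols, because the correct English words depend on how the speaker actually pronounced them.

```python
import re


def lint_content(text: str) -> list[str]:
    """Return a list of rule violations found in one segment's text."""
    problems = []
    # Item 1: no blank space at the beginning or end of the segment.
    if text != text.strip():
        problems.append("leading or trailing blank space")
    # Item 3: digits, Roman numeral characters and the listed symbols
    # must be written out as English words.
    bad = sorted(set(re.findall(r"[0-9@$%&\u2160-\u217F]", text)))
    if bad:
        problems.append("spell out as words: " + " ".join(bad))
    # Item 1: punctuation between sentences needs a blank space after it.
    if re.search(r"[.,!?;:](?=[A-Za-z])", text):
        problems.append("missing blank space after punctuation")
    return problems


def spell_letters(abbreviation: str) -> str:
    """Format an abbreviation that is read letter by letter (item 4)."""
    return " ".join(ch.upper() for ch in abbreviation if ch.isalpha())
```

For example, `spell_letters("msn")` gives `M S N`. Whether an abbreviation is read letter by letter or as a word (NASA, SARS) still has to be decided by listening.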
B. Punctuation
Principle: Set punctuation according to the text content, not according to segment boundaries or pauses.
C. Fallible Points
1. If the quiet noise is shorter than 0.5s, it can stay in the same segment as the
speech, although cutting it out separately is not wrong either. If the quiet noise is
longer than 0.5s, it must be a separate segment of its own.
2. If you cannot understand the words spoken in the audio even though the
sound quality is clear, refer to the dictionary first, and mark <NOISE> only if you
really cannot understand them.
3. When the main speaker is speaking and another person interrupts: if the two
overlap in time, there is no need to transcribe the interrupting words. If they do
not overlap, the words spoken by the other speaker should be cut out separately and
transcribed normally.
a. If the interjection is short, such as yeah or okay, and the speaker's statement is
logically coherent before and after the interjection, do not put a
period after the first half of the speaker's sentence.
For example, ‘My name is okay Mike’. Here, there cannot be any punctuation after
‘My name is’. Capitalize the word ‘Okay’ and end it with closing punctuation.
RIGHT: My name is [vad line] Okay. [vad line] Mike.
b. If the interjection clearly interrupts the speaker's logic, use concluding punctuation,
such as a period, question mark or exclamation mark, at the end of the first half of the
speaker's sentence.
For example, ‘My name is. [vad line] What are you talking about?’
‘My name is’ is punctuated, because the speaker changes.
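The 0.5s rule for quiet noise in item 1 of this section can be written as a one-line decision. This is a minimal sketch with a hypothetical function name; the guideline does not say what to do at exactly 0.5s, so treating that case as "may stay" is an assumption.

```python
QUIET_THRESHOLD_S = 0.5  # threshold from item 1 of the fallible points


def quiet_needs_own_segment(start: float, end: float) -> bool:
    """True if a stretch of quiet noise must be cut into its own segment."""
    # Assumption: a stretch of exactly 0.5s may stay with the speech.
    return (end - start) > QUIET_THRESHOLD_S
```

A result of `False` only means that keeping the quiet noise with the speech is allowed; cutting it out separately is still acceptable.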
Ⅵ.
Is that you, Mary?