
English Transcription Guidelines

1. Principle of Transcription
Transcribe exactly what you hear in the audio. The original text is for reference only.
Annotation work means making the text content consistent with the audio by modifying,
adding, and deleting text content!

2. Overview of Transcription
The original text and segments already exist, but they are not accurate. What we
need to do is to correct the text and segments. The text content in each paragraph
must be consistent with what the audio says.
There must be no wrong words, missing words, or extra words.

3. Points of Transcription

(1) Vad line: The white line in the interface is the vad line, which can be moved, added,
and deleted. It is used to separate different speakers, quiet noise, other labels, etc.

Ⅰ. A vad line cannot be placed inside a word; that is, a word cannot be split across two
segments so that its pronunciation is cut off. The pronunciation of the first and
last words in each segment must be complete.
Ⅱ. Each segment cannot exceed 200 characters!
Ⅲ. Put different speakers in different segments!
Ⅳ. After the vad line is adjusted, the text content should be adjusted accordingly.
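
The 200-character limit in rule Ⅱ can be checked automatically. The following is a minimal sketch, not part of the annotation tool: the function name and the idea of holding segments as a list of strings are assumptions for illustration, and it assumes that spaces and punctuation count toward the limit, which the guideline does not state.

```python
MAX_SEGMENT_CHARS = 200  # limit stated in rule II


def overlong_segments(segments):
    """Return the indices of segments whose text exceeds the limit.

    Assumption: every character, including spaces and punctuation, is counted.
    """
    return [i for i, text in enumerate(segments) if len(text) > MAX_SEGMENT_CHARS]


segments = ["Hello, how are you?", "a" * 201]
print(overlong_segments(segments))  # [1]
```

A segment flagged this way would need an extra vad line, placed between words as rule Ⅰ requires.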

(2) Attribute:
Ⅰ. Do not select any attributes in normal speaking segments.
Ⅱ. Always choose <NOISE> for abnormal speaking segments.
These include: silence, noise, music, singing, coughing, laughter, non-English speech, words that
cannot be understood, words that cannot be heard clearly, overlapping speech in which
neither side can be heard clearly (if one side can be heard clearly, transcribe only the
words of the clear side), voice-over added in post-production, audio and video playback,
automated customer service voices on phone calls (real phone calls must be annotated
normally if they are clear), advertisements, recorded credits, etc.
Ⅲ. Be careful not to omit or misselect an attribute; the attribute column can only
be blank or <noise>. If there are other tags, modify them according to the
voice fragment.
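
Rule Ⅲ can be expressed as a simple check. This is only a sketch under assumptions: the function name and the list-of-strings input are hypothetical, and upper and lower case are treated as equivalent because this guideline writes the tag both as <NOISE> and as <noise>.

```python
ALLOWED_ATTRIBUTES = {"", "<noise>"}  # blank or the noise tag, per rule III


def invalid_attributes(attributes):
    """Return the indices of attribute values that are neither blank nor <noise>.

    Assumption: case and surrounding blank space are ignored.
    """
    return [
        i
        for i, value in enumerate(attributes)
        if value.strip().lower() not in ALLOWED_ATTRIBUTES
    ]


print(invalid_attributes(["", "<NOISE>", "<music>"]))  # [2]
```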

(3) Content Layer (most important):


Purpose: to make the text content strictly consistent with the audio by modifying the
content layer. Do not rely too heavily on the original text!

A. Text Content
1. There must be no unnecessary blank space at the beginning or end of
each segment, as the following picture shows:

But if there is punctuation between two sentences in one segment, put a
blank space after the punctuation, as the following picture shows:
2. Set case and punctuation according to the meaning of the sentence (the
first letter of a sentence, proper nouns, personal names, place names, etc.),
not according to segment boundaries. The content layers are linked together to form
one English article (because of the <noise> segments, the meaning of a sentence may
not always read smoothly);
3. The content layer may contain only English words. Arabic numerals (such as mobile
phone numbers, ID numbers, months, years, etc.), Roman numerals (Ⅰ, Ⅱ, etc.), and special
symbols (@, $, %, &, etc.) should be transcribed as the corresponding English words
(one, at, dollar, percent, etc.) according to the pronunciation.
The year 2017, for example, might be read as ‘two thousand and seventeen’,
‘twenty seventeen’ or ‘two zero one seven’. Just transcribe the
corresponding words according to the actual pronunciation.
4. Abbreviations read letter by letter must be written in uppercase with the letters
separated by blank spaces, such as O K, M S N, Q Q, etc. If the abbreviation is
pronounced as a word rather than as letters, such as NASA pronounced as ['næsə] or
SARS pronounced as [sa:z], then transcribe it as NASA, SARS. In that case, the letters
must not be separated by blank spaces.
5. If a word is pronounced completely, transcribe it as many times as you
hear it. If a word is incomplete, the incomplete part should be cut into its own
segment and marked as <NOISE>.
6. If a drawn-out sound is very long, keep the pronunciation of the word
intact and mark the drawn-out sound and the silent part as <noise>. If
the sound is only slightly drawn out, cut at the point where the sound stops.
7. Modal words that are pronounced completely should be transcribed as many times as
you hear them. The unified standard is as follows (if the audio contains modal words
other than those listed, please refer to the dictionary annotations; do not make
up your own words!):
[ʌm; əm] -- um
[ɑ:] -- ah
[ə'hɑ:; ɑ:'hɑ:] -- aha
[əm'hm] -- mhm
[waʊ; wou] -- wow
[ʌh; ʌ; ən] -- uh
[o; əʊ] -- oh
8. Only transcribe English. Common Chinese personal or place names should be
transcribed normally, such as Beijing, Shanghai, Li Lei, Han Meimei, etc. Other
Chinese or any other language should be marked as <NOISE>.
9. If the whole audio cannot be annotated normally and contains only silence, noise,
non-English speech, dubbing, recorded sound, etc., then the whole data item is invalid, and
you can directly click the button "Annotate as invalid data".

If the data does not have original vad lines or text, you should divide the audio into 3
segments by adding vad lines, mark each segment as <noise>, save, annotate as
invalid data and submit.
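
Rules 1 and 3 of the text content section lend themselves to an automatic pre-check. The sketch below is an illustration only: the function name and message wording are hypothetical, and it looks only for the numerals and symbols that rule 3 names explicitly (Arabic digits, Unicode Roman numeral characters, and @ $ % &).

```python
import re

# Characters that rule 3 says must be written out as English words:
# Arabic digits, the listed special symbols, and Unicode Roman numerals.
NEEDS_SPELLING_OUT = re.compile(r"[0-9@$%&\u2160-\u2188]")


def content_problems(text):
    """Return a list of human-readable problems found in one segment's text."""
    problems = []
    if text != text.strip():
        problems.append("blank space at the beginning or end")
    found = NEEDS_SPELLING_OUT.findall(text)
    if found:
        problems.append("spell out as words: " + " ".join(found))
    return problems


print(content_problems("I paid five dollars."))  # []
print(content_problems(" I paid 5 $"))
# ['blank space at the beginning or end', 'spell out as words: 5 $']
```

The check only reports that something must be spelled out. Which words to write still depends on what the speaker actually says, as the 2017 example in rule 3 shows.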

B. Punctuation
Principle: Modify punctuation according to the text content instead of segment or pause.

1. Only six punctuation marks are supported: commas, periods, question marks,
exclamation marks, hyphens, and apostrophes (, . ? ! - '), entered in the English
half-width input state. Don't use any punctuation other than these six.
2. Use ending punctuation (a period, an exclamation mark, or a question mark) at
the end of a sentence when switching between speakers. But if one person leaves
a sentence unfinished and the other person says ‘yeah’, don't put any punctuation
at the end of the unfinished part.
3. Use exclamation marks only in very strong sentences. Please use them sparingly.
4. Transcribe incomplete sentences with a full stop at the end.
5. Punctuation must not be affected by noise that cuts into the audio.
For example, ‘She is so <NOISE> pretty’. This sentence is interrupted by noise, so there
cannot be any punctuation after the word ‘so’.
6. When you encounter modal words or catchphrases, such as ‘and, but, so, then,
the, uh, um, yeah, okay’ and so on, add a comma after them where the sentence
calls for it. If the meaning of the sentence is complete and the modal words or
catchphrases are merely inserted without adding meaning, there cannot be any punctuation
before or after them.
For example, ‘What um do you like?’ There cannot be any punctuation before or after
the word ‘um’.
7. Don't add punctuation to separate words where the speaker makes a mistake, such
as: ‘She is my grand grandmother.’ There cannot be any punctuation before or after
the word ‘grand’.
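
Rule 1 of this section (only six half-width punctuation marks) can be checked mechanically. The following is a minimal sketch, with a hypothetical function name. It assumes the intended apostrophe is the plain half-width one ('), so curly quotes and full-width marks are reported as errors.

```python
import unicodedata

ALLOWED_PUNCTUATION = set(",.?!-'")  # the six half-width marks from rule 1


def disallowed_punctuation(text):
    """Return every punctuation character in text that is not one of the six.

    Unicode general categories starting with "P" are punctuation. The angle
    brackets of <NOISE> are classed as symbols, so tags are not reported.
    """
    return [
        ch
        for ch in text
        if unicodedata.category(ch).startswith("P") and ch not in ALLOWED_PUNCTUATION
    ]


print(disallowed_punctuation("She is so <NOISE> pretty."))  # []
print(disallowed_punctuation("Well; she said \u201cno\u201d, didn't she?"))
# [';', '\u201c', '\u201d']
```

The other rules in this section depend on meaning (for example, whether a sentence is finished), so they still need a human check.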

C. Fallible Points
1. If the quiet noise is less than 0.5s, it can be put in the same segment as the
content of the speech, but it is not wrong to cut it separately. If the quiet noise is
more than 0.5s, it must be a new segment of its own.
2. If you can't understand the words spoken by a person in the audio even though the
sound quality is clear, please consult the dictionary first, and mark <NOISE> only if you
really cannot understand them.
3. When the main speaker is speaking and another person interrupts: if the time points
overlap, there is no need to transcribe the interrupting words. If the time points do
not overlap, the words spoken by the other speaker should be cut into a separate
segment and transcribed normally.
a. If the interjection is short, such as yeah or okay, and the speaker makes a coherent
logical statement before and after it, then do not put a period at the end of the
first half of the speaker's sentence.
For example, ‘My name is okay Mike’. Here, there cannot be any punctuation after
‘My name is’. Capitalize the word ‘Okay’ and use closing punctuation.
RIGHT: My name is [vad line] Okay. [vad line] Mike.
b. If the interjection clearly interrupts the speaker's logic, use concluding punctuation,
such as a period, question mark, or exclamation mark, at the end of the first half of
the speaker's sentence.
For example, ‘My name is. [vad line] What are you talking about?’
‘My name is’ is punctuated because the speaker is switching.
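
The 0.5 s threshold in point 1 above can be written as a small helper. This is a sketch under assumptions: the function name and the start/end times in seconds are hypothetical, and a gap of exactly 0.5 s is treated as not requiring its own segment, because the guideline only covers "less than" and "more than" 0.5 s.

```python
SILENCE_THRESHOLD_S = 0.5  # threshold from point 1


def silence_needs_own_segment(start_s, end_s):
    """Return True if the quiet stretch must be cut into a segment of its own.

    Assumption: a gap of exactly 0.5 s is not forced into a new segment.
    """
    return (end_s - start_s) > SILENCE_THRESHOLD_S


print(silence_needs_own_segment(2.0, 2.8))  # True
print(silence_needs_own_segment(1.0, 1.3))  # False
```

A False result only means a separate segment is optional; cutting the quiet part separately is still allowed.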

4. The Use of the Comma (,)


Ⅰ. When two independent clauses are joined
We purchased some cheese, and we purchased some fruit.

Ⅱ. Between items in a series
I like reading books, listening to music, watching TV, and studying English.

Ⅲ. To separate the introduction from the rest of the sentence
As the day came to an end, the fire fighters put out the last spark.

Ⅳ. To set off ‘yes’ or ‘no’
No, thank you.

Ⅴ. To separate a question at the end of the sentence
She is your sister, isn't she?

Ⅵ. When addressing someone directly
Is that you, Mary?

Ⅶ. After an introductory expression
Most certainly, you can borrow my pencil.

Ⅷ. When using a participial phrase
Walking slowly, I could see the beautiful flowers.

Ⅸ. To separate parts of a date
Tuesday, May 2, 2016, was when I graduated.
