
Malaysian Transcription Guideline

Ⅰ. Introduction to the Transcription Interface

1. The vad line is the dividing line. (Dividing lines are added to cut out unnecessary
and invalid content and to mark the valid vocal parts.)

The white line in the interface is the vad line; it can be moved, added and deleted.
Adjust each vad line to a reasonable position and correct the text in the content
layer. Within each pair of boundaries, the text must match the audio exactly.

a. Add a vad line: place the cursor at the appropriate position and click; when the red
dotted line appears, click [+vad].
b. Add two vad lines at the same time: to add vad lines at both A and B, drag the cursor
to select the part between them, then click [+vad].

c. Delete a vad line: drag the cursor to select the part that needs to be deleted, then
click [-vad].

Ⅱ. Labeling Steps and Principles


1. Labeling Steps

2. Labeling Principles
a. Correct the text strictly, adjust and add vad lines, and add the corresponding labels
according to the audio content and this guideline, so that the words in each vad segment
match the sound. The machine transcription shown at the start is only a reference and
must be corrected.
b. A vad segment may contain the speech of one person only. Speech by different people
must be divided into separate segments.

Dividing Principles:
Divide at punctuation where possible, to keep each sentence relatively intact. Divide
according to:
a. the character limit (a maximum of 120 characters)
b. the vad duration (a single vad segment must be within 10 seconds)
c. different speakers
d. parts that cannot be transcribed
Invalid duration means a part that cannot be transcribed. Add vad lines to delimit the
area and choose the corresponding label for it.
When noise, silence, inaudible speech, several people talking at once and so on lasts
more than 1 second, cut it out and label it accordingly.
The same goes for applause, laughter, non-Malay speech, advertisements, songs with
vocals that are clearly replayed from film or television, pure music without vocals,
etc.
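
The character and duration limits in the dividing principles above lend themselves to
an automatic check. The following is a minimal sketch in Python and is not part of the
platform: the Segment class, its field names and the check_segment helper are
hypothetical, and the sketch assumes that spaces and punctuation count toward the
120-character limit, which the guideline does not state.

```python
from dataclasses import dataclass

MAX_CHARS = 120      # character limit per vad segment (from the guideline)
MAX_SECONDS = 10.0   # duration limit per vad segment (from the guideline)


@dataclass
class Segment:
    """Hypothetical representation of one vad segment."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # hypothetical speaker identifier
    text: str      # transcribed text in the content layer


def check_segment(seg):
    """Return a list of limit violations for one segment (empty if none)."""
    problems = []
    # Assumption: every character, including spaces, counts toward the limit.
    if len(seg.text) > MAX_CHARS:
        problems.append("too many characters")
    if seg.end - seg.start > MAX_SECONDS:
        problems.append("too long")
    return problems


ok = Segment(0.0, 4.0, "A", "Selamat pagi.")
bad = Segment(0.0, 12.5, "A", "x" * 121)
print(check_segment(ok))
print(check_segment(bad))
```

A segment that returns a non-empty list would need another vad line, ideally placed at a
punctuation mark.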

Ⅲ. Transcription Principles of Content Layer


1. Transcription Guideline
a. If there are English words in the audio, transcribe them according to how they are
pronounced. If the pronunciation in the audio is English, write the English word. For
example, if "shop" is pronounced "shop", it should be written "shop". If the
pronunciation in the audio is Malay, as when television is pronounced televisyen, it
should be written televisyen.
b. If a word from any language other than English and Malay is used within a sentence,
such as Arabic, Tamil or a Chinese dialect (Cantonese, Hokkien, Teochew, etc.), write it
in Malay spelling according to the pronunciation, for example assalamualaikum, tapau,
cincai, etc.
c. If whole sentences in Mandarin, English or another non-Malay language appear, they
need to be divided off and marked as <oov>. If the other language is only background
sound, that is, too quiet to be heard clearly, transcribe the Malay as normal; the
content in the other language does not need to be marked <oov>.

2. Capitalization and Punctuation

a. Capitalization:

Capitalize the first letter of a sentence, proper noun, person's name, place name, etc.

b. Punctuation:

Only 5 punctuation marks are supported: comma, period, question mark, exclamation mark
and hyphen (,.?!-), entered in Malay or half-width input mode.

Do not use symbols other than these five punctuation marks.

Do not use Chinese punctuation marks.

Apply capitalization and punctuation according to the meaning of the sentence, not
according to how the text is divided into lines. (For example, if a sentence is too long
and is divided into two lines, do not capitalize the first letter of the second line;
write the punctuation at the end of the first line, or use no punctuation at all.)
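
One way to catch unsupported punctuation before submitting is sketched below in Python.
The function name unsupported_punctuation and the extra_allowed parameter are
hypothetical. The check relies on Unicode's punctuation categories, so it flags only
characters that Unicode classifies as punctuation (full-width Chinese marks included);
other kinds of symbol are not examined.

```python
import unicodedata

# The five punctuation marks the guideline supports.
ALLOWED = set(",.?!-")


def unsupported_punctuation(text, extra_allowed=""):
    """Return every punctuation character in text that is not supported.

    extra_allowed is a hypothetical escape hatch for cases such as e-mail
    addresses, where a symbol like "@" is kept in its original form.
    """
    allowed = ALLOWED | set(extra_allowed)
    return [
        ch for ch in text
        # Unicode general categories starting with "P" are punctuation.
        if unicodedata.category(ch).startswith("P") and ch not in allowed
    ]


print(unsupported_punctuation("Apa khabar?"))
print(unsupported_punctuation("Baiklah，terima kasih。"))
print(unsupported_punctuation("apm@gmail.com", extra_allowed="@"))
```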

3. Numbers and Special Symbols

a. Numbers (mobile phone numbers, identity card numbers, years, months, etc.) should be
written out phonetically as Malay words. For example, 911 should be transcribed as
sembilan satu satu.

b. Except for / and %, most special symbols can be kept in their original form, as in
apm@gmail.com. The symbols / and % should be written as the corresponding Malay words.
For example, 3/5 should be written as tiga perlima, and 90% should be written as
sembilan puluh peratus or sembilan puluh persen, according to what is actually said in
the audio.
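
For digit sequences that are read out one digit at a time, such as the 911 example, the
mapping can be sketched in Python as follows. This is an illustration only: the helper
name spell_digits is hypothetical, zero is assumed to be read as kosong, and a number
that the speaker reads as a quantity (a year, for instance) must still follow the actual
audio rather than this digit-by-digit form.

```python
# Malay words for the digits 0-9. Assumption: zero is read as "kosong".
MALAY_DIGITS = [
    "kosong", "satu", "dua", "tiga", "empat",
    "lima", "enam", "tujuh", "lapan", "sembilan",
]


def spell_digits(number):
    """Spell a digit string one digit at a time, skipping separators."""
    return " ".join(
        MALAY_DIGITS[int(ch)] for ch in number if ch in "0123456789"
    )


print(spell_digits("911"))  # sembilan satu satu
```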

4. Repeated Pronunciation

a. Transcribe a whole word as many times as it is repeated in the audio.

b. If the repeated sound is only a single syllable of a word rather than the whole word,
do not transcribe it.

5. Modal Particles

a. Transcribe modal particles, interjections and onomatopoeia that are complete and are
real words, such as eh, je, etc.

b. Very short (passing) occurrences are not transcribed.

6. URL

Transcribe according to the pronunciation: write Malay if the URL is read in Malay, and
English if it is read in English.

7. Colloquial and Abbreviated Words

If the speaker uses a colloquial or abbreviated form, transcribe what is actually said
in the audio. For example, ni does not need to be changed to ini, and nak does not need
to be changed to hendak.

8. Bad Data

If the whole audio contains only music, quiet noise, pure noise, laughter, sound too low
to be heard, foreign-language dubbing, non-Malay speech, etc., the whole data item is
invalid and you should directly click the button "mark as bad data".

9. Voice Overlap

a. Words spoken by different people should be divided into different lines. Two voices
should not be transcribed in one line. If one person speaks after the other, divide the
speech into two lines in the order spoken and transcribe each.

b. When the main speaker and a secondary speaker overlap and cannot be segmented, and
the secondary speaker says little and has little impact, ignore the secondary speaker
and transcribe the main speaker. When the main speaker and a secondary speaker overlap
and cannot be segmented, and the secondary speaker says a lot and has a large impact,
mark the segment <OVERLAP>.

Ⅳ. Transcription Principles of Label Layer

1. Invalid Data
If more than 80% of the whole audio cannot be transcribed normally (for example sil,
noise, non-Malay speech, or human voice that you cannot understand), the whole data item
is invalid, and you can directly click the button "mark as Invalid Data".
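
The 80% rule can be read as a ratio of unmarkable duration to total duration. The sketch
below is in Python and rests on several assumptions: segments are given as hypothetical
(start, end, label) tuples in seconds, speech segments carry the label None, and the
seven label strings are those of the invalid-duration labels listed in this section.

```python
# Labels that count toward the invalid duration (spelling is an assumption).
INVALID_LABELS = {"<NOISE>", "<DEAF>", "sil", "<OVERLAP>", "oov", "<ad>", "<music>"}


def invalid_ratio(segments):
    """Fraction of the total duration covered by invalid-duration labels."""
    total = sum(end - start for start, end, _ in segments)
    invalid = sum(
        end - start for start, end, label in segments if label in INVALID_LABELS
    )
    return invalid / total if total else 0.0


def is_invalid_data(segments, threshold=0.8):
    """True when more than the threshold share of the audio cannot be marked."""
    return invalid_ratio(segments) > threshold


mostly_noise = [(0, 9, "<NOISE>"), (9, 10, None)]
half_silence = [(0, 5, "sil"), (5, 10, None)]
print(is_invalid_data(mostly_noise))
print(is_invalid_data(half_silence))
```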

Labels that count toward the invalid duration: parts that cannot be transcribed must be
segmented and labeled. Once a label is chosen, the text layer switches to the label
automatically and no text is transcribed.

Label / Scene / Operation

<NOISE>
    Scene: Noise lasting more than 1 second; non-speech sounds such as
        laughter/panting/screaming; pure music without lyrics.
    Operation: Separate it into its own segment and mark it as <NOISE>. The boundary
        lines should be cut accurately. For example, when cutting noise, the beginning
        and end of the noise should be the start and end points. Do not cut through the
        meaning during segmentation, and do not cut a word in half.

<DEAF>
    Scene: Inaudible, indistinct or unintelligible sentence.
    Operation: Divide it into a separate section, marked as "Unclear<DEAF>".

sil
    Scene: Silence lasting more than 1 second.
    Operation: Divide it into a separate section, marked as sil.

<OVERLAP>
    Scene: Two or more people speak at the same time.
    Operation:
        1. If two people speak at the same time and both are unintelligible, or both
           can be heard clearly but interfere with each other, mark the segment as
           <OVERLAP>.
        2. If two people speak at the same time but one speaks clearly and loudly while
           the other speaks very quietly and is almost inaudible, no label is needed;
           transcribe the clearly audible speaker in the content layer.

oov
    Scene: Non-Malay (other language).
    Operation:
        1. When languages other than Malay and English appear, mark them directly as
           oov.
        2. If two or more English words appear consecutively, the passage is also
           marked as oov.
        3. After marking as oov, there is no need to write any text in the content
           layer.
        4. If the other language is only background sound (very quiet, almost
           inaudible), transcribe the Malay as normal; the other language does not need
           a label.

<ad>
    Scene: Advertisement.
    Operation: Advertisements that are very clear can be transcribed in Malay; unclear
        advertisements are marked as <ad>.

<music>
    Scene: Song with lyrics; foreign (non-Malay) song.
    Operation:
        1. For songs with vocals, choose <music>.
        2. For songs without vocals, choose <NOISE>.

Labels that count toward the valid duration:

Label / Scene / Operation

<continue>
    Scene: The sentence is incomplete because of segmentation. The label may only be
        used for speech by the same person.
    Operation:
        1. When a complete sentence is truncated by noise/sil, the subsequent text
           segment needs to be marked with <continue> until the sentence is relatively
           complete.
        2. When the number of characters in a segment exceeds the upper limit of 120
           characters and a complete sentence is cut off because of the limit, the
           subsequent text segment needs the <continue> tag.

<BGM>
    Scene: Speech and background music with lyrics appear at the same time.
    Operation: Transcribe the speech in the line and add the <BGM> tag at the same
        time. (Background music with lyrics on its own is still not transcribed; label
        it directly as music.)

a:

1) If there is background noise, music, etc. in a section and it does not prevent a
listener with normal hearing from understanding the speech, transcribe as normal.

2) Try to keep the content of each vad segment relatively complete when you divide.

3) For the first 7 tags above, once the tag is applied the segment is treated as having
no valid Malay speech, so it is invalid and nothing needs to be edited or modified in
the content layer.

b:

1) If noise, deaf, overlap, sil and so on appear one after another in a stretch of
audio, the whole stretch can be divided off as one segment and marked as <DEAF>.

If noise, sil, etc. occur together, the whole can be divided off and marked as <NOISE>.

If noise, deaf, sil and so on occur together, the whole can be divided off and marked as
<DEAF>.

2) <BGM>:
When you can hear the speaker clearly over background music with lyrics, transcribe the
speaker's words and add the tag <BGM> to the section.
Transcription method: transcribe the speaker's words as normal and select the <BGM>
label for the section.
3) <CONTINUE>:
If more than one second of noise splits a full sentence into segments, add a <CONTINUE>
tag to the following section, and keep doing so until the sentence carrying the
<CONTINUE> tag is complete.
Transcription method: if a complete utterance is interrupted by noise for more than 1
second, mark the noise part with the corresponding <NOISE> label and mark the next
speech segment with the <CONTINUE> label. The two segments before and after the noise
together form one complete utterance.
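
The <CONTINUE> rule above can be sketched as a pass over the segments in time order.
This Python sketch is illustrative only and makes its own assumptions: each segment is a
hypothetical dict, segments without a "text" entry stand for noise/sil or other
untranscribed parts, and a sentence is treated as complete when its text ends in a
period, question mark or exclamation mark, which the guideline does not spell out.

```python
SENTENCE_END = (".", "?", "!")  # assumption: these marks close a sentence


def mark_continue(segments):
    """Return one boolean per segment: True where <CONTINUE> would apply."""
    flags = []
    open_speaker = None  # speaker whose sentence is still unfinished
    for seg in segments:
        text = (seg.get("text") or "").strip()
        if not text:
            # Noise, sil and other untranscribed segments never get the tag
            # and do not close an unfinished sentence.
            flags.append(False)
            continue
        # The tag applies only when the same speaker resumes the sentence.
        flags.append(open_speaker is not None and seg["speaker"] == open_speaker)
        open_speaker = None if text.endswith(SENTENCE_END) else seg["speaker"]
    return flags


segments = [
    {"speaker": "A", "text": "Saya nak pergi ke"},
    {"label": "<NOISE>"},
    {"speaker": "A", "text": "kedai esok."},
    {"speaker": "A", "text": "Baiklah."},
]
print(mark_continue(segments))  # [False, False, True, False]
```

Because the check looks only at whether the previous speech segment ended its sentence,
the same pass also covers sentences that were split by the 120-character limit.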

Ⅴ. Fallible Points (the following figures are in Arabic and are for illustration only)

1. Q: Do I need to transcribe words such as baiklah, ya, etc. when someone answers?
A: Yes. As long as the word is legible and complete and does not overlap, transcribe the
speech of each speaker on a separate line.
2. Q: Can I transcribe in one segment when two or more people speak in a passage without
significant gaps and without overlap?
A: No. Working in chronological order, from left to right, transcribe the text as spoken
and separate the speakers. If the speech really cannot be divided, decide whether the
<OVERLAP> tag should be applied.
3. Q: What is the difference between <DEAF> and <NOISE>?
A: <DEAF> is used when you cannot hear or understand the speech; a <DEAF> segment must
contain a human voice. <NOISE> is mostly pure noise or musical sound with no human
voice. Special cases of noise include laughter, crying, screaming, etc.
4. Q: Do I need to label them separately when NOISE is directly followed by SIL in the
audio?
A: No. Merge them into one segment labeled <NOISE>.
5. Q: Do I need to divide continuous noise, continuous silence or continuous inaudible
passages?
A: No. Treat each as one complete vad segment and mark it.
The wrong sample:

Here you should combine segments 22, 23, 24 and 25 into one segment and tag it with
<NOISE>.
6. Labels must be selected in role layer 1; do not type labels manually.
The wrong sample:
The right sample:
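
Questions 4 and 5 above, together with the merging rules in section Ⅳ, amount to a small
merge procedure for neighbouring unmarkable segments. The Python sketch below is one
possible reading and not a platform feature: segments are hypothetical (start, end,
label) tuples with label None for speech, neighbouring segments are assumed to be
contiguous, and <OVERLAP> is left out because the guideline does not say how it ranks
against <NOISE>.

```python
# Higher number wins when neighbouring invalid segments are merged:
# sil + noise becomes <NOISE>; anything merged with deaf becomes <DEAF>.
PRIORITY = {"sil": 0, "<NOISE>": 1, "<DEAF>": 2}


def merge_invalid_runs(segments):
    """Merge runs of adjacent sil/<NOISE>/<DEAF> segments into one segment."""
    merged = []
    for start, end, label in segments:
        if label in PRIORITY and merged and merged[-1][2] in PRIORITY:
            prev_start, _, prev_label = merged[-1]
            winner = max(prev_label, label, key=PRIORITY.get)
            # Extend the previous segment instead of starting a new one.
            merged[-1] = (prev_start, end, winner)
        else:
            merged.append((start, end, label))
    return merged


segments = [
    (0.0, 2.0, "<NOISE>"),
    (2.0, 3.5, "sil"),
    (3.5, 6.0, None),      # speech, left untouched
    (6.0, 7.5, "sil"),
    (7.5, 9.0, "<DEAF>"),
    (9.0, 10.0, "<NOISE>"),
]
print(merge_invalid_runs(segments))
```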

Ⅵ. Good Sample
The following figure provides examples of specific segmentation, labeling and content layer
labeling for your reference:

Ⅶ. Appendix (Names and Explanations of Platform Interface Elements)

Name in Interface / Explanation

audio name
    ***.wav

shortcuts
    You can view the shortcut buttons of the current task. For tasks with the same
    labeling method, the shortcut keys are universal.

history
    You can view multiple records saved in the history of the current marked file.

keep
    The platform has an automatic save function, but it is recommended to "save"
    manually when doing tasks.

transcription guideline
    Click [Marking Specification] in the upper right corner of the marking page to
    view it, and click again to put it away.

submit
    Click [Submit] to open a pop-up window prompting "The task has been completed,
    confirm to submit?" Click OK to submit the task. Click Cancel to hide the pop-up
    window and return to the labeling interface. The platform has an error correction
    function. If an error message appears when submitting, correct the task according
    to the message and do not force the submission.

file sequence
    Click [File Sequence] to view the names of the received marked files, and click
    again to put it away.

transcription progress
    Indicates the progress of the bidder marking task.

all data
    When hovering the mouse, 4 types are displayed: all, good data, bad data and
    unmarked. Click on a type and the corresponding marked data will be displayed on
    the page.

labeled as bad data
    Click [Mark as Bad Data] to mark this piece of data as bad data.

Up/down
    Click [Previous] and [Next] to enter the previous and next marked files.

recycle time
    The recycle time is 120h, and the recycle time of a returned task is 120h.

merge vad segment
    Select the areas on the left and right sides of the boundary line, then right-click
    to merge the segments, or click [-vad], or use the keyboard shortcut C.

add vad segment
    Place the indicator line where you need to add a segment and click to add a vad
    segment; a vad segment will be added after the indicator line. Alternatively, use
    the keyboard shortcut S.

play/pause
    The shortcut key is the space bar of the keyboard.

font size adjustment (A)
    After clicking, you can select the appropriate font size.

play in a loop/continuously
    ① By default, the platform pauses after playing a vad segment.
    ② Loop playback can be set to loop once, twice or three times, and playback stops
    after the current vad loop is completed; the selected number of loops applies to
    the entire audio.
    ③ You can also choose to play continuously (-). After one vad segment is played,
    the next vad segment is played automatically until the full length of the audio
    has been played.

content layer
    Annotate according to the audio and modify the original text in the content layer.
    Line breaks are not supported.

audioscale
    Click the plus and minus signs to zoom the audio field of view; the maximum ratio
    is 50 times.

role layer
    Role attributes, gender attributes, or both, depending on the task; used to add
    the relevant tags.
