Malaysian Transcription Guideline

I. Introduction of the Transcription Interface
1. The vad line is the dividing line. (Dividing lines are added to remove unnecessary and invalid content and to mark the valid vocal parts.)
The white line in the interface is the vad line, which can be moved, added and deleted. You need to adjust the vad lines to reasonable positions and modify the text in the content layer. The audio between two boundaries must match the text exactly.
a. Add a vad line: click at the appropriate position, and when the red dotted line appears, click [+vad].
b. Add two vad lines at the same time: to add vad lines at both points A and B, drag the cursor to select the part between them, then click [+vad].
c. Delete a vad line: drag the cursor to select the part to be deleted, then click [-vad].
Ⅱ. Labeling Steps and Principles

1. Labeling Steps
2. Labeling Principles
a. You need to modify the text strictly, adjust and add vad lines, and add the corresponding labels according to the audio content and this guideline, so that the words and sounds in each vad segment are consistent. The original transcription shown by the machine is only for reference and must be corrected by you.
b. A vad segment can contain only one person's speech. Speech by different people must be divided into separate segments.
Dividing Principles:
Try to divide at punctuation to keep each sentence relatively complete. Divide segments according to:
a. character limit (a maximum of 120 characters per vad segment);
b. vad duration (a single vad segment must be within 10 seconds);
c. different speakers;
d. parts that cannot be marked.
Invalid duration means a part that cannot be marked. Add vad lines to set the range of the area and choose the corresponding label for it.
When there is more than 1 s of noise, silence, inaudible speech, multi-person talk and so on, cut it out and label it accordingly. The same applies to applause, laughter, non-Malay language, advertisements, songs with vocals that are clearly replayed in films or television, pure music without vocals, etc.
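The two numeric limits above (120 characters, 10 seconds) can be checked mechanically before submitting. The following is a minimal sketch only, assuming each vad segment is available as a start time, end time, speaker and text; the `Segment` class and `check_segment` function are hypothetical and not part of the platform. It also assumes every character, including spaces and punctuation, counts toward the 120-character limit.

```python
from dataclasses import dataclass

MAX_CHARS = 120     # maximum characters per vad segment (from the guideline)
MAX_SECONDS = 10.0  # maximum duration of a single vad segment (from the guideline)


@dataclass
class Segment:
    """Hypothetical representation of one vad segment."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # one segment holds exactly one speaker's speech
    text: str      # content-layer text


def check_segment(seg: Segment) -> list:
    """Return a list of limit violations; an empty list means the segment passes."""
    problems = []
    # Assumption: every character, including spaces and punctuation, is counted.
    if len(seg.text) > MAX_CHARS:
        problems.append(f"text has {len(seg.text)} characters (max {MAX_CHARS})")
    duration = seg.end - seg.start
    if duration > MAX_SECONDS:
        problems.append(f"duration {duration:.1f}s exceeds {MAX_SECONDS:.0f}s")
    return problems


print(check_segment(Segment(0.0, 4.2, "A", "Selamat pagi, apa khabar?")))  # []
```

Speaker changes and stretches that cannot be marked still have to be judged by ear; the check covers only the two numeric limits.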
Ⅲ. Transcription Principles of the Content Layer

1. Transcription Guideline
a. If there are English words in the audio, transcribe them according to how they are pronounced in the audio. If the pronunciation is English, write the English word: for example, if "shop" is pronounced "shop", write "shop". If the pronunciation is Malay, write the Malay form: for example, television pronounced as televisyen should be written "televisyen".
b. If a language other than English and Malay is used in a sentence, such as Arabic, Tamil or Chinese dialects (Cantonese, Hokkien, Teochew, etc.), write the words in Malay spelling according to the pronunciation, such as assalamualaikum, tapau, cincai, etc.
c. If whole sentences in Mandarin, English or other non-Malay languages appear, they need to be divided into their own segment and marked as <oov>. If the other language is only background sound, that is, too quiet to be heard clearly, transcribe the Malay as normal and do not mark the other-language content as <oov>.
2. Capitalization and Punctuation
a. Capitalization:
Capitalize the first letter of a sentence, proper nouns, people's names, place names, etc.
b. Punctuation:
Only five punctuation marks are supported: comma, period, question mark, exclamation mark and hyphen (,.?!-), typed in Malay or half-width input mode.
Do not use any symbols other than these five punctuation marks.
Do not use Chinese punctuation marks.
Punctuate and capitalize according to the meaning of the sentence, not according to the segment layout. (For example, if a long sentence is divided into two lines, do not capitalize the first letter of the second line, and either put the punctuation at the end of the first line or use no punctuation at all.)
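Symbols outside the five permitted marks, including full-width (Chinese) punctuation, can be caught by scanning each line before saving. The sketch below is illustrative only: `disallowed_symbols` is a hypothetical helper that flags every character that is not a letter, digit, space or one of , . ? ! -, and also any full-width character. Email addresses kept in their original format (section 3 b) will be flagged too and have to be accepted by the transcriber.

```python
import unicodedata

# The five marks permitted by the guideline: comma, period, question mark,
# exclamation mark and hyphen.
ALLOWED_PUNCTUATION = set(",.?!-")


def disallowed_symbols(text: str) -> list:
    """Return each offending character once, in order of first appearance."""
    found = []
    for ch in text:
        # East Asian width 'F' (fullwidth) or 'W' (wide) covers Chinese
        # punctuation such as the fullwidth comma and the ideographic full stop.
        full_width = unicodedata.east_asian_width(ch) in ("F", "W")
        allowed = ch.isalnum() or ch.isspace() or ch in ALLOWED_PUNCTUATION
        if (full_width or not allowed) and ch not in found:
            found.append(ch)
    return found


print(disallowed_symbols("Betul，saya nak pergi。"))  # ['，', '。']
```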
3. Numbers and Special Symbols
a. Numbers should be transcribed phonetically as Malay words (mobile phone numbers, identity card numbers, years, months, etc.). For example, 911 should be transcribed as sembilan satu satu.
b. Except for / and %, most special symbols can be kept in their original format, such as apm@gmail.com. The symbols / and % should be written as the corresponding Malay words. For example, 3/5 should be written as tiga perlima, and 90% should be written as sembilan puluh peratus or sembilan puluh persen, according to the actual pronunciation in the audio.
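When checking number transcriptions, it can help to generate the expected Malay words. This is a minimal sketch under stated assumptions: phone and identity card numbers are read digit by digit, other values up to 999 are read as whole numbers, and 0 is read as kosong (speakers may also say sifar). The audio always decides, and all function names here are hypothetical.

```python
DIGITS = ["kosong", "satu", "dua", "tiga", "empat",
          "lima", "enam", "tujuh", "lapan", "sembilan"]


def digits_to_malay(number: str) -> str:
    """Read a number digit by digit (phone or IC numbers); other characters are skipped."""
    return " ".join(DIGITS[int(ch)] for ch in number if ch in "0123456789")


def number_to_malay(n: int) -> str:
    """Spell out a whole number from 0 to 999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0-999")
    if n < 10:
        return DIGITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return DIGITS[n - 10] + " belas"      # 12-19: dua belas ... sembilan belas
    if n < 100:
        tens, ones = divmod(n, 10)
        words = DIGITS[tens] + " puluh"       # 20, 30, ...: dua puluh, tiga puluh
        return words + (" " + DIGITS[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = "seratus" if hundreds == 1 else DIGITS[hundreds] + " ratus"
    return words + (" " + number_to_malay(rest) if rest else "")


def fraction_to_malay(numerator: int, denominator: int) -> str:
    """3/5 -> 'tiga perlima': 'per' is attached to a single-word denominator."""
    denom = number_to_malay(denominator)
    if " " in denom:
        raise ValueError("multi-word denominators are not handled in this sketch")
    return f"{number_to_malay(numerator)} per{denom}"


def percent_to_malay(n: int, word: str = "peratus") -> str:
    """90% -> 'sembilan puluh peratus'; pass word='persen' if that is what is spoken."""
    return f"{number_to_malay(n)} {word}"


print(digits_to_malay("911"))     # sembilan satu satu
print(fraction_to_malay(3, 5))    # tiga perlima
print(percent_to_malay(90))       # sembilan puluh peratus
```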
4. Repeated Pronunciation
a. Transcribe the whole word as many times as it is repeated in the actual pronunciation.
b. If the repeated sound is only a single syllable of a word rather than the whole word, you do not need to transcribe it.
5. Modal Particles
a. Transcribe complete and clearly present modal particles, interjections and onomatopoeia, such as eh, je, etc.
b. Very short (fleeting) cases are not transcribed.
6. URLs
Transcribe according to the pronunciation: write Malay if the URL is read in Malay, and English if it is read in English.
7. Spoken and Abbreviated Words
If the speech in the audio is colloquial or abbreviated, transcribe it exactly as spoken. For example, ni does not need to be changed to ini, and nak does not need to be changed to hendak.
8. Bad Data
If the whole audio contains only music, quiet noise, pure noise, laughter, extremely quiet inaudible sound, foreign-language dubbing, non-Malay speech, etc., the whole data is invalid, and you should directly click the "mark as bad data" button.
9. Voice Overlap
a. Words spoken by different people should be divided into different lines; two voices should not be transcribed in one line. If one speaker follows the other, divide the speech into two lines in the order in which it is spoken and transcribe each line.
b. When the main speaker and a secondary speaker (little speech, little impact) overlap and cannot be separated, ignore the secondary speaker and transcribe the main speaker. When the main speaker and a secondary speaker (much speech, much impact) overlap and cannot be separated, mark the segment as <overlap>.
Ⅳ. Transcription Principles of the Label Layer
1. Invalid Data
If more than 80% of the whole audio cannot be marked normally (for example sil, noise, non-Malay speech, or human voices that you cannot understand), the whole data is invalid, and you can directly click the "mark as Invalid Data" button.
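The 80% rule is a simple duration ratio. A sketch under these assumptions: the labeled segments cover the whole audio, speech segments carry the label None, and "cannot be marked normally" means the seven invalid-duration labels in the label table. The function names are hypothetical.

```python
# Labels for invalid duration (assumed to be the seven listed in the label table).
INVALID_LABELS = {"<NOISE>", "<DEAF>", "sil", "<OVERLAP>", "oov", "<ad>", "<music>"}


def invalid_ratio(segments) -> float:
    """segments: iterable of (start_s, end_s, label); label is None for normal speech."""
    segments = list(segments)
    total = sum(end - start for start, end, _ in segments)
    invalid = sum(end - start for start, end, label in segments
                  if label in INVALID_LABELS)
    return invalid / total if total else 0.0


def is_invalid_data(segments, threshold: float = 0.8) -> bool:
    """True when more than 80% of the audio cannot be marked normally."""
    return invalid_ratio(segments) > threshold


audio = [(0.0, 9.0, "<NOISE>"), (9.0, 10.0, None)]
print(is_invalid_data(audio))  # True: 90% of the duration is noise
```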
Labels for invalid duration: the parts that cannot be transcribed need to be segmented and labeled. The text layer then automatically switches to the label, and no text needs to be transcribed:
<NOISE>
Scene: noise longer than 1 second; non-human sounds such as laughter, panting or screaming; pure music without lyrics.
Operation: Separate it into its own segment and mark it as <NOISE>. Cut the boundary lines accurately: when cutting noise, the start and end points of the segment should be the beginning and end of the noise. Do not break the meaning when segmenting, and do not cut a word in half.

<DEAF>
Scene: inaudible, indistinct or unintelligible speech.
Operation: Divide it into a separate segment and mark it as "Unclear<DEAF>".

sil
Scene: silence longer than 1 second.
Operation: Divide it into a separate segment and mark it as sil.

<OVERLAP>
Scene: two or more people speak at the same time.
Operation:
1. If two people speak at the same time and both are unintelligible, or both can be heard clearly but interfere with each other, mark the segment as <OVERLAP>.
2. If two people speak at the same time, and one speaks clearly and loudly while the other speaks very quietly and is almost inaudible, no label is needed; transcribe the clear speaker in the content layer.

oov
Scene: non-Malay (other language) speech.
Operation:
1. When languages other than Malay and English appear, mark them directly as oov.
2. If two or more English words appear consecutively, the segment is also marked as oov.
3. After marking as oov, do not write any text in the content layer.
4. If the other language is only background sound (very quiet, almost inaudible), label the Malay speech normally; the other language does not need a label.

<ad>
Scene: advertisement.
Operation: Clearly audible advertisements can be transcribed in Malay; unclear advertisements can be marked as <ad>.

<music>
Scene: songs with lyrics; foreign (non-Malay) songs.
Operation:
1. Choose <music> for songs with vocals.
2. Choose <NOISE> for songs without vocals.

Labels that count toward the valid duration:

<continue>
Scene: a sentence is left incomplete because of segmentation; used only for the same person's speech.
Operation:
1. When a complete sentence is cut off by noise or sil, mark the following text segment(s) with <continue> until the sentence is relatively complete.
2. When a segment exceeds the upper limit of 120 characters and a complete sentence is cut off because of that limit, add the <continue> tag to the following text segment.

<BGM>
Scene: speech and background music with lyrics occur at the same time.
Operation: Transcribe the vocal part of the line and add the <BGM> tag to it at the same time. (Background music with lyrics on its own is not marked <BGM>; label it directly as music.)
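For the second <continue> case (the 120-character limit), the text side of the split can be sketched as below. This is a hypothetical helper, not a platform feature: it splits only at spaces, whereas the guideline prefers splitting at punctuation, so a transcriber should still move each cut to a natural break and place the vad lines to match each piece of audio.

```python
import textwrap

MAX_CHARS = 120  # character limit per vad segment


def split_with_continue(text: str, limit: int = MAX_CHARS) -> list:
    """Split text at spaces into chunks of at most `limit` characters.

    Returns (chunk, tag) pairs: the first chunk has no tag and every later
    chunk is tagged <continue>, because it continues the same sentence.
    """
    chunks = textwrap.wrap(text, width=limit,
                           break_long_words=False, break_on_hyphens=False)
    return [(chunk, None if i == 0 else "<continue>")
            for i, chunk in enumerate(chunks)]
```

Because `textwrap.wrap` fills each line greedily, the first chunks come out as close to 120 characters as the word boundaries allow.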
a:
1) If a segment contains background noise, music, etc. that does not prevent a normal listener from hearing the speech, transcribe it normally.
2) When dividing, try to keep the content of each vad segment relatively complete.
3) For the first seven labels above (<NOISE>, <DEAF>, sil, <OVERLAP>, oov, <ad>, <music>), once the label is applied the segment contains no valid Malay speech and is invalid, so no editing or modification is needed in the content layer.
b:
1) If noise, deaf, overlap, sil and so on occur in succession in a piece of audio, the whole part can be combined into one segment and marked as <DEAF>.
If noise and sil occur together, the whole part can be combined and marked as <NOISE>.
If noise, deaf, sil and so on occur together, they can be combined and marked as <DEAF>.
2) <BGM>:
When background music with lyrics is playing but the speaker can still be heard clearly, transcribe the speaker's speech and add the <BGM> tag to this segment.
Transcription method: transcribe the speaker's speech as normal and select the <BGM> label for this segment.
3) <CONTINUE>:
If noise longer than one second splits a complete sentence, add the <CONTINUE> tag to the following segment(s) until the sentence is complete.
Transcription method: if a complete utterance is interrupted by noise lasting more than 1 second, mark the noise part with the corresponding <NOISE> label and mark the following speech segment with the <CONTINUE> label. The segments before and after the noise together form one complete utterance.
Ⅴ. Common Mistakes (the example figures below are in Arabic and are for illustration only)
1. Q: Do I need to transcribe words such as baiklah, ya, etc. when someone answers?
A: Yes. As long as the speech is clear, complete and not overlapping, transcribe each speaker's speech on a separate line.
2. Q: Can I transcribe everything in one segment when two or more people speak in a passage without noticeable pauses or overlap?
A: No. Without overlap, transcribe the text as spoken in chronological order, from left to right, and separate the speakers. If it really cannot be divided, decide whether the <OVERLAP> tag applies.
3. Q: What’s the difference between <DEAF> and <NOISE>?
A: <DEAF> is mostly used when you cannot hear or understand the speech, and a sound labeled <DEAF> must be a human voice. <NOISE> is mostly pure noise or musical sound with no human voice. Special cases of noise include laughter, crying, screaming, etc.
4. Q: Do I need to label noise and sil separately when they are adjacent in the audio?
A: No. Merge them into one segment labeled <NOISE>.
5. Q: Do I need to divide continuous noise, continuous silence or continuous inaudible passages?
A: No. Divide each into one complete vad segment and label it.
The wrong sample:
Here, segments 22, 23, 24 and 25 should be combined into one segment and tagged <NOISE>.
6. Labels should be selected in role layer 1; please do not type labels manually.
The wrong sample:
The right sample:
Ⅵ. Good Samples
The following figure shows examples of segmentation, labeling and content-layer transcription for your reference:
Ⅶ. Appendix (Platform Interface Names and Explanations)

Name in Interface: Explanation
audio name: ***.wav
shortcuts: View the shortcut buttons of the current task. The shortcut keys are the same for all tasks that use the same labeling method.
history: View the records saved in the history of the current marked file.
keep: The platform saves automatically, but it is recommended to click "save" manually while working.
transcription guideline: Click [Marking Specification] in the upper right corner of the marking page to view it, and click again to hide it.
submit: Click [Submit] to open a pop-up window asking "The task has been completed, confirm to submit?" Click OK to submit the task, or click Cancel to close the pop-up window and return to the labeling interface. The platform checks for errors on submission; if an error message appears, correct the problem it describes and do not force the submission.
file sequence: Click [File Sequence] to view the names of the received files, and click again to hide it.
transcription progress: Shows the progress of your labeling task. Hovering the mouse shows four categories: all, good data, bad data and unmarked. Click a category to display the corresponding data on the page.
labeled as bad data: Click [Mark as Bad Data] to mark this piece of data as bad data.
Up/down: Click [Previous] and [Next] to open the previous and next files.
recycle time: The recycle time is 120 h, and the recycle time for returned tasks is also 120 h.
merge vad segment: Select the areas on the left and right sides of the boundary line and right-click to merge the segments, or click [-vad], or use the keyboard shortcut C.
add vad segment: Place the indicator line where you need a new segment and click add vad segment (or use the keyboard shortcut S); a new vad segment starts after the indicator line.
play/pause: The shortcut key is the space bar.
font size adjustment (A): Click to select a suitable font size.
play in a loop/continuously:
① By default, the platform pauses after playing one vad segment.
② Loop playback can repeat once, twice or three times; playback stops after the current vad loop is completed, and the selected loop count applies to the whole audio.
③ You can also choose continuous playback (-): after one vad segment is played, the next one plays automatically until the end of the audio.
content layer: Edit the original text in the content layer according to the audio. Line breaks are not supported.
audioscale: Click the plus and minus signs to zoom the audio view; the maximum zoom is 50 times.
role layer: Used to add the relevant tags; depending on the task, it contains role attributes, gender attributes, or both.
