Game Foreign-Language Video Annotation Guidelines
4. Content transcription
The tagger must transcribe the content exactly as heard in the audio.
The transcription must match the speech word for word: no added words,
no missing words, and no wrong words. The general guidelines are as
follows:
1) Uppercase and lowercase letters: If a word is normally capitalized,
write it in its usual form, such as China or Microsoft.
2) Numbers: Numbers in the speech must not be written as Arabic
numerals; they must be spelled out in words in the language being
spoken.
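The numbers rule lends itself to an automatic pre-check. The following is a minimal sketch, assuming transcriptions are available as plain strings; the helper name `contains_digits` is hypothetical and not part of these guidelines.

```python
import re

# Matches any digit character. In Python 3, \d also matches non-ASCII
# Unicode digits, which is useful for foreign-language text.
_DIGIT = re.compile(r"\d")


def contains_digits(transcription: str) -> bool:
    """Return True if the transcription still contains a digit."""
    return _DIGIT.search(transcription) is not None


print(contains_digits("I have 3 cats"))      # True
print(contains_digits("I have three cats"))  # False
```

A transcription flagged by this check would need the number spelled out in words before submission.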
Original Transcription
NFC NFC
6. Modal particles:
Modal particles should be transcribed accurately according to
pronunciation and semantics. Pure laughter does not need to be marked.
If the modal particle contributes to the meaning of the context, it must
be marked. For example: "Would you like to have dinner later?" "um."
Here the "um" is a response to the question, so it carries meaning. If
the modal particle does not contribute to the meaning of the context,
such as mindless humming, it does not need to be marked, although
marking it is not an error.
7. Other
Swear words must be transcribed as spoken; do not replace them with
letters;
Internet buzzwords and common Internet terms should be transcribed
according to common usage;
If words are repeated in the speech, transcribe every repetition.
If the tagger hears a word clearly and can determine its pronunciation,
but is not sure of the meaning, as with a common name, a homophone
may be used instead. The text must match the pronunciation. If the
meaning of the context is clear, the tagger should select the word
that matches both the pronunciation and the meaning of the
sentence;
If a word is incomplete, the tagger should add a “-” after it, followed
by a space before the next word, for example: I want to go to s-
school. Note that a sentence must end with a complete word: if the
unfinished word falls at the end of the sentence, it must not be
cut into that sentence.
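The incomplete-word rule can be checked mechanically. The following is a minimal sketch, assuming each sentence is a plain string that may end with punctuation or bracketed special labels; the function names `fragments` and `ends_with_fragment` are hypothetical.

```python
import re
from typing import List

# One or more bracketed labels (for example [N]) at the very end of the text.
_TRAILING_LABELS = re.compile(r"(\s*\[[^\[\]]*\])+\s*$")


def fragments(sentence: str) -> List[str]:
    """List the tokens marked as unfinished words (tokens ending in '-')."""
    return [token for token in sentence.split() if token.endswith("-")]


def ends_with_fragment(sentence: str) -> bool:
    """Return True if the last word of the sentence is an unfinished word."""
    text = _TRAILING_LABELS.sub("", sentence).rstrip()
    text = text.rstrip(".,!?;:")  # ignore sentence-final punctuation
    return text.endswith("-")


print(fragments("I want to go to s- school."))           # ['s-']
print(ends_with_fragment("I want to go to s- school."))  # False
print(ends_with_fragment("I want to go to s-"))          # True
```

The sketch cannot tell a fragment written without the required space (such as "s-school") from an ordinary hyphenated word, so that case still needs a human check.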
8. Special label:
If any of the following situations occurs, the tagger needs to add
special labels.
(Labels must be used correctly: avoid missing halves of label pairs,
inconsistent capitalization, and unpaired brackets.)
| Data validation | Noise | Special labels | Description | Role labeling | Transcription |
|---|---|---|---|---|---|
| Valid data | No noise | None | Transcribe the content heard according to the standard rule. | O1 or O2… | Today I went to eat. |
| | | [N] | If a sentence contains noise, mark [N] at the end of the sentence. It is not necessary to distinguish the type of noise. | | Today I went to eat.[N] |
| | | [HM] | Rapping and singing should be marked with [HM] at the end of the sentence. | | 一人我饮酒醉[HM] |
| | | [OVERLAP/][/OVERLAP] | If the speech overlaps but one speaker is particularly clear, the tagger transcribes only that speaker, and the role label refers to that speaker. The part of the text affected by the overlap is enclosed in this label pair. | | Today I went to [OVERLAP/] eat [/OVERLAP] |
| | | [OFFENSIVE/][/OFFENSIVE] | Text affected by sensitive content, including uncomfortable content related to politics, opposition, religion and race, pornography, etc., is marked with this label pair. | | [OFFENSIVE/] You're blind. [/OFFENSIVE] I just made a big move |
| Invalid data | Recorder's invalid speech segment | [IVS] | The tagger should use this label to mark noise segments longer than 0.5 seconds. For example: overlapping voices of very similar volume; frame loss; speech truncation; speech echoes; abnormal speech tone, such as singing or speaking in a pinched voice; non-target language; words in the speech segment that are inaudible or cannot be transcribed because of noise. | N | [IVS] |
| | Non-recorder's invalid speech segment | [OIVS] | The tagger should use this label to mark noise segments longer than 0.5 seconds. For example: television voice; program broadcast; narration; advertisement; music with a human voice; etc. | N | [OIVS] |
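
The requirement that labels be paired and consistently capitalized can be checked automatically. The following is a minimal sketch, assuming the only valid labels are the ones listed in the table above and that label pairs may not cross each other (that second point is an assumption, not stated in the guidelines). The function name `check_labels` is hypothetical.

```python
import re
from typing import List

STANDALONE = {"N", "HM", "IVS", "OIVS"}  # labels used on their own
PAIRED = {"OVERLAP", "OFFENSIVE"}        # used as [NAME/] ... [/NAME]

_BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def check_labels(text: str) -> List[str]:
    """Return a list of label problems found in one transcription."""
    problems = []
    # Once well-formed [...] groups are removed, no bracket should remain.
    if re.search(r"[\[\]]", _BRACKETED.sub("", text)):
        problems.append("unpaired bracket")
    open_stack = []
    for match in _BRACKETED.finditer(text):
        body = match.group(1)
        if body.endswith("/") and body[:-1] in PAIRED:
            open_stack.append(body[:-1])
        elif body.startswith("/") and body[1:] in PAIRED:
            if open_stack and open_stack[-1] == body[1:]:
                open_stack.pop()
            else:
                problems.append(f"closing label without opening: [{body}]")
        elif body not in STANDALONE:
            problems.append(f"unknown or miscapitalized label: [{body}]")
    problems.extend(
        f"opening label without closing: [{name}/]" for name in open_stack
    )
    return problems


print(check_labels("Today I went to [OVERLAP/] eat [/OVERLAP]"))  # []
print(check_labels("Today I went to eat.[n]"))
```

An empty list means no label problem was detected; it does not confirm that the labels were applied to the right part of the text.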