CANNOT transcribe:
If the main speaker's voice is drowned out by background sound or cannot be heard
clearly, the segment is invalid.
If two speakers talk at the same time and both can be heard clearly, as in an interruption or
an argument, the segment is invalid.
6. Reserve 0.2-0.5 seconds of blank space before and after each segment (blank space means
any sound other than obvious human voice, including silence).
7. Voice changers, TV broadcasts, navigation prompts, and similar sounds are cut and labeled as normal.
8. Narration videos can be cut and transcribed normally.
9. Talk shows and live character performances (original): cut by character (character A,
character B, ..., the narrator). Each character's content must meet the minimum cut length
(≥2 words). If it does not, refer to "2.4" for handling.
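The padding rule in item 6 can be sketched in a few lines of Python. This is only an illustration: the function name `pad_segment`, the default margin of 0.3 seconds, and the choice to clamp the result to the bounds of the audio file are assumptions, not part of the guideline.

```python
def pad_segment(speech_start, speech_end, audio_duration, pad=0.3):
    """Return (start, end) in seconds with `pad` seconds of margin on each side.

    Assumption: when the margin would run past the start or end of the
    audio file, the boundary is clamped to the file itself.
    """
    # The guideline allows a margin of 0.2 to 0.5 seconds.
    if not 0.2 <= pad <= 0.5:
        raise ValueError("pad must be between 0.2 and 0.5 seconds")
    start = max(0.0, speech_start - pad)
    end = min(audio_duration, speech_end + pad)
    return start, end
```

For example, speech running from 1.0 s to 2.0 s with `pad=0.25` would be cut from 0.75 s to 2.25 s.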
V. Invalid data
1. If the whole audio is an unintelligible dialect, contains no speech, or is a single word
repeated continuously, such as “ya ya ya” or “hehehehehe”, it is invalid.
2. Silent audio, live singing, severely slurred singing or speech (including the words of
infants who have just learned to speak and cannot articulate clearly), heavy accents or
blurred pronunciation (where you cannot be sure the words are written correctly), and lost
frames that make the segment impossible to understand: invalid.
3. Reused content: audio or video suspected of having been used many times is invalid.
Do not label such cases, for example lip-synching or BGM song lyrics; choose
"can not label".
4. Partially reused content: for a co-shot video, do not label the part suspected of having
been used many times. The part containing the photographer's real voice must still be
labeled properly.
5. If the content of a segment is fewer than two words (<2), the segment is invalid.
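As a rough illustration of items 1 and 5, the two text-based checks could be sketched as below. The function names are hypothetical, and splitting on whitespace to count words is an assumption that only suits languages written with spaces; the guideline itself does not define how words are counted.

```python
def meets_min_words(text, min_words=2):
    """Item 5: a segment needs at least `min_words` words to be valid."""
    # Assumption: words are separated by whitespace.
    return len(text.split()) >= min_words


def is_repeated_single_word(text):
    """Item 1: True if a whole-audio transcript is one word repeated."""
    # Strip common punctuation and ignore case before comparing words.
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    return len(words) > 0 and len(set(words)) == 1
```

A check like this only covers the countable rules. Judgments such as intelligibility, overlapping speakers, or reused audio still have to be made by the annotator.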