New Zealand English Transcription Guidelines 1104

New Zealand English Transcription rules
Update 11/4
How to deal with abbreviations
Capitalization of an abbreviation depends on context:

1. If it abbreviates a phrase or a proper noun, write it in all capital letters.
2. If there is not enough context, individual lowercase letters separated by spaces are accepted, such as a b c / n b a (as when the speaker spells it out).
Example:
1. ABC News channel just / IDK (I don't know) / OMG (oh my god)
2. a b c one two three
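The two cases above can be sketched as a small helper. This is a minimal illustration in Python, not part of the platform: the function name format_abbreviation and the has_context flag are assumptions introduced here, and deciding whether enough context exists remains the annotator's judgement.

```python
def format_abbreviation(letters: str, has_context: bool) -> str:
    """Format an abbreviation following the 11/4 update (illustrative only).

    letters: the letters heard, e.g. "abc" (hypothetical input form).
    has_context: True when context shows the letters abbreviate a phrase
                 or a proper noun.
    """
    # Keep only the letters, ignoring any spaces typed between them.
    cleaned = "".join(ch for ch in letters if ch.isalpha())
    if has_context:
        # Case 1: known abbreviation, written in all capital letters.
        return cleaned.upper()
    # Case 2: no context, written as separate lowercase letters.
    return " ".join(cleaned.lower())


print(format_abbreviation("abc", has_context=True))
print(format_abbreviation("nba", has_context=False))
```

With these inputs the first call gives the capitalized form used in "ABC News channel" and the second gives the spelled-out form "n b a".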
Update 09/28
3.3 Text transcription
t) If the voice (it must be speech, not music) comes from a TV, Siri, Google Translate, etc., transcribe it if you can hear it clearly. If it is unclear, discard it.
Contents
1. Introduction of Platform Manual
Explanation
Keyboard shortcut:
2. Workflow
3. Annotation Guidelines
3.1 Discard:
3.2 Segment:
a) Do not think about the completeness of the sentence while cutting the audio.
b) Part of gray area is unclear.
c) Part of gray area is overlapped (2 or more speakers talking simultaneously)
d) Part of gray area is music, melodies, songs, animal, or natural sounds:
e) Pause/noise at the beginning, middle and end of the audio clip.
f) Modal words
3.3 Text transcription
a) Spaces are needed between words. Never wrap text.
b) The final intercepted audio must contain at least two words (≥2).
c) Arabic numbers
d) Special characters
e) Punctuation
f) Repeated words and sentences
g) Abbreviation
h) Capitalization
i) Informal words (Trending words that are not found in dictionary)
j) Words with non-standard pronunciation
k) Half pronounced words
l) Homophone
m) Simplified form/ spoken language
n) Bad language, abusing words, accelerated audio
o) Poem
p) Double spaces between words
q) Spelled word
r) Words with non-standard pronunciation
s) Accelerated audio
t) Voice from TV, Siri, Google Translate etc.
3.4 Special Words

1. Introduction of Platform Manual
Cut a section of clear human speech from the audio and transcribe the audio into text.
Explanation
• gray part: the audio cut by default; ONLY the gray part can be revised.
• blue part: your segmented result, which you must also transcribe into text.
• white part: the audio before and after the gray part. It does not need to be transcribed or segmented, but you may listen to it for reference.
• Audio classes:
￮ Speech - clear human voice
￮ Discard - audio does not meet ASR speech requirements.
• Text box: where text is entered.
• Video: not used in this project.
Keyboard shortcut:
1 - continue to play where you left off.
2 - pause.
3 - play the entire audio.
5 - play default audio (gray area)
a - play cut (current cut-blue area)
s - start cut.
e - end cut.
2. Workflow
• Step 1. Listen to the default audio. (gray part)
• Step 2. Select audio category (speech or discard)
• Step 3-1. If you choose the ‘discard’ classification, submit the task directly. There is no need to change the text below.
• Step 3-2. If you choose the ‘speech’ classification, decide whether the audio needs to be cut, then transcribe the audio.
3. Annotation Guidelines
3.1 Discard:
• The entire audio is not in English.
• The entire audio is unclear or inaudible speech.
• The entire audio is songs or non-human sound, including melodies, animal sounds, and natural sounds.
• The audio contains only one English word. (A compound word such as ‘fifty-five’ counts as one word.)
• The entire audio contains only modal words.
Note: If you select “discard”, there is no need to transcribe; just click “submit” and go to the next audio.
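The discard conditions above can be read as a simple checklist. The following Python sketch is only an illustration: the ClipSummary fields and the should_discard function are hypothetical names introduced here, and the judgements they record (English or not, clear or not) are still made by the annotator by ear.

```python
from dataclasses import dataclass


@dataclass
class ClipSummary:
    """What the annotator heard in the default (gray) audio. Hypothetical fields."""
    is_english: bool              # False if the entire audio is not in English
    has_clear_human_speech: bool  # False if all unclear, or only songs/animals/nature
    word_count: int               # a compound such as "fifty-five" counts as one word
    non_modal_word_count: int     # words that are not modal words


def should_discard(clip: ClipSummary) -> bool:
    """Return True if any discard condition from section 3.1 applies."""
    if not clip.is_english:
        return True
    if not clip.has_clear_human_speech:
        return True
    if clip.word_count <= 1:
        return True
    if clip.non_modal_word_count == 0:
        return True
    return False


# One English word only: discard.
print(should_discard(ClipSummary(True, True, 1, 1)))
# Four words, two of them meaningful: keep as speech.
print(should_discard(ClipSummary(True, True, 4, 2)))
```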
3.2 Segment:
a) Do not consider whether the sentence is complete when cutting the audio.
b) Part of the gray area is unclear.
• A segment should always start with clear words. If the front part or the back part is unclear, cut that part out.
• If the unclear part is in the middle of the speech, keep only one side.
For example: “clear speech 1 + unclear + clear speech 2” -- either “clear speech 1” or “clear speech 2” is accepted as a segment. Do not transcribe both.
But note: if the noise (unclear part) affects the content, cut it out, keep the rest, and transcribe. If the noise does not affect the content, ignore it and transcribe the entire audio.
c) Part of the gray area is overlapped (2 or more speakers talking simultaneously)
• Discard: the entire audio is overlapping and cannot be heard clearly.
• Cut out: the speakers talk about different things simultaneously and the content CANNOT be made out. Cut this part out and transcribe the remaining clear part.
• Keep and transcribe:
If the speakers say the same words simultaneously and the words sound clear, keep this part and transcribe it.
If the speakers do not talk at the same time, treat the audio as a normal speech case and transcribe it.
If there is one main voice in a group conversation, the others are low or fuzzy, and the main speaker's articulation is not affected by them, transcribe the main speaker and treat the others as background sound or noise.
d) Part of gray area is music, melodies, songs, animal, or natural sounds:
• Discard:
If the entire audio is a song or non-human sound, such as music, melodies, animal sounds, or natural sounds, discard the audio.
• Cut out:
If the background sound is a song with lyrics, cut this part out and keep the clear human speech, or discard the entire audio if it is hard to cut.
***BUT if the background sound does not affect the clarity of the speaker's speech, transcribe the speaker's speech and ignore the background sound.
• Keep and transcribe:
If the background sound is a melody without lyrics and the human speech is clear, keep it and transcribe the entire audio.
If the speaker is singing a song without a background melody, transcribe it.
If the main voice is clear singing and the background music has no lyrics, transcribe it.
If the main voice is clear singing and the background music has lyrics, discard the audio or cut out the overlapped part.
***BUT if the audio contains only songs played from a player, discard it.
e) Pause/noise at the beginning, middle and end of the audio clip.
• If the noise affects the content, cut it out, keep the English speech, and transcribe. If the noise does not affect the content, ignore it and transcribe the entire audio.
※For example: “speech1 + pause/noise (does not affect the content) + speech2”.
Transcribe speech1 + speech2.
※For example: “speech1 + pause/noise (affects the content) + speech2”
--- either “speech1” or “speech2” is accepted as a segment. Do not transcribe both.
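The “speech1 + noise + speech2” rule, which also applies to unclear parts in b), can be written out as a short sketch. The function name acceptable_transcripts and its parameters are assumptions made for illustration; whether the noise affects the content is decided by the annotator.

```python
from typing import List


def acceptable_transcripts(speech1: str, speech2: str,
                           noise_affects_content: bool) -> List[str]:
    """List the transcriptions accepted for 'speech1 + noise + speech2'."""
    if noise_affects_content:
        # Cut at the noise: either side alone is accepted, never both together.
        return [speech1, speech2]
    # The noise is ignored: transcribe both parts as one segment.
    return [f"{speech1} {speech2}"]


print(acceptable_transcripts("i went home", "it was late", False))
print(acceptable_transcripts("i went home", "it was late", True))
```

Each candidate still has to satisfy the other rules, for example the minimum of two words per segment.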
f) Modal words
• At the beginning or end of the cut audio:
The selected speech may start with (and end with) at most 2 modal words.
※Example: if the speech begins with a stretch of laughter (around 10 "ha"), it is enough to keep a fraction of it in the audio (around 2, i.e. "ha ha").
Uncountable modal words ---- cut them out and transcribe only the English part.
• In the middle of the cut audio:
If you can clearly count the modal words, transcribe them.
For repeated modal words, write the same number of modal words as in the audio.
※E.g. 3 "ha" in the audio: write "ha ha ha" in the text.
Uncountable modal words ---- do not transcribe them; cut them out.
※For example: speech1 + uncountable modal words + speech2
Either “speech1” or “speech2” is accepted as a segment. Do not transcribe both.
Modal words that appear in the words collection can be transcribed if their number can be counted.
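The edge rule (at most 2 modal words at the start and end, exact count in the middle) can be sketched as follows. This is an illustration only: MODAL_WORDS is a made-up sample set standing in for the project's words collection, and trim_edge_modal_words is a hypothetical helper name. It assumes the modal words have already been counted.

```python
from typing import List

# Hypothetical sample; the real list comes from the project's words collection.
MODAL_WORDS = {"ha", "um", "uh", "oh"}


def trim_edge_modal_words(tokens: List[str], max_edge: int = 2) -> List[str]:
    """Keep at most max_edge modal words at each end; leave the middle untouched."""
    # Find where the leading run of modal words stops.
    start = 0
    while start < len(tokens) and tokens[start] in MODAL_WORDS:
        start += 1
    # Find where the trailing run of modal words begins.
    end = len(tokens)
    while end > start and tokens[end - 1] in MODAL_WORDS:
        end -= 1
    # Keep the modal words closest to the speech so the cut stays contiguous.
    leading = tokens[max(0, start - max_edge):start]
    trailing = tokens[end:end + max_edge]
    return leading + tokens[start:end] + trailing


laugh_then_speech = ["ha"] * 10 + ["that", "was", "funny"]
print(" ".join(trim_edge_modal_words(laugh_then_speech)))
```

Modal words in the middle of the segment are returned unchanged, matching the rule that a countable run such as "ha ha ha" is written out in full.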
3.3 Text transcription
a) Put a space between words. Never wrap text onto a new line.
b) The final cut audio must contain at least two words (≥2), including at least 1 meaningful (non-modal) word.
c) Arabic numerals should be transcribed as English words. E.g. 1 -> one.
d) Special characters should be transcribed as English words.
E.g. @ is not allowed; write it as the English word.
e) No punctuation in the text except a hyphen (-) or apostrophe (') where the spelling of a word requires it.
f) Repeated words and sentences must be transcribed exactly as many times as they are repeated.
g) Abbreviation
• Abbreviations for special terms, names, etc. (e.g. ANTV, SCTV, I-LAND) need to be CAPITALIZED.
• Abbreviations for phrases such as gws, otw, btw, imo, lol, etc. (get well soon, on the way, by the way, in my opinion, laugh out loud, etc.) should be written in lowercase.
h) Capitalization
• Do not capitalize the first letter of the text unless it is a proper noun.
• Proper nouns should be capitalized: locations (city name, street name), person's name, brand, zodiac sign, etc.
e.g. New York, Istanbul, Turkey, KFC, NBA
• FB and IG are abbreviations of proper nouns; type them as they are.
i) Informal words (Trending words that are not found in dictionary)
• If the voice is clear but not standard (for example, baby voice or stammering), transcribe it in standard words.
• Non-existent words are not acceptable. Only standard words and commonly used informal words are accepted in transcription.
• For commonly used informal words, we advise transcribing the informal form as you hear it, but the standard form is also correct and acceptable. If you are not sure whether the informal form is correct or acceptable, write the standard word.
j) Words with non-standard pronunciation:
• If the pronunciation is not standard but you can tell the correct word, transcribe the correct word.
• If you cannot tell, treat it as an unclear word and cut it out.

k) Half pronounced words:
• If a half-pronounced word is not a word on its own, cut it off.
E.g. “I wanted you to be the Ame”: cut “Ame” and transcribe “I wanted you to be the”.
• If a half-pronounced word is a word on its own, keep it and transcribe it.
E.g. "I want to go to the super": the word ‘supermarket’ is only half spoken, but “super” can be heard and is a word on its own, so the correct transcription is "I want to go to the super".
l) Homophone:
• Listen to the following default cut to confirm the whole sentence, then write the correct word according to context.
※Eg1. The current cut is "The hole" (or another word that sounds like /həʊl/ that you cannot confirm), but from the following default cut you know the sentence is "The whole town disagreed with the mayor." So the correct transcription is "the whole".
• If several homophones fit the meaning of the default cut sentence, you can write any of them.
※Eg2. The default cut is "where is my deer/dear". Both words fit the meaning of the sentence, so you can write either one.
m) Simplified form/ spoken language
• Transcribe the form the speaker actually says. Transcribe what you hear; do not correct grammar mistakes.
※Eg1. “I'm gonna do some sports”. Following the audio, it must be written as "gonna", not "going to".
n) Bad language, abusing words, accelerated audio:
• Both abusive words and bad language need to be transcribed.
• If the accelerated audio can be heard clearly, just transcribe it.
o) Poem should be transcribed normally.
p) Double spaces between words are ok.
q) Spelled word
When a word is spelled out letter by letter, put a space between the letters and use lowercase.
E.g. “a w e s o m e”
r) Words with non-standard pronunciation
If the pronunciation is not standard but you can tell the correct word, transcribe the correct word. If you cannot tell, treat it as an unclear word and cut it out.
s) If the accelerated audio can be heard clearly, just transcribe it.
t) If the voice (it must be speech, not music) comes from a TV, Siri, Google Translate, etc., transcribe it if you can hear it clearly. If it is unclear, discard it.
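Several of the text rules in 3.3 are mechanical enough to check automatically before submitting. The sketch below is a hedged illustration, not a platform feature: check_transcript and MODAL_WORDS are hypothetical names, the modal word set is a made-up sample, and rules that need judgement (proper nouns, homophones, informal words) are not covered.

```python
from typing import List

# Hypothetical sample; the real list comes from the project's words collection.
MODAL_WORDS = {"ha", "um", "uh", "oh"}

# Besides letters, only spaces, hyphens and apostrophes are allowed (rule e).
ALLOWED_EXTRA = set(" -'’")


def check_transcript(text: str) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if "\n" in text or "\r" in text:
        problems.append("text must not wrap onto a new line")         # rule a
    if any(ch.isdigit() for ch in text):
        problems.append("digits must be written as English words")    # rule c
    bad = sorted({ch for ch in text
                  if not (ch.isalpha() or ch.isdigit()
                          or ch in ALLOWED_EXTRA or ch in "\r\n")})
    if bad:
        problems.append("special characters or punctuation not allowed: "
                        + " ".join(bad))                              # rules d, e
    words = text.split()  # double spaces between words are fine (rule p)
    if len(words) < 2:
        problems.append("at least two words are required")            # rule b
    if all(w.lower() in MODAL_WORDS for w in words):
        problems.append("at least one non-modal word is required")    # rule b
    return problems


print(check_transcript("i have  two dogs"))
print(check_transcript("I have 2 dogs"))
```

The check uses str.isalpha so that accented letters in names are not flagged as special characters; that choice is an assumption, since the guideline does not address them.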
3.4 Special Words

Please refer to the “term alignment” spreadsheet.
