
Australian English Transcription rules

Update: 8/30/2021

3.3 t) If the voice is from TV, Siri, Google translation etc., transcribe it if you can hear it clearly. If it is
unclear, please discard it.

Table of Contents

1. Introduction of Platform Manual
   Explanation
   Keyboard shortcut
2. Workflow
3. Annotation Guidelines
3.1 Discard
3.2 Segment
a) Do not think about the completeness of the sentence while cutting the audio
b) Part of gray area is unclear
c) Part of gray area is overlapped (2 or more speakers talking simultaneously)
d) Part of gray area is music, melodies, songs, animal, or natural sounds
e) Pause/noise at the beginning, middle and end of the audio clip
f) Modal words
3.3 Text transcription
a) Spaces are needed between words. Never wrap text
b) The final intercepted audio must contain at least two words (≥2) and at least 1 meaningful (non-modal) word
c) Arabic numbers should be transcribed into the word in English. E.g. 1 --> one
d) Special characters should be transcribed into the word in English
e) No punctuation in text except hyphen (-) or apostrophe (') that is appropriately used for word spelling
f) Repeated words and sentences
g) Abbreviation
h) Capitalization
i) Informal words (Trending words that are not found in dictionary)
j) Words with non-standard pronunciation
k) Half pronounced words
l) Homophone
m) Simplified form / spoken language
n) Bad language, abusing words, accelerated audio
o) Poem should be transcribed normally
p) Double spaces between words are ok
q) Spelled word
r) Words with non-standard pronunciation
s) If the accelerated audio can be heard clearly, just transcribe it
t) Voice from TV, Siri, Google translation etc.
3.4 Special Words


1. Introduction of Platform Manual
Cut a section of clear human speech from the audio and transcribe the audio into text.

Explanation
• Gray part: the piece of audio intercepted by default. You can ONLY revise the gray part.

• Blue part: your segmented result, which you must also transcribe into text.

• White part: the audio before and after the gray part. It does not need to be segmented or transcribed, but you
can listen to it for reference.

• Audio classes:

○ Speech - clear human voice

○ Discard - audio does not meet ASR speech requirements.

• Text box: where text is entered.

• Video: not used in this project.

Keyboard shortcut:
1 - continue to play where you left off.

2 - pause.

3 - play the entire audio.

5 - play default audio (gray area).

a - play cut (current cut, blue area).

s - start cut.

e - end cut.


2. Workflow
• Step 1. Listen to the default audio. (gray part)

• Step 2. Select audio category (speech or discard)

• Step 3-1. If you choose the ‘discard’ classification, submit the task directly. There is no need to change the
text below.

• Step 3-2. If you choose the ‘speech’ classification, decide whether the audio needs to be intercepted, and
then transcribe the audio.

3. Annotation Guidelines
3.1 Discard:
• the entire audio is not in English.

• the entire audio is unclear or inaudible speech.

• the entire audio is a song with lyrics in the background or non-human sound, which includes melodies,
animal sounds, and natural sounds.
• the audio contains only one English word. (A compound word such as ‘fifty-five’ counts as one
word.)
• the entire audio is modal words.
Note: If you select “discard”, there is no need to transcribe. Just click “submit” and go to the next audio.
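
The discard criteria above can be read as a simple checklist. The Python sketch below is only an illustration: the `ClipInfo` fields are hypothetical names for judgments the annotator makes by ear, not something the platform provides.

```python
from dataclasses import dataclass


@dataclass
class ClipInfo:
    """Hypothetical summary of what the annotator heard in the gray part."""
    is_english: bool        # False if the entire audio is not in English
    is_intelligible: bool   # False if the entire audio is unclear or inaudible
    is_human_speech: bool   # False for songs, melodies, animal or natural sounds
    english_words: int      # a compound such as "fifty-five" counts as one word
    only_modal_words: bool  # True if the clip holds nothing but modal words


def should_discard(clip: ClipInfo) -> bool:
    """Return True if any discard criterion from section 3.1 applies."""
    return (
        not clip.is_english
        or not clip.is_intelligible
        or not clip.is_human_speech
        or clip.english_words <= 1
        or clip.only_modal_words
    )
```

A clip that passes every check is classified as speech and continues to segmenting and transcription.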

3.2 Segment:
a) Do not worry about whether the sentence is complete when cutting the audio.
b) Part of gray area is unclear.
• A segment should always start with clear words. If the front part or the back part is unclear,
you have to intercept it.

• If the speech is unclear in the middle, cut on either side of the unclear part.


For example: “Clear speech1 + unclear + Clear speech2” -- either “Clear speech1” or “Clear
speech2” is accepted as a segment. Do not transcribe both.

But note: If the noise (unclear part) affects the content, intercept it, keep the rest and transcribe. If the
noise does not affect the content, ignore it and transcribe the entire audio.

c) Part of gray area is overlapped (2 or more speakers talking simultaneously)


• Discard: the entire audio is overlapping and cannot be heard clearly.
• Cut Out: the speakers talk about different things simultaneously and we CAN’T tell the content. Cut this part
out and keep the remaining clear part to transcribe.
• Keep and transcribe:
The speakers say the same words simultaneously and the words sound clear. Keep this part
and transcribe it.
The speakers do not talk at the same time. Treat the audio as a normal speech case
and transcribe it.
There is one main voice in a group conversation, the others are low or fuzzy, and the
articulation of the main speaker is not affected by the others. Transcribe the main
speaker and regard the others as background sound or noise.

d) Part of gray area is music, melodies, songs, animal, or natural sounds:


• Discard:
If the entire audio is a song or non-human sound, such as music, melodies, or animal and
natural sounds, discard this audio.
If the speaker is singing a song that follows a melody, discard it.
• Cut Out:
If the background sound is a song with lyrics, cut this part out and keep the clear human
speech, or discard the entire audio if it is hard to cut.

***BUT, if the background sound does not affect the clarity of the speaker's speech, transcribe
the speaker's speech and ignore the background sound.
• Keep and Transcribe:

If the background sound is a melody without lyrics and the human speech is clear, keep it
and transcribe the entire audio.
If the speaker is singing a song without a background melody, transcribe it.
e) Pause/noise at the beginning, middle and end of the audio clip.
• If the noise affects the content, intercept it, keep the English speech, and transcribe. If the noise does
not affect the content, ignore it and transcribe the entire audio.
※For example: “speech1 + pause/noise (does not affect the content) + speech2”. Transcribe
speech1 + speech2.
※For example: “speech1 + pause/noise (affects the content) + speech2”. Either
“speech1” or “speech2” is accepted as a segment. Do not transcribe both.

f) Modal words
• At the beginning or end of the intercepted audio:
The selected speech may start with (and end with) up to 2 modal words.
※Example: If the speech begins with a stretch of laughter (around 10 "ha"), it is enough to keep
a fraction of it in the audio (around 2, i.e. "ha ha").

Uncountable modal words ---- cut them out and transcribe only the English part.
• In the middle of the intercepted audio:
If you can clearly count the modal words, transcribe them.
※For repeated modal words, write the same number of modal words as in the audio. E.g. for
3 "ha" in the audio, write "ha ha ha" in the text.
Uncountable modal words ---- intercept them and do not transcribe them.
※For example: speech1 + uncountable modal words + speech2
Either “speech1” or “speech2” is accepted as a segment. Do not transcribe both.

Modal words that are not in the list can still be transcribed, as long as they are not breathing sounds and their
number can be counted.
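
The edge rule for modal words (at most 2 at the start and at the end, with countable ones in the middle kept as heard) can be sketched in Python as follows. This is a rough illustration only: `MODAL_WORDS` is an assumed sample set, since the project's actual modal word list is not reproduced here, and `trim_edge_modals` is a hypothetical helper name.

```python
# Assumed sample set; the project's real modal word list is not given in this document.
MODAL_WORDS = {"ha", "um", "uh", "hmm", "oh"}
MAX_EDGE_MODALS = 2


def trim_edge_modals(words, modal_words=MODAL_WORDS, limit=MAX_EDGE_MODALS):
    """Keep at most `limit` modal words at each edge; leave the middle untouched."""
    # Find where the leading run of modal words stops.
    start = 0
    while start < len(words) and words[start] in modal_words:
        start += 1
    # Find where the trailing run of modal words begins.
    end = len(words)
    while end > start and words[end - 1] in modal_words:
        end -= 1
    lead = words[max(0, start - limit):start]  # modal words closest to the speech
    tail = words[end:end + limit]
    return lead + words[start:end] + tail
```

For example, ten "ha" followed by "that was funny" is reduced to "ha ha that was funny", while three "ha" in the middle of a sentence are left as they are. A clip made up only of modal words still falls under the discard rule in 3.1.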

3.3 Text transcription


a) Spaces are needed between words. Never wrap text.
b) The final intercepted audio must contain at least two words (≥2), including at least 1 meaningful
(non-modal) word.
c) Arabic numbers should be transcribed as English words. E.g. 1 --> one.
d) Special characters should be transcribed as English words.
E.g. @ is not allowed and should be written as the corresponding English word.
e) No punctuation in text except a hyphen (-) or apostrophe (') that is appropriately used for word spelling.
f) Repeated words and sentences must be transcribed exactly as many times as they are repeated.
g) Abbreviation
• Abbreviations for special terms, names, etc. (e.g. ANTV, SCTV, I-LAND) need to be CAPITALIZED.
• Abbreviations for phrases, such as gws, otw, btw, imo, lol (get well soon, on the way, by the way, in my
opinion, laugh out loud), should be written in lowercase.
h) Capitalization
• Do not capitalize the first letter of the text unless it is a proper noun.
• Proper nouns should be capitalized accordingly: locations (city name, street name), persons' names,
brands, zodiac signs, etc.
E.g. New York, Istanbul, Turkey, KFC, NBA
• FB and IG are abbreviations of proper nouns; just type them as they are.

i) Informal words (Trending words that are not found in dictionary)


• If the voice is clear but not standard (for example, a baby voice or a stammer), transcribe it
in standard words.
• Non-existing words are not acceptable. Only standard words and commonly used informal
words can be accepted in transcription.

• For commonly used informal words, we advise you to transcribe the informal form as you hear it,
but the standard form is also correct and acceptable. If you are not sure whether
the informal form is correct or acceptable, write the standard word.
j) Words with non-standard pronunciation:
• If the pronunciation is not standard but you can tell the correct word, transcribe the correct word.

• If you cannot tell, treat it as an unclear word and intercept it.


k) Half pronounced words:
• If a half-pronounced word is not a word on its own, cut it off.
E.g. “I wanted you to be the Ame”: cut “Ame” and transcribe “I wanted you to be the”.

• If a half-pronounced word is a word on its own, keep it and transcribe it.
E.g. "I want to go to the super": the word ‘supermarket’ is only half spoken, but we can hear
“super”, and “super” is a word on its own.
The right transcription of this case is "I want to go to the super".
l) Homophone:
• Listen to the default cut that follows to confirm what the whole sentence is, and write down the
correct word according to the context.
※Eg1. The current cut is "The hole" (or another word that sounds like /həʊl/ that you
cannot confirm), but from the following default cut you know the sentence is "The whole town
disagreed with the mayor." So the right transcription of this case is "the
whole".
• If multiple homophones fit the meaning of the default cut sentence, you can write any of
them.
※Eg2. The default cut is "where is my deer/dear." Both words match the meaning of the sentence,
so you can write either one.
m) Simplified form / spoken language
• Transcribe the form that the speaker actually says. Transcribe what you hear; do not correct
grammar mistakes.

※Eg1. “I'm gonna do some sports”. According to the audio, this must be written as "gonna" and cannot
be written as "going to".
n) Bad language, abusing words, accelerated audio:
• Both abusive words and bad language need to be transcribed.
• If the accelerated audio can be heard clearly, just transcribe it.
o) Poem should be transcribed normally.
p) Double spaces between words are ok.
q) Spelled word
When a word is spelled out letter by letter, put a space between the letters and use lower case.
E.g. “a w e s o m e”
r) Words with non-standard pronunciation
If the pronunciation is not standard but you can tell the correct word, transcribe the correct word. If you
cannot tell, treat it as an unclear word and intercept it.
s) If the accelerated audio can be heard clearly, just transcribe it.
t) If the voice is from TV, Siri, Google translation etc., transcribe it if you can hear it clearly. If it is
unclear, please discard it.
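
Several of the text rules above (no line breaks, no digits, no punctuation other than hyphen and apostrophe, at least two words, at least one non-modal word) are mechanical enough to check automatically before submitting. The Python sketch below is a hypothetical self-check, not part of the platform. `MODAL_WORDS` is an assumed sample set, and the character check assumes plain unaccented English letters.

```python
import re

# Assumed sample set; the project's real modal word list is not given in this document.
MODAL_WORDS = {"ha", "um", "uh", "hmm", "oh"}


def check_transcript(text, modal_words=MODAL_WORDS):
    """Return a list of rule violations found in a transcript (empty if none)."""
    problems = []
    if "\n" in text or "\r" in text:
        problems.append("line break in text")
    if re.search(r"\d", text):
        problems.append("digit found; write numbers as words")
    # Anything other than letters, digits, apostrophes, hyphens and whitespace.
    if re.search(r"[^A-Za-z0-9'’\-\s]", text):
        problems.append("punctuation or special character found")
    words = text.split()  # split() tolerates double spaces between words
    if len(words) < 2:
        problems.append("fewer than two words")
    if words and all(w.lower() in modal_words for w in words):
        problems.append("no meaningful (non-modal) word")
    return problems
```

A transcript such as "i'm gonna do some sports" passes with no problems, while "i have 2 cats" is flagged for the digit. Rules that depend on listening, such as homophones or half-pronounced words, cannot be checked this way.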

3.4 Special Words


Please refer to the “term alignment” spreadsheet.
