Data Correction Tasks
Correct transcription errors, given an audio sample and its corresponding transcription.
The labeler will be provided with a few extended samples (10-20 seconds) or a YouTube link.
Upon completion, provide general feedback on the process using the form below. In the form,
rate each of the following on a scale of 0-10:
● Speaker's voice.
● Speaker's style and emotional expression during speech.
● Quality of transcripts before correction (number of mistakes, overall accuracy).
● Variation in keyword usage (repetitive keywords).
Report
Upon completion, please fill in the “Additional feedback on the data.” section of the
provided form with a short report that describes the process and the frequent errors you
encountered while correcting the transcripts. This will help us improve our methods and
engines.
Feedback form
Kindly fill in this form with your feedback and report on the labeling process:
Feedback Form
Voice: sample_link
Transcript: تعالي كلي المهم انا جيت وراحة دخلت المطبخ
Correction: تعالي كلي, المهم أنا جيت وراها و دخلت المطبخ
● Here we note there are 3 mistakes. The first mistake is “وراحة”, corrected to “وراها”. It is a
character-level mistake: the correct and wrong characters have the same pronunciation, but we
would prefer the 100% correct characters.
● The second mistake is the missing “و”. It is not that obvious in the recording, since the speaker
said it fast. However, without it the sentence doesn’t make sense, so you can tell that a “و” is
missing even if you think it is not obvious in the recording.
● The third mistake is “انا”, because “ا” is supposed to be “أ”. This is important, so please keep it
in mind: the two characters have totally different phonemes.
● Note also that a comma “,” was added because the speaker paused and changed the subject, and
her tone changed as well, so it was worth adding a comma.
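When reviewing a correction against its original transcript, a word-level diff can help list what was changed. The following is only a sketch under stated assumptions, not part of the labeling tooling: it uses Python's standard `difflib`, the helper name `word_edits` is hypothetical, and splitting on whitespace is a simplification (the added comma stays attached to the preceding word).

```python
import difflib
import unicodedata


def word_edits(transcript: str, correction: str):
    """Return (tag, old_words, new_words) for every span that differs.

    Hypothetical helper: words are split on whitespace only.
    """
    old, new = transcript.split(), correction.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# The first sample from this guide.
transcript = "تعالي كلي المهم انا جيت وراحة دخلت المطبخ"
correction = "تعالي كلي, المهم أنا جيت وراها و دخلت المطبخ"

for tag, old_words, new_words in word_edits(transcript, correction):
    print(tag, old_words, "->", new_words)

# The two alef variants look similar but are distinct code points,
# so a plain string comparison treats them as different characters.
for ch in ("ا", "أ"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

For this sample the diff reports three changed spans: the added comma, “انا” to “أنا”, and “وراحة” to “وراها و” (the character fix and the missing “و” fall in the same span).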
Voice: sample_link
● Here we note that instead of “قلتله” the ASR actually separated it to “انا” & “قلت”.
● The “:” was added because it introduces a quoted message.
Voice: sample_link
Transcript: تدرين اني ما كانت تقدر تروح تشتريه من السوق أبدا, تاخذ من بيت جدتي و تروح الخياطة تخليه يسويلنا بلايز طويلة
وبنطلون وكان شكلنا عكس عند الناس, يا الله اني كنت أتفشل اني اطلع العب مع قرايبنا, وكان الكل يتهزفينا خاصة جو قرايبنا من المدن
الكبيرة حتى اللي لابس و الله العظيم اني انقهر ينزلون شاطر يكونون مليانات الملابس عيوني بس علي اقول يا حظهم اهل شيرو لا
خليني اقولك سياراتهم جديدة وكنت اقدر عنده شاشات سيارة اقول يا حظهم يا ليت زيهم و في بيت خالي هذا اللي اقول لكم دايما مهتم
فينا يعطينا كذا بس ما شاء الله هو عنده عيال و بنات وش كثرهم عشان كذا ما كان يستقبلنا في بيته
Correction: تدرين إن ما كانت تقدر تروح تشتريلنا من السوق ابدا, تاخذ الأقمشة من بيت جدتي و تروح الخياطة تخليه يسويلنا
بلايز طويلة وبنطلون وكان شكلنا عكس عند الناس. يا الله إني كنت تفشل اني اطلع العب مع قرايبنا, وكان الكل يتهزفينا خاصة إن جو
قرايبنا من المدن الكبيرة حتى اللي لابس و الله العظيم اني انقهر ينزلون الشنط يكونون مليانات الملابس أنا عيوني بس عليه. أقول يا
حظهم اهل يشترولهم و لا خليني اقولكم سياراتهم جديدة وكنت اقدر عندهم شاشات سيارة اقول يا حظهم يا ليت زيهم و في بيت خالي هذا
اللي اقول لكم دايما مهتم فينا يعطينا كذا بس ما شاء الله هو عنده عيال و بنات وش كثرهم عشان كذا ما كان يستقبلنا في بيته
● This sample has many mistakes; only some of them were corrected, just for illustration.
● This sample also has some pronunciation mistakes. A sample with such mistakes can be
neglected/removed, but only if the pronunciation errors are numerous.
Thank you!