
F/BERC 1 (2021)

FERC/BERC 1

Faculty/Branch Ethics Review Committee


Universiti Teknologi MARA

Ethics Approval Application Form for Undergraduate or


Postgraduate by Coursework
Borang Permohonan Kelulusan Etika bagi Pelajar Sarjana Muda atau
Pasca Siswazah Kerja Kursus

This application is for the purpose of obtaining approval to conduct research involving
human subjects.
Please attach a copy of the Research Proposal.
Permohonan ini dikemukakan untuk tujuan kelulusan menjalankan penyelidikan melibatkan
manusia sebagai subjek kajian.
Sila lampirkan salinan kertas Cadangan Penyelidikan.

Part A: Details of Researcher


Bahagian A: Maklumat Penyelidik

Title of Research Project: Development of Music Identification by Using Human Voice Recognition
Tajuk Penyelidikan:

Name of Student: Muhammad Shafiq Haqime Bin Mohd Isa
Nama Pelajar:

Name of Supervisor: Albin Lemuel Kushan


Nama Penyelia:

Faculty/Academy/Branch: College of Computing, Informatics and Media, Universiti Teknologi MARA
Fakulti/Akademi/Cawangan:

Contact No/ Email: 011-69894818 / 2021125721@student.uitm.edu.my


No.Telefon/ Emel:

☐ Undergraduate / Sarjana Muda

☐ Postgraduate by Coursework/ Pasca Siswazah Kerja Kursus

Does the research require an external Research Ethics Committee approval? (e.g.
MREC)
Adakah penyelidikan ini memerlukan kelulusan Jawatankuasa Etika Penyelidikan Luaran?
(contoh MREC)

☐ Yes / Ya External Committee Name:

☐ No / Tidak

Research funding: Yes/ No


Dana Penyelidikan: Ada/ Tiada


Part B: Research Details


Bahagian B: Maklumat Penyelidikan

Part B1
Bahagian B1
☐ Interviews ☐ Case study
Temubual Kajian kes

☐ Focus groups ☐ Intervention study


Kumpulan terfokus Kajian intervensi

☐ Questionnaires ☐ Personal records


Soal selidik Rekod peribadi

☐ Action research ☐ Secondary data analysis


Kajian tindakan Analisis data sekunder

☐ Observation ☐ Others (provide details):


Pemerhatian Lain-lain (nyatakan):

Part B2
Bahagian B2
1. Background:
Latar belakang:

A brief explanation of the problem to be studied and literature review (citations) to support.
Keterangan ringkas tentang masalah yang dikaji dan soroton kajian (sitasi) untuk
menyokong keterangan tentang masalah yang dikaji.

According to Brown (2015), 92 percent of the population experience earworms. An
earworm is a condition in which a melody becomes stuck in the head. If the melody
stays stuck for a long time, it can cause intrusive thoughts that are associated with
anxiety and depression.

Voice recognition is the ability of a machine or program to receive and interpret
dictation or to understand and carry out spoken commands (Scardina, 2018). Voice
recognition is widely used nowadays, and its functionality has been utilized in
high-technology assistants such as Apple’s Siri, Amazon’s Alexa, and Microsoft’s
Cortana. Besides using only our fingers to surf the internet, we can simply speak a
command to our device to search for anything we want, enabling hands-free requests,
reminders and other simple tasks.

Voice recognition has helped much of the population ease their daily routines and
keep their work on track. It is therefore reasonable to take this as an opportunity
to develop such a system and help people find songs using only their voices.


Problem statement (Less than 100 words):


Penyataan masalah (Kurang dari 100 patah perkataan):

1) Having a rough time to remember the lyrics or the song title of an artist

Sometimes, for certain users, it can be a hassle to recall lyrics that were played
on the radio or in a random track from their phone. This can be the case for people
who suffer from short-term memory loss (Zimmermann, 2017), although it is also quite
common to forget the lyrics of a song that was played at random, or of a newly
released song that popped up on the car’s radio.

2) Song or melody that stuck into their head

Sometimes a melody can get stuck in our head for a long period of time. This
phenomenon is called involuntary musical imagery (INMI), more widely known as
“earworms”. According to a study from Durham University led by Jakubowski K.,
earworms are an extremely common phenomenon and an example of spontaneous
cognition: around 40 percent of our day is spent thinking about random things, and
researchers are beginning to study the brain mechanism behind thoughts that are
unrelated to the current task.

3) Always get the non-similar result when searching for a song

Searching for a song can feel impossible for some people. With only the melody
stuck in their head as a clue, users tend to search using certain words of a cryptic
lyric, the genre, or perhaps the artist’s nationality. With this system, users can
get a result that is about 90 percent precise without having to enter clueless
queries into the internet.

References:
Rujukan:

Only include references cited in this document. Do not paste all your references from your
main proposal
Hanya sertakan rujukan yang dipetik dalam dokumen ini. Jangan tampalkan semua rujukan
dari kertas cadangan utama anda

Jones, N. (2015, January 14). Nielson Study Reveals Rock Prevails As Most Popular Genre In The US. Hypebot. Retrieved May 1, 2022, from https://www.hypebot.com/hypebot/2015/01/nielson-study-reveals-rock-prevails-as-most-popular-genre-in-the-us.html

Music. (2022, May 8). In Wikipedia. https://en.wikipedia.org/wiki/Music

Scardina, J. (2018, January 31). Voice Recognition (Speaker Recognition). SearchCustomerExperience. Retrieved May 1, 2022, from https://www.techtarget.com/searchcustomerexperience/definition/voice-recognition-speakerrecognition

Brown, H. (2015, November 1). How Do You Solve a Problem Like an Earworm? Scientific American. Retrieved May 1, 2022, from https://www.scientificamerican.com/article/how-do-you-solve-a-problem-like-an-earworm/

A. (2018, May 3). What is Query by Humming? - ACRCloud. Medium. Retrieved May 1, 2022, from https://medium.com/acrcloud/what-is-query-by-humming-9fb8e74e738a

Zimmermann, K. A. (2017, April 24). What Is Short-Term Memory Loss? LiveScience.com. Retrieved May 1, 2022, from https://www.livescience.com/42891-short-term-memory-loss.html

Welch, A. (2016, November 3). Psychologists identify why certain songs get stuck in your head. CBS News. Retrieved May 1, 2022, from https://www.cbsnews.com/news/psychologists-identify-why-certain-songs-get-stuck-in-your-head/

2. Research objectives (enumerate):


Objektif penyelidikan (senaraikan):

Number your objectives


Nomborkan objektif anda

1. To develop a song-finder application that uses voice recognition, namely humming
or singing.
2. To evaluate the accuracy and the effectiveness of the system in helping people
find a song.

3. Expected benefits (Less than 100 words):


Faedah yang dijangka (Kurang dari 100 patah perkataan):

The system may solve the earworm problem for people who have a melody stuck in
their head. It may also help users find songs more easily in the future, by simply
humming into the system.


4. Date of research commencement-end:


Tarikh penyelidikan bermula-berakhir:

March 2022 – March 2023

5. Expected date of initial data collection:


Jangkaan tarikh pengumpulan data bermula:

February 2023

At least TWO (2) months before submission of this form


Sekurang-kurangnya DUA (2) bulan sebelum penghantaran borang ini

6. Location of research:
Lokasi penyelidikan dijalankan:

College of Computing, Informatics and Media, Universiti Teknologi MARA

7. Research design and methodology:


Rekabentuk penyelidikan dan metodologi:

Extreme Programming (XP)

i) Planning

This phase is for information gathering, taken from an online survey, case
studies from journals, articles and other sources from the literature review.
All the hardware and software requirements are discussed here.

ii) Designing

From the requirements analysis, the details are used to design the system. A
flowchart and modules are designed to organize the system flow. A mock
interface is also designed to picture the system interface.

iii) Coding

The coding phase is also known as the development phase. Here the program is
written according to the flowchart and use case diagram from the previous
phase.

iv) Testing

Here Unit Testing is run to review whether any errors are found; any errors
are then fixed back in the Coding phase.

v) Listening

This phase is for the Acceptance Test, after the system is complete, where
users give their feedback from using the system.


Quantitative analysis
Data collection method: Questionnaires
Respondents: Students of UiTM Melaka
Data analysis: Google Forms
8. Inclusion and exclusion criteria:
Kriteria kemasukan dan pengecualian:

Inclusion criteria:
Kriteria kemasukan:
● Students 18 years old and above
● Have access to a mobile phone

Exclusion criteria:
Kriteria pengecualian:
● Students under 18 years old
● Lack access to a mobile phone

9. Sample size:
Saiz sampel:
State the sampling method and minimum sample size calculated.
Nyatakan kaedah persampelan dan pengiraan saiz sampel minima.

Students of UiTM Melaka, Kampus Jasin


Calculation:
Pengiraan:

N = 40 (population size)
S = 36 (sample size)

Based on the sample size table by Krejcie and Morgan (1970), the sample size for this research is 36.
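For reference, the table value above can be reproduced (to rounding) from the Krejcie and Morgan (1970) formula, s = χ²NP(1−P) / (d²(N−1) + χ²P(1−P)), with χ² = 3.841 (one degree of freedom at 95% confidence), P = 0.5 and d = 0.05. The sketch below is illustrative; the function name is ours, not part of the form:

```python
def krejcie_morgan(N, chi2=3.841, P=0.5, d=0.05):
    """Minimum sample size s for a population of size N,
    per Krejcie & Morgan (1970):
      s = chi2 * N * P * (1 - P) / (d**2 * (N - 1) + chi2 * P * (1 - P))
    """
    numerator = chi2 * N * P * (1 - P)
    denominator = d ** 2 * (N - 1) + chi2 * P * (1 - P)
    return round(numerator / denominator)

print(krejcie_morgan(40))  # 36, matching the published table value for N = 40
```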


10. Research flowchart:


Carta alir penyelidikan:

Extreme Programming (XP) Model

Planning
- Gather all the necessary information
- Plan the application features, including the method to be used in application development

Designing
- Design the flowchart, entity relationship diagram and mock interface

Coding
- Develop the mobile application user interface
- Develop the application modules and functions
- Plug the music information retrieval (MIR) API into the application
- Fix errors found during Unit Testing
- Integrate all the functions into a single application

Testing
- Conduct Unit Testing to identify any errors
- Verify the accuracy of the hummed audio-to-digital conversion function against a query-by-humming dataset for finding songs
- Determine whether the MIR API functions correctly

Listening
- Record the Acceptance Test feedback from users
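As an illustration only of the query-by-humming matching that the Testing phase verifies (the actual matching is done by the external music information retrieval API; the helper functions and the one-song database below are hypothetical), a classic query-by-humming approach reduces a melody to its pitch contour, the Parsons code, and matches the hummed contour against stored song contours:

```python
# Hypothetical sketch of query-by-humming matching (not the actual MIR API).
# Each note is encoded as U (up), D (down) or R (repeat) relative to the
# previous note; a hum matches a song if its contour appears in the song's.

def parsons_code(pitches):
    """Reduce a sequence of MIDI pitch numbers to a contour string."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)

def find_song(hummed_pitches, contour_db):
    """Return titles whose stored contour contains the hummed contour."""
    query = parsons_code(hummed_pitches)
    return [title for title, contour in contour_db.items() if query in contour]

# Tiny made-up database: opening of "Twinkle Twinkle" (C C G G A A G).
db = {"Twinkle Twinkle": parsons_code([60, 60, 67, 67, 69, 69, 67])}
print(find_song([62, 62, 70], db))  # a hum with the same repeat-up contour matches
```

Because only the contour is compared, the hum can be in a different key from the original recording, which is why this representation suits imprecise humming.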
11. Statistical analysis:
Analisa statistik:
Briefly describe the data analysis and the statistical analysis software that will be utilized.
Huraikan secara ringkas analisis data dan perisian analisis statistik yang akan digunakan

Software to be used: Google Forms

Descriptive statistics:

i) To describe the basic features of the data in a study
ii) To provide simple summaries about the sample and the measures
iii) To provide simple graphical analysis
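As a sketch only, the kinds of summaries listed above can be computed as follows (the ratings are made up for illustration; in practice Google Forms generates these summaries automatically):

```python
from collections import Counter
from statistics import mean, median

# Hypothetical 1-5 Likert ratings for a single questionnaire item.
ratings = [5, 4, 4, 3, 5, 2, 4, 5]

print("n      :", len(ratings))                            # sample size
print("mean   :", round(mean(ratings), 2))                 # central tendency
print("median :", median(ratings))
print("counts :", dict(sorted(Counter(ratings).items())))  # per scale point
```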


Part C: Funding details


Bahagian C: Maklumat Dana

* If not applicable please write ‘-NA-’ in the spaces provided.


Jika tiada kaitan sila tulis ‘-NA-’ di ruangan disediakan.

1. Grant / Source: -NA-


Geran / Sumber:

2. Grant Registration No. (IRES): -NA-


No. Pendaftaran Geran:

3. Total allocation: -NA-


Jumlah peruntukan:

4. Duration of grant: -NA-


Jangkamasa peruntukan:

5. Others: -NA-
Lain-lain:

Part D: Agreement to conduct the research project.


Bahagian D: Pengesahan persetujuan menjalankan penyelidikan.

To be completed and signed by the following members of the research group.


Perlu dilengkapkan dan ditandatangani oleh ahli-ahli berikut dalam kumpulan penyelidikan.

1. Student
Pelajar

Student ID:
No. Pelajar: 2021125721
Mobile phone: 011-69894818
Telefon bimbit:
Email:
2021125721@student.uitm.edu.my
Emel:

2. Supervisor
Penyelia

Staff ID/No.Staf/
311841
Mobile phone: Telefon 013-8218885
bimbit:
Email: albin1841@uitm.edu.my
Emel:

Signature:
Tandatangan:


3. Co-Researcher
Penyelidik Bersama

Name: -NA-
Nama:
Staff ID/Student ID: -NA-
No.Staf/No. Pelajar:
Affiliation:
Jabatan: -NA-
Mobile phone: Telefon -NA-
bimbit:
Email: -NA-
Emel:
Signature: -NA- Date: -NA-
Tandatangan: Tarikh:

Part E: Research Risk Classification


Bahagian E: Klasifikasi Risiko Kajian

PLEASE ANSWER ALL QUESTIONS BELOW.


If your answer is ‘Yes’ to any of the following questions, please include brief
information in the space provided.

SILA JAWAB KESEMUA SOALAN DI BAWAH.


Sekiranya jawapan anda ‘Ya’ kepada mana-mana soalan di bawah, sertakan maklumat ringkas
di ruang yang disediakan.

PARTICIPANT PROFILE No Yes Brief description (If YES)


1. Are the participants under 18 years
old?
Adakah peserta berumur di bawah 18 /
tahun?
2. Are the participants from a particular
vulnerable group? (e.g. with mental
disorder, mentally challenged,
disabled, minority, disadvantaged
group etc.)
Adakah peserta daripada golongan
/
rentan? (cth: kecelaruan mental, kelainan
keupayaan intelektual, berkeperluan khas,
minoriti dan sebagainya.)
3. Are any of these participants/patients
in terminal care?
Adakah peserta/pesakit ini memerlukan /
rawatan terminal?
4. Are any of these participants unable
or are incapable of giving consent?
(i.e. consent will be obtained indirectly
from a legal guardian etc.)
Adakah peserta tidak boleh atau tidak
/
berupaya memberi izin?
(spt: izin akan diambil secara tidak
langsung daripada penjaga sah dan
sebagainya.)


5. Are the participants given any form of


emolument to participate? /
Adakah peserta diberi upah untuk
menyertai kajian?
PRIVACY AND CONFIDENTIALITY No Yes Brief description (If YES)
6. Does any of the data collected have
the potential to cause discomfort,
embarrassment, or psychological
harm to the participants?
(e.g. sexual orientation etc.)
Adakah data yang dikumpul berpotensi
/
untuk menyebabkan rasa tidak selesa,
keaiban atau gangguan psikologi kepada
peserta? (cth: orientasi seksual dan
sebagainya.)
7. Does your research involve measures
undeclared to the participants?
(e.g. covert observations etc.)
Adakah penyelidikan anda melibatkan
/
langkah-langkah yang tidak dimaklumkan
kepada peserta?
(cth: pemerhatian rahsia dan sebagainya.)
8. Will the collected data be made
available to other parties not involved
in the research? (e.g. government
agencies)
/
Adakah data yang dikumpulkan akan
didedahkan kepada pihak lain yang tidak
terlibat dalam penyelidikan? (cth. agensi
kerajaan)
RISK OF HARM No Yes Brief description (If YES)
9. Will you be collecting biological
samples e.g. body fluids?
Adakah anda akan mengumpul sampel /
biologi contohnya. cecair badan?
10. Do you have access to any
information that will allow the
identification of individual human
participants?
/
Adakah anda mempunyai akses kepada
apa-apa maklumat yang akan
membolehkan pengenalpastian peserta
secara individu?
11. Is the collection method invasive and
has the potential to cause harm, pain
or discomfort?
(except finger, heel, ear prick.)
/
Adakah kaedah pengumpulan invasif dan
berpotensi menyebabkan kemudaratan,
kesakitan atau ketidakselesaan?
(kecuali tusukan jari, tumit, telinga.)
12. Will the participants be subjected to
vigorous physical tests or exercise
regime?
(if ‘No’, go to Question 15.)
/

Adakah peserta akan melalui ujian fizikal


atau senaman berintensiti tinggi?
(jika 'Tidak', teruskan ke Soalan 15.)
13. Are the participants non-athletes or
patients with chronic illnesses?
Adakah peserta bukan atlet atau pesakit /
dengan penyakit kronik?
14. Will they be subjected to maximal
exercise intensity?
Adakah mereka akan melalui senaman /
berintensiti maksimum?
15. Is there any form of procedure/
medication involved?
Adakah terdapat sebarang prosedur/ /
penggunaan ubat yang terlibat?
16. Is there any drug or device used with
an unapproved indication?
Adakah terdapat ubat atau peranti yang /
digunakan tanpa indikasi yang diluluskan?
17. Can the informed consent be obtained
from anyone other than the
patient/participant?
Adakah keizinan kajian boleh diperoleh
/
daripada sesiapa selain pesakit/peserta?
18. Is there any kind of risk to the
participant if he/she chooses to
withdraw?
Adakah terdapat sebarang kemudaratan
/
kepada peserta jika dia memilih untuk
menarik diri?
19. What type of biological samples will
be collected?
(Please indicate amount and
frequency.)
/
Apakah jenis sampel biologi yang
dikumpul?
(Sila nyatakan jumlah dan kekerapan.)

20. Will the biological samples obtained


be stored for future research?
Adakah sampel biologi yang dikumpul
akan disimpan untuk penyelidikan di /
masa hadapan?
21. Do you propose to analyse the
sample outside of the original purpose
for which it has been collected?
Adakah anda bercadang untuk
/
menganalisis sampel bagi tujuan selain
daripada tujuan asal ia diperoleh
22. If ‘Yes’ to No. 21, will you obtain
consent from participants for this
purpose?
Jika ‘Ya’ pada No. 21, adakah anda akan
/
mendapatkan persetujuan daripada
peserta untuk tujuan ini?


OTHER ETHICAL ISSUES No Yes Brief description (If YES)


23. Are there any other ethical issues not
stated in this checklist?
Adakah terdapat sebarang isu etika lain
yang tidak dinyatakan dalam senarai /
semak ini?


Part F: Applicant Checklist


Bahagian F: Senarai Semak Pemohon

Terms of Submission of Ethics Approval Application

1. Please ensure that the named research team members have signed the
application.
2. Please ensure that the application has been signed and endorsed by the
Department or Postgraduate Research Committee.
3. All required documents must be submitted at least two (2) months before the
data collection.
4. Any data collection instruments that require completion by
respondents/participants shall be prepared in the Malay and English languages, and
other language(s) understood by the participants.
5. Please ensure that you have obtained the necessary permission or paid the
stipulated fee for use of survey questionnaires and/or statistical analysis software, if
and when necessary.

ITEM YES NO
PERKARA YA TIDAK
1 Have you presented your proposal at the Department or
Postgraduate Research Committee?
Adakah anda telah membentangkan proposal anda di
Jawatankuasa Penyelidikan Jabatan atau Pascasiswazah? /
2 Have you completed the F/BERC 1 form?
Adakah anda telah melengkapkan Borang F/BERC 1? /
3 Have you completed the F/BERC 2 and/or F/BERC 3 form?
Adakah anda telah melengkapkan Borang F/BERC 2 atau dan /
borang F/BERC 3?
4 Has your supervisor checked your application form?
Adakah penyelia anda telah menyemak borang permohonan /
anda?
5 Has the form been signed by all researchers?
Adakah borang ditandatangani oleh semua penyelidik? /

Additional comments (if any):


Komen Tambahan (Jika Ada):

11/11/2022

Applicant’s Signature Date

11/11/2022

Supervisor’s Signature Date


Part G: Verification from Department or Postgraduate Research Committee


Bahagian G: Pengesahan Jawatankuasa Penyelidikan Jabatan atau Pascasiswazah

We have deliberated on the application and propose as below:


Kami telah meneliti permohonan ini dan mencadangkan seperti di bawah:

Minimal risk research. Recommended for approval without presentation.


Penyelidikan melibatkan risiko minima. Dicadangkan untuk mendapat kelulusan tanpa
pembentangan.

More information or presentation at BERC/FERC required.


Perlu maklumat tambahan atau pembentangan ke BERC/FERC.

More than minimal risk research. Recommended to be forwarded to UiTM REC.


Penyelidikan melibatkan risiko melebihi minima. Dicadangkan untuk dihantar kepada
UiTM REC.

Comment if any:
Ulasan jika ada:

Signature Tandatangan: Official stamp: Date:


Undergraduate / Postgraduate / Cop rasmi: Tarikh:
Research Coordinator Koordinator
Peringkat Pra-siswazah/Pasca-
siswazah/Penyelidikan

F/BERC 2 (2021)

FERC/BERC 2

Faculty/Branch Ethics Review Committee


Universiti Teknologi MARA

Participant Information Sheet

Research Title

Development of Music Identification by Using Human Voice Recognition

Introduction of Research

This research focuses on the development of a mobile application that can be used to search
for a song using the human voice, namely humming, to help users find a melody that may be
stuck in their head.

Purpose of Research

1. To develop a song-finder application that uses voice recognition, namely humming or singing.
2. To evaluate the accuracy and the effectiveness of the system in helping people find a
song.

Research Procedure

Upon completion of this application, individuals will be requested to provide feedback
through an evaluation.

Participation in Research

Your participation in this research is entirely voluntary. You may refuse to take part in the
study or you may withdraw yourself from participation in the research at any time without
penalty.

Benefit of Research

Information obtained from this research will benefit the individuals, researchers, institution and
community for the advancement of knowledge and future practice.

Research Risk

The research poses no risk to the participants and the participants are free to withdraw from
the experiment.

Confidentiality

Your information will be kept confidential by the investigators and will not be made public
unless disclosure is required by law. By signing this consent form**, you will authorize the
review of records, analysis and use of the data arising from this research.

If you have any question about this research or your rights, please contact Muhammad Shafiq
Haqime bin Mohd Isa at 011-69894818

**If you are using an online survey form (obtaining signature of participants are not feasible), please
include these statements at the beginning of the survey document:

By participating in this survey, I agree that:


1. I am 18 years old and above
2. I authorize the review of records, analysis and use of the data arising from this research.
3. I understand the nature and scope of the research being undertaken.
4. I have read and understood all the terms and conditions of my participation in the research.
5. I voluntarily agree to participate in this research and follow the study procedures.
6. I may at any time choose to withdraw from this research without giving any reason.

Consent Form1
To become a participant in the research, you or your legal guardian are required to sign this
Consent Form.
I herewith confirm that I have met the requirement of age and am capable of acting on behalf
of myself / as2 a legal guardian as follows:

1. I understand the nature and scope of the research being undertaken.


2. I have read and understood all the terms and conditions of my participation in the
research.
3. All my questions relating to this research and my participation therein have been
answered to my satisfaction.
4. I voluntarily agree to take part in this research, to follow the study procedures and to
provide all necessary information to the investigators as requested.
5. I may at any time choose to withdraw from this research without giving any reason.
6. I have received a copy of the Participant Information Sheet and Consent Form.
7. Except for damages resulting from negligent or malicious conduct of the
researcher(s), I hereby release and discharge UiTM and all participating researchers
from all liability associated with, arising out of, or related to my participation. I agree
to hold them harmless from any harm or loss that may be incurred by me due to my
participation in the research.

______________________________________________________________________
Name of Participant/Legally authorized representative (LAR) Signature
______________________________________________________________________
I.C No Date
______________________________________________________________________
Name of Witness3 Signature
______________________________________________________________________
I.C No Date
______________________________________________________________________
Name of Consent Taker Signature
______________________________________________________________________
I.C No Date

1
Original signed copy is to be retained by the Principal Investigator.
2
Delete whichever is not applicable.
3
A witness is only required for oral consent.
REC 4/ 2020/BM Pind. 2 (2020)

FERC/BERC 2

Jawatankuasa Penilaian Etika Fakulti/Cawangan


Universiti Teknologi MARA

Borang Maklumat Peserta

Tajuk penyelidikan

Pembangunan Pengenalpastian Muzik dengan Menggunakan Pengecaman Suara Manusia

Pengenalan penyelidikan

Penyelidikan ini memberi tumpuan kepada pembangunan aplikasi mudah alih yang boleh
digunakan untuk mencari lagu menggunakan suara manusia, yang bersenandung, untuk
membantu pengguna mencari melodi yang mungkin melekat di kepala mereka.

Tujuan penyelidikan

1. Membangunkan aplikasi pencari lagu yang menggunakan pengecaman suara iaitu


bersenandung atau menyanyi.
2. Untuk menilai ketepatan dan keberkesanan sistem untuk membantu orang ramai mencari
lagu.

Prosedur penyelidikan

Selepas permohonan ini selesai, individu akan diminta untuk memberikan maklum balas
melalui penggunaan penilaian.

Penyertaan dalam penyelidikan

Penyertaan anda di dalam penyelidikan ini adalah secara sukarela. Anda berhak menolak
tawaran penyertaan ini atau menarik diri daripada penyelidikan ini pada bila-bila masa tanpa
sebarang penalti.

Manfaat penyelidikan

Maklumat yang didapati dari penyelidikan ini akan memanfaatkan individu, penyelidik, institusi
dan komuniti dalam kemajuan pengetahuan dan amalan pada masa hadapan.

Risiko penyelidikan

Penyelidikan tidak menimbulkan risiko kepada peserta dan peserta bebas untuk menarik diri
daripada eksperimen.

Kerahsiaan

Maklumat anda akan dirahsiakan oleh penyelidik dan tidak akan didedahkan melainkan jika
ia dikehendaki oleh undang-undang. Dengan menandatangani borang persetujuan** ini, anda
membenarkan penelitian rekod, penganalisaan dan penggunaan data hasil daripada
penyelidikan ini.

Sekiranya anda mempunyai sebarang pertanyaan mengenai penyelidikan ini atau hak-hak
anda, sila hubungi Muhammad Shafiq Haqime bin Mohd Isa di talian 011-69894818

**Sekiranya anda menggunakan borang tinjauan atas talian (tandatangan peserta tidak dapat
diperoleh), sila sertakan pernyataan ini sebelum soalan tinjauan:

Dengan menyertai tinjauan ini, saya bersetuju bahawa:


1. Saya berumur 18 tahun ke atas
2. Saya membenarkan penyemakan rekod, analisis dan penggunaan data dari penyelidikan ini.
3. Saya memahami tujuan dan skop penyelidikan yang sedang dijalankan.
4. Saya telah membaca dan memahami semua terma dan syarat penyertaan saya dalam
penyelidikan ini.
5. Saya secara sukarela bersetuju untuk mengambil bahagian dalam penyelidikan ini dan mengikuti
prosedur kajian.
6. Saya boleh memilih untuk menarik diri dari penyelidikan ini pada bila-bila masa tanpa
memberikan alasan.
Borang Izin1

Untuk menyertai penyelidikan ini, anda atau penjaga sah perlu menandatangani Borang Izin
ini.

Saya dengan ini mengesahkan bahawa saya telah memenuhi syarat umur dan berupaya
bertindak bagi pihak saya sendiri/ sebagai2 penjaga yang sah dalam perkara-perkara berikut:

1. Saya memahami ciri-ciri dan skop penyelidikan ini.


2. Saya telah membaca dan memahami semua syarat penyertaan penyelidikan ini.
3. Saya berpuas hati dengan jawapan pada kemusykilan saya tentang penyelidikan ini.
4. Saya secara sukarela bersetuju menyertai penyelidikan ini dan mengikuti segala atur
cara dan memberi maklumat yang diperlukan kepada penyelidik seperti yang
dikehendaki.
5. Saya boleh menarik diri daripada penyelidikan ini pada bila-bila masa tanpa memberi
sebab.
6. Saya telah pun menerima satu salinan Borang Maklumat Peserta dan Borang Izin.
7. Selain daripada kecederaan yang disebabkan oleh kelalaian dan kecuaian penyelidik,
saya dengan ini melepaskan dan menggugurkan UiTM dan semua penyelidik dari
semua liabiliti berhubung dengan, wujud dari atau berkaitan dengan penyertaan saya.
Saya bersetuju untuk menjadikan mereka tidak bertanggunggjawab terhadap apa-apa
kemudaratan atau kerugian yang mungkin akan saya tanggung disebabkan oleh
penyertaan saya.

______________________________________________________________________
Nama Peserta/ Wakil Sah yang berkuatkuasa Tandatangan
______________________________________________________________________
No. Kad Pengenalan Tarikh
______________________________________________________________________
Nama Saksi3 Tandatangan
______________________________________________________________________
No. Kad Pengenalan Tarikh
______________________________________________________________________
Nama Penyelidik/Pengambil Izin Tandatangan
______________________________________________________________________
No. Kad Pengenalan Tarikh

1
Salinan asal disimpan oleh Penyelidik Utama dan satu salinan diserahkan kepada peserta.
2
Potong mana yang tidak berkenaan.
3
Saksi dimestikan memberi izin secara lisan.
ENGLISH VERSION
DEVELOPMENT OF MUSIC IDENTIFICATION BY USING HUMAN
VOICE RECOGNITION

Name of researcher: Muhammad Shafiq Haqime bin Mohd Isa


Student ID: 2021125721

User Acceptance Testing


SECTION A: DEMOGRAPHIC INFORMATION

AGE (STATE HERE):
GENDER (TICK ONE): MALE / FEMALE

SECTION B: USER EVALUATION QUESTIONNAIRE


This section is about the user's experience while using the application. Users
choose a rating from 1 to 5 on a level-of-agreement scale, where 1 is strongly
disagree and 5 is strongly agree.

Item (1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree)
1. I can search for a song in a short amount of time
2. It is easy to hum to search for a song
3. The result I found is accurate to my humming voice
4. The result is written and displayed well
5. I find this application interesting
MALAY VERSION
PEMBANGUNAN PENGENALAN MUZIK DENGAN MENGGUNAKAN
PENGIKTIRAFAN SUARA MANUSIA

Nama pengkaji: Muhammad Shafiq Haqime Bin Mohd Isa


No pelajar: 2021125721

Ujian Penerimaan Pengguna


BAHAGIAN A: MAKLUMAT DEMOGRAFI

UMUR (NYATAKAN DI SINI):
JANTINA (PILIH SATU): LELAKI / PEREMPUAN

BAHAGIAN B: SOAL SELIDIK PENILAIAN PENGGUNA


Bahagian ini adalah mengenai pengalaman pengguna semasa menggunakan aplikasi.
Pengguna akan memilih antara 1 hingga 5 daripada skala tahap persetujuan, iaitu 1
sangat tidak setuju dan 5 sangat setuju.

Bahan (1 = Sangat Tidak Setuju, 2 = Tidak Setuju, 3 = Neutral, 4 = Setuju, 5 = Sangat Setuju)
1. Saya boleh mencari lagu dalam masa yang singkat
2. Adalah senang bersenandung untuk mencari lagu
3. Keputusan yang saya dapati adalah tepat dengan suara saya yang bersenandung
4. Hasilnya ditulis dan dipaparkan dengan baik
5. Saya dapati aplikasi ini menarik
UNIVERSITI TEKNOLOGI MARA

DEVELOPMENT OF MUSIC IDENTIFICATION


BY USING HUMAN VOICE RECOGNITION

MUHAMMAD SHAFIQ HAQIME BIN MOHD ISA

BACHELOR OF COMPUTER SCIENCE (Hons.)


NETCENTRIC COMPUTING

MARCH 2022
Universiti Teknologi MARA

Development of Music Identification By Using


Human Voice Recognition

Muhammad Shafiq Haqime Bin Mohd Isa

Thesis submitted in fulfilment of the requirements for Bachelor of
Computer Science (Hons.) Netcentric Computing, Faculty of
Computer and Mathematical Sciences

March 2022
SUPERVISOR APPROVAL

DEVELOPMENT OF MUSIC IDENTIFICATION BY USING HUMAN VOICE


RECOGNITION (STARIO)

By

MUHAMMAD SHAFIQ HAQIME BIN MOHD ISA

2021125721

The thesis was prepared under the supervision of the project supervisor, Mr. Ts. Albin
Lemuel Kushan. It was submitted to the Faculty of Computer and Mathematical
Sciences and was accepted in partial fulfilment of the requirements for the degree of
Bachelor of Computer Science (Hons.) Netcentric Computing.

Approved by

…………………………….

Ts. Albin Lemuel Kushan


Project Supervisor

15 JULY 2022

STUDENT DECLARATION

I certify that this thesis and the project to which it refers are the product of my own work
and that any ideas or quotations from the work of other people, published or otherwise,
are fully acknowledged in accordance with the standard referencing practices of the
discipline.

…………………….

MUHAMMAD SHAFIQ HAQIME

BIN MOHD ISA

15 JULY, 2022

ACKNOWLEDGEMENT

Alhamdulillah, praise and thanks to Allah, for by His Almighty grace and utmost
blessings I was able to finish this final year project report within the time given.
Firstly, my special thanks go to my supervisor for giving me the opportunity to
embark on this project. Thank you for the constant support and the time spent on
constructive comments and long discussions that led to the completion of this work.

Special appreciation also goes to my beloved parents. I would like to take this
opportunity to express my gratitude and indebtedness to them for their unconditional
love and support throughout this whole process.

Last but not least, I would like to give my gratitude to my dearest friends for helping
me search for references and giving me moral support during the development of this
project.

TABLE OF CONTENTS

CONTENTS

SUPERVISOR APPROVAL i
STUDENT DECLARATION ii
ACKNOWLEDGEMENT iii
LIST OF FIGURES vii
LIST OF TABLES viii
LIST OF ABBREVIATION ix
CHAPTER 1 1
INTRODUCTION 1
1.1 Project Background 1
1.2 Problem Statement 3
1.2.1 Having a rough time to remember the lyrics or the song title
of an artist 3
1.2.2 Song or melody that stuck into their head 3
1.2.3 Always get the non-similar result when searching for a song 4
1.3 Project Aim 4
1.4 Project Objective 4
1.5 Project Scope 5
1.6 Project Significance 5
1.6.1 The system may solve the earworms problem 5
1.6.2 Easier to find song in the future 5
1.6.3 Give benefit to the music industry 5
1.7 Summary 6
CHAPTER 2 7
LITERATURE REVIEW 7
2.1 Music Services 7
2.1.1 Origin of Music Services 7
2.1.2 Rising of Music Services in Internet 8
2.1.3 Music Website VS Music Application 9

2.2 Mobile Application 10
2.2.1 Web Mobile Application Development 10
2.2.2 Native Mobile Application Development 12
2.2.3 Hybrid Mobile Application Development 12
2.2.4 Comparison of Web, Native, and Hybrid Mobile Application
13
2.3 Human Voice Recognition 13
2.3.1 Definition of Human Voice Recognition 14
2.3.2 Analog-to-Digital Conversion 16
2.3.3 Query by Humming 22
2.4 Related Works 24
2.4.1 Shazam 24
2.4.2 SoundHound 25
2.4.3 Musixmatch 25
2.4.4 Discussion of related works 26
CHAPTER 3 27
PROJECT METHODOLOGY 27
3.1 Project Methodology 27
3.2 Project Framework 29
3.3 Planning 31
3.3.1 Hardware requirements 31
3.3.2 Software requirements 32
3.3.3 Survey analysis 32
3.3.4 Use Case Diagram 35
3.4 Designing 36
3.4.1 Flowchart 36
3.4.2 Entity Relationship Diagram 38
3.4.3 Interface Design 39
3.4.4 System Architecture 42
3.5 Coding 43
3.5.1 Recording – capturing the sound 44

3.5.2 Time-Domain and Frequency Domain 45
3.5.3 The Discrete Fourier Transform 46
3.5.4 Music Recognition: Fingerprinting a Song 48
3.5.5 The Music Algorithm: Song Identification 51
3.6 Testing 52
3.6.1 Unit Testing 52
3.6.2 Acceptance Testing 52
3.7 Listening 53
3.8 Summary 53
References 54

LIST OF FIGURES

Figure 2.1: Three tier architecture 11


Figure 2.2: General block diagram of speech recognition 16
Figure 2.3: A continuous signal (analog) turning into a digital signal 17
Figure 2.4: An example on how aliasing happens 18
Figure 2.5: Example on how resolution affects the digital signal 20
Figure 2.6: Bit length and their number of levels and step size for a 5V
reference range 21
Figure 2.7: Proposed Query by Humming (QbH) System 23
Figure 2.8: The preprocessing that creates fingerprints 24
Figure 3.1: SDLC Illustration 27
Figure 3.2: Extreme Programming Phase 28
Figure 3.3: XP Model Phases for Stario 29
Figure 3.4: XP Phases for Stario development 30
Figure 3.5: People that have melody stuck in their head 32
Figure 3.6: Time of melody ringing 33
Figure 3.7: The difficulty to search the melody 33
Figure 3.8: Sing/hum and search lyrics preference 34
Figure 3.9: Use Case Diagram for Stario user 35
Figure 3.10: Flowchart of login/registration of user 36
Figure 3.11: Flowchart of user to search song 37
Figure 3.12: Entity Relationship Diagram for Stario mobile application 38
Figure 3.13: Logical diagram for Stario mobile application 42
Figure 3.14: Illustration of analog-to-digital conversion 43
Figure 3.15: Sample coding for getFormat(), to change the audio into
digital 44
Figure 3.16: Sample coding for song match-making 45
Figure 3.17: The illustration of time-domain and frequency-domain 46
Figure 3.18: The example of FFT function 47
Figure 3.19: The example of before and after FFT analysis 48
Figure 3.20: Sample coding of fingerprinting 49
Figure 3.21: Sample coding of fingerprinting 50
Figure 3.22: Sample output in database for hashing 50
Figure 3.23: Example of records found in database 51
Figure 3.24: Example of coding for music identification 52

LIST OF TABLES

Table 2.1: Comparison between music website and music application 9


Table 2.2: Differences between three mobile applications 13
Table 3.1: Hardware requirements 31
Table 3.2: Software requirements 32
Table 3.3: Table of use case description 35
Table 3.4: Interface design with description 39

LIST OF ABBREVIATION

A.I Artificial Intelligence
ADC Analog-to-Digital Conversion
API Application Programming Interface
CD Compact Disc
CSS Cascading Style Sheets
DBMS Database Management System
DFT Discrete Fourier Transform
DSM-5 Diagnostic and Statistical Manual, Fifth Edition
EMD Empirical Mode Decomposition
ERD Entity Relationship Diagram
FFT Fast Fourier Transform
FT Fourier Transform
GPS Global Positioning System
HHT Hilbert-Huang Transform
HTML Hypertext Markup Language
HTTP Hypertext Transfer Protocol
INMI Involuntary Musical Imagery
iOS iPhone Operating System
ISP Internet Service Provider
MaaS Music as a Service
MP3 MPEG Audio Layer-3
NTSC National Television System Committee
QbH Query by Humming
SDK Software Development Kit
SDLC Software Development Life Cycle
SVCD Super Video Compact Disc
VCD Video Compact Disc
W3C World Wide Web Consortium
WT Wavelet Transform
WWW World Wide Web
XP Extreme Programming
Y2K Year 2000 Problem

CHAPTER 1

1 INTRODUCTION

This chapter concentrates on the features of the project. It provides the details
of the Stario application for young people, which uses a voice recognition system and
a Query by Humming (QbH) system. This chapter also covers the project background,
problem statement, aim, objectives, and significance that led to this application's
development.

1.1 Project Background

Music is one of the arts produced by arranging sounds with the applied elements
of melody, harmony, rhythm and timbre (Wikipedia). It has been a way to express
feelings and a means to expand people's creativity since its discovery in the
Paleolithic Age. From old-century musicians like Beethoven, Bach and Mozart to
modern-age artists like Frank Sinatra, Michael Jackson and Jimi Hendrix, music has
given listeners pleasure and broadened our minds creatively. With many wise people
and geniuses born into the world, music has divided into many branches, which are
nowadays called genres. The most popular genres listened to these days are rock, pop,
RnB, and hip-hop (Nielsen Music). These genres in particular have also been divided
into many types, with yet more to be tracked down. As many songs are being produced
right now, there are also people who have a problem recognizing music that has become
stuck in their head. This problem is called earworms, formally a benign form of
rumination, the monotonous, intrusive thoughts associated with anxiety and depression.
According to Brown H., 92 percent of the population experiences this phenomenon.
Therefore, the main purpose of developing the Stario mobile application is to reduce
this problem and help people find their desired song using only their voice,
specifically humming and singing.

Since 2008, many people have been fascinated with J.A.R.V.I.S., the technology
used by Marvel's superhero Iron Man (Tony Stark). The technology began as a computer
interface and was later upgraded into an enhanced artificial intelligence (A.I.)
system. As people living in the 21st century, we see A.I. as an improvement over
systems that have been developed throughout the years and decades. For example,
J.A.R.V.I.S. regularly communicates with Tony Stark and acts as his personal
assistant, including in business and global security.

Voice recognition is the ability of a machine or program to receive and interpret
dictation or to understand and carry out spoken commands (Scardina). Voice
recognition is widely used nowadays, and its functionality has been utilized in
high-technology products such as Apple's Siri, Amazon's Alexa, and Microsoft's
Cortana. Besides using only our fingers to surf the internet, we can simply instruct
our device by voice to search for anything we desire, enabling hands-free requests,
reminders and other simple tasks. The process of voice recognition essentially
evaluates the biometrics of our voice, including its frequency and flow. The voice is
then broken up into segments of several tones. Artificial intelligence (A.I.) and
machine learning are the forces behind speech recognition. A.I. is used to understand
the colloquialisms, abbreviations, and acronyms we use. Machine learning then pieces
together the patterns and develops from this data using neural networks. Voice
recognition has helped much of the population ease their daily routine and keep their
work on track. Therefore, it is reasonable to take this as an opportunity to develop
Stario and help people find songs using only their voices.
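The step of "breaking the voice into segments and inspecting the frequencies" can be made concrete with a toy sketch (our own Python illustration, not the thesis's implementation): a naive discrete Fourier transform picks out the dominant tone of one short synthetic segment. Real systems, including the FFT-based approach this report uses later, do the same thing far more efficiently.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Return the strongest frequency (Hz) in one segment via a naive DFT.

    Illustrative only: this is O(n^2), whereas practical systems use an
    O(n log n) FFT over many overlapping segments.
    """
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin; positive frequencies only
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n  # convert bin index to Hz

# One 64-sample segment of a pure 500 Hz tone sampled at 8 kHz:
rate, n = 8000, 64
segment = [math.sin(2 * math.pi * 500 * t / rate) for t in range(n)]
print(round(dominant_frequency(segment, rate)))  # prints 500
```

A hummed note is of course messier than a pure sine, which is why practical systems analyse many segments and keep several strong frequencies per segment rather than a single peak.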

Query by Humming (QbH) is a music retrieval system that branches off the original
classification systems of title, artist, composer and genre. It normally applies to
songs or other music with a distinct single theme or melody. Typically, the user hums
a piece of tune into a microphone connected to a mobile phone or computer. The system
then searches a database of tunes to find a list of melodies similar to the user's
"query". Next, the result is produced, and the user may need to check whether the
song is correct based on the hummed input. There will be cases where the song is not
the one the user meant to search for, likely because the hum was off-tune, the
database does not contain that tune, or the system is not intelligent enough to tell
whether two tunes sound similar. QbH is a particular case of "query by content" in
multimedia databases. Most research into query by humming in the multimedia research
community uses the notion of "contour" information. Melodic contour is the sequence
of relative differences in pitch between successive notes. It has been shown to be a
method that listeners use to determine similarities between melodies. However, the
inherent difficulty the contour method encounters is that there is no known algorithm
that can reliably transcribe the user's humming into discrete notes.
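The contour notion can be sketched concretely (our own illustration, not from the thesis): the classic Parsons-code reduction records only whether each note goes up, down, or repeats, which is exactly why a contour tolerates off-key humming as long as the direction of the melody is preserved.

```python
def parsons_code(pitches):
    """Reduce a pitch sequence (e.g. MIDI note numbers) to its melodic
    contour: 'U' (up), 'D' (down), or 'R' (repeat) for each step."""
    steps = []
    for prev, curr in zip(pitches, pitches[1:]):
        if curr > prev:
            steps.append("U")
        elif curr < prev:
            steps.append("D")
        else:
            steps.append("R")
    return "".join(steps)

# Two renditions of a tune in different keys share the same contour:
print(parsons_code([60, 62, 62, 59]))  # prints URD
print(parsons_code([65, 67, 67, 64]))  # prints URD (same tune, transposed)
```

The hard part, as noted above, is not this reduction but reliably extracting the discrete note sequence from a raw hummed recording in the first place.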

1.2 Problem Statement

Based on observation, research and a public survey, three problems can be
identified relating to songs that are hard to find. Below are the problems that
users face.

1.2.1 Having a rough time to remember the lyrics or the song title of
an artist
Sometimes, for certain people, it can be a hassle to recall the lyrics that were
played on the radio or in random tracks from their phone. This can be a case of
short-term memory loss (Zimmermann, 2017), although it is quite common for people to
forget the lyrics of a song, perhaps one played at random or a newly released song
that popped up on the car radio. Furthermore, most people can only remember the
melody and harmony of the song. The melody can be utterly catchy given the repetitive
instrumentals of the song itself. Recognizing the artist can also be very difficult;
the song might be sung by a popular artist who was on a long hiatus before making a
comeback with a new track. When people then type what they remember of the lyrics
into a search engine, the result may not be the expected one because of certain wrong
words, and finding the song may take a very long time.

1.2.2 Song or melody that stuck into their head


Sometimes a melody can get stuck in our head for a long period of time. This
phenomenon is called involuntary musical imagery (INMI), more widely known as
"earworms". According to a study from Durham University led by Jakubowski K.,
earworms are an extremely common phenomenon and an example of spontaneous cognition.
The study states that 40 percent of our days are spent thinking about random things,
and it set out to understand the brain mechanism behind thoughts unrelated to current
tasks. Earworms mostly happen with pop music, particularly top-chart songs such as
"Bad Romance" by Lady Gaga, "Don't Stop Believing" by Journey, and "Can't Get You
Out of My Head" by Kylie Minogue, the study shows. Even though earworms are not
considered dangerous, they may cause stress or obsession in patients with mental
health issues, and in rare cases are experienced during migraines and epileptic
attacks.

1.2.3 Always get the non-similar result when searching for a song
Searching for a song can feel impossible for some people. With only the melody
stuck in their head as a clue, users tend to search by certain words of a
half-remembered lyric, the genre, or perhaps the artist's nationality. With Stario,
people can get a result that is around 90 percent accurate without having to enter
clueless queries into the internet.

1.3 Project Aim


The aim of this project is to evaluate the effectiveness of the voice recognition
system in the Stario mobile application, using analog-to-digital conversion and query
by humming to help people find the songs they desire by humming or singing.

1.4 Project Objective


The objectives of this project are:

a) To design the Stario mobile application, using the analog-to-digital conversion
method to change humming into a digital frequency.

b) To develop the Stario mobile application, using the query by humming method to
match and retrieve the song from the user's input query.

c) To evaluate the accuracy of the analog-to-digital conversion and query by humming
methods in song matching.

1.5 Project Scope
The target users of this mobile application consist of three people with
different voices. From the survey conducted, 74% of respondents listen to pop music,
while 64.5% listen to rock music. Therefore, this mobile application will focus on
retrieving pop and rock music from humming. There will be 10 songs of different
genres, mainly pop and rock, to be tested during the evaluation. To use the
application, the three people will hum or sing into the microphone of a device on
which the app has already been installed. The hum or singing acts as an input. The
input is then searched within the dataset to find a song that sounds similar. The
result is produced with 10 seconds of playback, together with the name of the song,
the artist and the genre.
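The search loop described in this scope can be pictured with a minimal sketch (ours; the song titles and query keys below are placeholders, not the actual 10-song dataset): once the hummed input has been reduced to some compact query key, it is looked up against the indexed dataset and the match is returned with its title, artist and genre.

```python
# Hypothetical index: contour-style keys mapped to (title, artist, genre).
SONG_INDEX = {
    "UUDD": ("Song A", "Artist A", "pop"),
    "URDU": ("Song B", "Artist B", "rock"),
}

def search(query_key):
    """Return (title, artist, genre) for a query key, or None if no match."""
    return SONG_INDEX.get(query_key)

print(search("URDU"))  # a matched hum returns the song's metadata
print(search("DDDD"))  # an unknown hum returns None
```

A real system would also attach the 10-second playback clip to the matched record and would rank several near matches instead of requiring an exact key.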

1.6 Project Significance


Upon the completion of this project's implementation, three expected outcomes
will be produced.

1.6.1 The system may solve the earworms problem


People can get tired of a looping melody stuck in their head, and it may take a
long period of time to finally be done with it. This application is meant to help
people who have the earworms problem, so they are not left frustrated thinking about
the same song every day.

1.6.2 Easier to find song in the future


Spending time thinking about the lyrics or song title may take an hour or more,
even if the user has an idea of the song. Instead, humming for only 10 seconds into
the microphone and retrieving the result in a short moment can help people save time
and effort.

1.6.3 Give benefit to the music industry


Some of the best songs are first heard on the radio or perhaps in an
advertisement. Generally speaking, many people spend their time listening to the
radio and watching advertisements on television but have no idea what song was
played. People can simply hum the melody on the spot, without waiting for the song or
advertisement to play again.

1.7 Summary

This chapter provided all the information and defined the objectives, problem
statement, scope and limitations, and significance of the project. The objectives of
creating the application are to design the Stario mobile application using the
analog-to-digital conversion method to change humming into a digital frequency, and
to develop it with song matching using the query by humming method to match and
retrieve the song from the user's input. The other objective is to evaluate the
accuracy of the analog-to-digital conversion and query by humming methods in song
matching and the effectiveness of the system in helping people find a song.

The problem statement was extracted from a public survey of 30 people and from
research on several materials, journals and studies, which showed the difficulties of
finding desired songs. The scope of this project consists of three different people
who each spend 10 seconds humming or singing the song.

Finally, the significance of the project's expected outcomes is to solve the
earworms problem, make it easier to find a song, and possibly give benefits to the
music industry in the future.

CHAPTER 2

2 LITERATURE REVIEW

Chapter two contains the literature review of articles and information collected
using various methods for the development of the Stario mobile application. Data and
information in this chapter were gathered from books, journals, articles and other
sources. The information gathered was used to help elaborate and explain the methods
and techniques that would be implemented in the development. This chapter discusses
general information about and an overview of music services, mobile applications,
human voice recognition and related works.

2.1 Music Services

2.1.1 Origin of Music Services

Since the beginning of human civilization, music has been one of the essentials
for all human beings, from infants to the elderly. The importance of music in our
society has led to the creation of an industry that includes all the concepts
inherent to this theme, such as its organization, distribution, and profitability
(Barata & Coelho, 2021). The music industry has produced many records, hence
monopolizing the production and consumption of music.

Nowadays, as we emerge into the modern world, many music streaming platforms have
been developed for the public to listen to music without much hassle. Back in 1999,
when the Y2K bug had everyone gripped in a vice of fear, a peer-to-peer music sharing
service by the name of Napster started gaining traction amongst American college
students, who used the online service to share MP3 files of songs amongst one another
for free.

One of the most notable features of Napster was that it provided a platform for
music lovers not only to download albums for free, but also to gain access to rare
live versions, alternate cuts, and demo versions from their favorite artists.
Regardless of its popularity, many issues were reported with Napster. Only four
months after the service started, the Recording Industry Association of America filed
a lawsuit against Napster, and musical giants Metallica and Dr. Dre both filed
lawsuits against the service after unfinished versions of their tracks were leaked
onto Napster in 2000 (Brewster, 2021).

Upon the demise of Napster, it became apparent that while peer-to-peer music
sharing was an extremely contentious practice within the music industry, online music
sharing was certainly a direction worth exploring. New concepts of digital music
distribution have been established recently, e.g., Music as a Service (MaaS), in
which the content is not transferred, thereby differentiating itself from the
well-known download and promoting full-time access instead of physical ownership.

2.1.2 Rising of Music Services in Internet


Music is genuinely important to people, anytime and anywhere. It can either lift
our mood or cast it down. Music can also act as a therapy for people with mental
health conditions, such as dementia. Dementia is an increasingly common syndrome, and
while pharmacotherapy is available, its potential benefit is limited, especially for
non-cognitive outcomes (Lam et al., 2020). To be exact, according to the American
Psychiatric Association's Diagnostic and Statistical Manual, Fifth Edition (DSM-5),
dementia is a major neurocognitive disorder that is diagnosed when one or more
cognitive domains, such as complex attention, executive ability, learning and memory,
language, praxis, and social cognition, are impaired. Such health problems can be
eased with great music.

There are many music service platforms, either offline or online. One of the
offline music platforms is radio. Radio plays a significant role in informing,
educating and enlightening everyday public life. It also performs an entertainment
role through music, drama, talk shows, live sports and other soft angles that appeal
to such societies (Rahman Ullah & Khan, 2017). An individual can live into the modern
generation and still be switching on their radio, whether in the car, on a mobile
phone, or on the radio device itself. Another form of offline music is the hardware
player. Meier & Manzerolle (2019) state that music can be published on vinyl, compact
disc or cassette, although the hardware method seems to be increasingly delinked amid
the rise of virtual formats. Music-related industries are increasingly focused on
offering provisional access to catalogues of recorded music via streaming services
rather than ownership of recordings (Meier & Manzerolle, 2019). Meanwhile, the online
platform has numerous streaming services available. Users can listen on network sites
such as Apple Music, Spotify or YouTube, which have become popular and provide new
features in urban settings, available to everyone for free (Komulainen et al., 2010).

2.1.3 Music Website VS Music Application


Online music streaming can be considered an e-service. E-services (electronic
services) is a general term that refers to services rendered through information
technologies via the Internet (Vatolkina et al., 2020). E-services involve a broad
range of activities that use the Internet as a distribution channel.

The Internet has become an integral part of modern society and economies around
the world (Kusumawati et al., 2019). It can be a medium for business people to
communicate and sell their products. Thus, the music industry took this opportunity
to introduce and sell its music products, both songs and video clips complete with
lyrics, through the Internet. Music can be published either on a website or in an
application. Throughout the years, the website-versus-application discussion has had
a huge impact on developers, and the advantages and disadvantages of both methods are
frequently compared. The table below shows the comparison between a music website and
a music application:

Table 2.1: Comparison between music website and music application

Difference         Music Website          Music Application
User friendly      Moderate               High
Remote use         Cannot use anywhere    Can be played anywhere
Usage of Internet  High                   Moderate
Downloadable       No                     Yes
Security           Moderate               High

Based on the table, the music application has greater advantages than the music
website. Even so, the music application may have some flaws compared with the music
website. Nevertheless, we can conclude that a music application is the best choice to
develop.

2.2 Mobile Application


A mobile application is a modern technology platform that can act remotely; in
other words, a mobile application can be used anywhere, and mostly saves people time.
Nowadays, many people use mobile applications to contact friends, browse the
internet, manage file content, create and handle documents, and for entertainment
(Mazumder et al., 2018). This shows that having a mobile phone is essential for every
individual, and not only for leisure purposes. Mobile applications can be developed
on many platforms, and the two most well-known these days are Android (Google Play
Store) and iOS (App Store) (Holzer & Jan, 2009). As the number of mobile applications
downloaded increases with each passing day, the amount of mobile application
development also increases, as mobile applications are used for other opportunities
such as education and business. Mobile applications can be developed in three
well-known approaches: web, native and hybrid. Here we discuss the details of these
approaches, compare them, and choose which one is suitable for this project.

2.2.1 Web Mobile Application Development


The web is where people use the internet provided by their Internet Service
Provider (ISP) to find what they wish to find. Whether it is a picture, video or
text, all of it can be found on the web just by entering the correct keyword. Tim
Berners-Lee, founder of the W3C/WWW, stated that the web uses a communication
protocol known as the Hypertext Transfer Protocol (HTTP), which connects people
across the globe as long as there is an internet connection. A web application is a
kind of mobile application based on pure web technologies running in a browser to
simulate a native application (Wang et al., 2020). Most web development uses HTML
with CSS to present the data and the web content, while the interaction with the
client uses JavaScript.

A web mobile application runs on a smartphone or any device with a web browser.
Sometimes a problem might occur when a user tries to access a mobile application in
the Play Store or App Store; this is where a web mobile application can act as a
substitute for the mobile app. Besides that, a web mobile application can be launched
on any platform as long as there is a browser installed on the device, since it is
platform-independent.

The most commonly used web-based applications use a three-tier architecture.
The architecture represents the client, the server and the Database Management System
(DBMS), which are combined in a web application. Each tier has its own
responsibilities: one tier is responsible for the user and system interface, another
handles the business logic, and the last represents the data storage. Its scalability
and manageability are the reasons why the three-tier architecture is chosen for
web-based applications. Although a web-based application is scalable, its portability
and device access are lower compared to the other two types of mobile application,
thus removing the web approach from being picked for this project's development.

Figure 2.1: Three tier architecture

Source: Melo, 2011
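The tier split above can be sketched as three separated layers (a minimal, hypothetical Python illustration; the class names and data are ours, not from any real implementation): the presentation tier talks only to the logic tier, and the logic tier talks only to the data tier.

```python
class DataTier:
    """Tier 3: data storage (a stand-in for the DBMS)."""
    def __init__(self):
        self._songs = {"1": "Song A"}
    def get(self, song_id):
        return self._songs.get(song_id)

class LogicTier:
    """Tier 2: business logic, mediating between interface and storage."""
    def __init__(self, data):
        self.data = data
    def lookup(self, song_id):
        title = self.data.get(song_id)
        return title if title else "not found"

class PresentationTier:
    """Tier 1: the user/system interface."""
    def __init__(self, logic):
        self.logic = logic
    def render(self, song_id):
        return f"Result: {self.logic.lookup(song_id)}"

# The client only ever calls the presentation tier:
app = PresentationTier(LogicTier(DataTier()))
print(app.render("1"))  # prints Result: Song A
```

Because each tier depends only on the one below it, the storage or the interface can be swapped out independently, which is the scalability and manageability benefit noted above.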

2.2.2 Native Mobile Application Development

Native applications are built using the native language of the platform or device
they are intended to be used on (Mohammadi & Jahid, 2016). For instance, a mobile
application developed for the Android platform cannot be downloaded on other
operating systems such as iOS. There are different SDKs for each platform and
different tools, APIs and devices with different functionalities on each platform
(Pinto & Coutinho, 2018). Native development usually depends on how much the
developer wants to link the application with the target operating system, because
some capabilities that exist in a certain OS do not exist in others. Although native
apps only work on a certain mobile operating system, they still have advantages: a
native app performs faster on the device it was built for because of the device's
built-in features, much like Apple products. Since this project targets the Android
platform only, choosing native mobile application development is the best option, as
it fits the scope and the application's performance can be increased further. The
reasons why native development is chosen are discussed further in section 2.2.4.

2.2.3 Hybrid Mobile Application Development

Hybrid development combines the best of both the native and HTML5 worlds
(Mohammadi & Jahid, 2016). Hybrid development uses traditional web programming
languages such as HTML5, JavaScript and CSS. According to Kaczmarczyk (2021), a
hybrid application can run on many platforms, which eliminates the necessity of
implementing more than one version of an application. This differs from the native
approach, which supports only the one platform it was built for and forces a rewrite
of the application for each operating system (Pinto & Coutinho, 2018). Moreover, a
hybrid application can access features built into the platform, such as GPS, camera
and location, like a native app. Although hybrid sounds more reliable than the other
two types of development, the scale of a hybrid mobile application is too big and
does not fit the scope of this project, which focuses on a single-platform
development. Therefore, hybrid was not chosen for the development of the project.

2.2.4 Comparison of Web, Native, and Hybrid Mobile Application

Each mobile application type has its own advantages and disadvantages. For
developers, many aspects need to be taken into consideration before choosing which
type of development to pursue. Because of that, a clearer understanding of all the
mobile application development approaches is needed. Table 2.2 shows the differences
between the three types of mobile application for easier comparison.

Table 2.2: Differences between three mobile applications

Feature                            Native       Hybrid                  Web
Development language               Native only  Native and web or both  Web only
Code portability and optimization  High         Medium                  Low
Advanced graphics                  High         Medium                  Medium
Upgrade flexibility                Low          Medium                  High
Speed                              Very fast    Native speed            Fast
App store                          Yes          Yes                     No
Device access                      Full         Full                    Partial

2.3 Human Voice Recognition


Every person is born with a unique voice. We are given a high-pitched voice as
children, which turns into a deeper voice when reaching puberty. Some of us are also
born with a God-given talent: the ability to sing and hum. Despite that, the rise of
artificial intelligence technology does not discriminate between voices, as long as
the information is well heard and clear when we speak. Thus, this section discusses
voice recognition further.

2.3.1 Definition of Human Voice Recognition
A voice recognition system is a system used to convert the human voice into a
signal that can be understood by machines (Hansen et al., 2017). The history of
speech recognition started in the 1950s: the "Audrey" system, designed by Bell
Laboratories, was the first speech recognition system, and it could only understand
digits. As time passed, such devices were enhanced to recognize spoken words, leading
to ASR (Automatic Speech Recognition).

The voice recognition system is divided into number of classes, which is:

1) Isolated Speech

Isolated words usually involve a pause between two utterances; it doesn’t mean that, it
only accepts a single word, but requires one utterance at a time.

2) Connected Speech

Similar to isolated speech, but allows separate utterances with minimal pauses between
them.

3) Continuous Speech

Allows the user to speak almost naturally, also called computer dictation.

4) Spontaneous Speech

Can be thought of as speech at a basic level: natural sounding and not rehearsed. An
ASR system with spontaneous speech ability has to be able to handle a variety of
natural speech features such as words being run together, "ums" and "ahs", and even
slight stutters.

Each of these classes is designed for a specific function, which means a system
developed for voice recognition does not necessarily need to apply the most complex
recognition scheme.

Chandolikar et al. (2022) stated that several applications have been developed to ease
human activities:

1) Audio Classification

A widely known task that entails assigning a sound to one of several different classes,
in order to determine the sound's kind or origin. The system might map one sound to
different possibilities, for example a car starting, a dog barking or a siren.

2) Audio Separation and Segmentation

Audio separation is a technique of extracting a desired signal from a variety of sounds
so that it may be processed further (e.g., isolating the guitar from a live concert
recording), while audio segmentation refers to the process of highlighting important
parts of an audio signal (e.g., identifying the sound of a human heart to monitor
irregularities).

3) Music Genre Classification and Tagging

Categorization of music based on the genre of the songs, for example rock, ballad, pop
and hip-hop.

4) Music Generation and Music Transcription

Captures acoustic properties and combines them to make a music record with all of the
musical elements and tones in the song.

5) Voice Recognition

Spoken words from the user can reveal gender, race, identity or name. The voice can
also indicate the user's emotions.

6) Speech to Text and Text to Speech

Involves not only acoustic analysis, but also NLP (Natural Language Processing). This
process requires basic language understanding in order to distinguish separate words
from voiced noises. Applications like Apple's Siri and Amazon's Alexa are among the
few that achieve this functionality well.

To give a better understanding of the voice recognition system, Figure 2.2 shows
the general block diagram of a speech recognition system. An A/D (Analog-to-Digital)
approach is used to digitize the voice input, which in the case of this project is humming
or singing. Section 2.3.2 discusses the A/D process in detail.

Figure 2.2: General block diagram of speech recognition

Source: Hansen et al. (2017)

2.3.2 Analog-to-Digital Conversion

Analog-to-digital converters (ADCs) are an essential component of devices that
process analog signals digitally (Kalkhoran et al., 2018). Digital systems have benefited
greatly from scaling, but analog systems have become increasingly difficult to scale.
As a result, ADCs are often the main bottleneck in systems, both in terms of power
consumption and space requirements and in terms of system output quality. Developing
more efficient ADCs is therefore of great interest. Figure 2.3 shows an example of what
analog and digital signals look like.

Figure 2.3: A continuous signal (analog) turning into a digital signal

Source: Gudino, 2018

The ADC's sampling rate, also known as its sampling frequency, relates to the
ADC's speed. The sampling rate is measured in "samples per second", with units of
SPS or S/s (or Hz when speaking of the sampling frequency). This simply means how
many samples or data points are taken within a second. The more samples the ADC
takes, the higher the frequencies it can handle. The key equation for the sample rate is:

fs = 1/T

Where,

fs = Sample Rate/Frequency

T = Period of the sample or the time it takes before sampling again

For example, in Figure 2.3, T appears to be 50 ms, while fs appears to be 20 S/s (or 20
Hz). This sampling rate is very slow, yet the output looks just like the original analog
signal. This is because the frequency of the original signal is as low as 1 Hz; the
sampling rate was therefore sufficient to reconstruct a similar signal.

There will be cases where the sampling rate is considerably slower. Knowing
the ADC sample rate is important because the user needs to know whether it causes
aliasing. Aliasing means that a digital image/signal reconstructed from samples differs
significantly from the original image/signal. If the sample rate is slow and the frequency
of the signal is high, the ADC will not be able to reconstruct the original analog signal,
causing the system to read incorrect data. A good example is shown in Figure 2.4.

Figure 2.4: An example on how aliasing happens

Source: Gudino, 2018

In this example, we can see where the sampling takes place on the analog input
signal. The digital output never approaches the original signal because the sampling
rate is not high enough to keep up with the analog signal. This causes aliasing, and the
digital system lacks the full picture of the analog signal.

A rule of thumb for determining whether aliasing will occur is the Nyquist theorem.
According to the theorem, the sample rate/frequency must be at least twice the
maximum frequency of the signal to restore the original analog signal. The following
equation gives the Nyquist frequency:

fNyquist = 2fMax

Where,

fNyquist = Nyquist frequency

fMax = The max frequency that appears in the signal

For example, if the maximum frequency of the signal input to the digital system
is 100 kHz, the ADC sample rate should be at least 200 kS / s. This allows the original
signal to be successfully reconstructed.
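A minimal sketch of this Nyquist check (the helper names are illustrative, not from any library):

```java
public class Nyquist {
    // Minimum sample rate (Hz) needed to reconstruct a signal whose highest
    // frequency component is maxFrequencyHz: fNyquist = 2 * fMax.
    static double minimumSampleRate(double maxFrequencyHz) {
        return 2.0 * maxFrequencyHz;
    }

    // True when the given sample rate is high enough to avoid aliasing.
    static boolean avoidsAliasing(double sampleRateHz, double maxFrequencyHz) {
        return sampleRateHz >= minimumSampleRate(maxFrequencyHz);
    }

    public static void main(String[] args) {
        // A 100 kHz signal needs at least a 200 kS/s ADC.
        System.out.println(minimumSampleRate(100_000));          // 200000.0
        System.out.println(avoidsAliasing(150_000, 100_000));    // false: aliasing occurs
    }
}
```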

Also note that the signal may be corrupted if external noise introduces
unexpectedly high frequencies into the analog signal and the sample rate cannot handle
the added noise frequency. It is therefore recommended to add an anti-aliasing filter (a
low-pass filter) before the ADC samples the signal. This prevents unexpected high
frequencies from entering the system.

The resolution of an ADC relates to its accuracy and is determined by its bit
length. Figure 2.5 shows a simple example of how bit length affects the digital signal.
With one bit, there are only two "levels". Increasing the bit length raises the number of
levels and makes the signal more faithful to the original analog signal.

Figure 2.5: Example on how resolution affects the digital signal

Source: Gudino, 2018

Knowing the bit resolution is important if the system requires an accurate
voltage reading. The resolution depends on both the bit length and the reference
voltage. The following equations give the overall resolution of the input signal in terms
of voltage.

Step Size = VRef/N

Where,

Step Size = The resolution of each level in terms of voltage

VRef = The voltage reference (range of voltages)

N = Total level size of ADC

To find N, use this equation:

N = 2^n

Where,

n = Bit Size

For example, suppose we need to read a sine wave with a voltage range of 5 V,
and the bit size of the ADC is 12 bits. Substituting n = 12 into the equation above gives
N = 4096. With a voltage reference of 5 V, the step size = 5 V / 4096, which is about
0.00122 V (or 1.22 mV). This is accurate because the digital system can detect voltage
changes with an accuracy of 1.22 mV. If the ADC has a very short bit length (for
example, only 2 bits), the accuracy drops to 1.25 V. This gives very poor results, as it
only gives the system four voltage levels (0 V, 1.25 V, 2.5 V and 3.75 V). Figure 2.6
shows common bit lengths, their number of levels, and the step size for a 5 V reference.
We can see how the accuracy improves as the bit length increases.
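The step-size arithmetic above can be sketched in a few lines (the helper names are illustrative):

```java
public class AdcResolution {
    // Number of quantization levels for an n-bit ADC: N = 2^n.
    static long levels(int bits) {
        return 1L << bits;
    }

    // Step size (volts per level) for a reference voltage vRef: vRef / N.
    static double stepSize(double vRef, int bits) {
        return vRef / levels(bits);
    }

    public static void main(String[] args) {
        System.out.println(levels(12));         // 4096 levels for a 12-bit ADC
        System.out.println(stepSize(5.0, 12));  // about 0.00122 V (1.22 mV)
        System.out.println(stepSize(5.0, 2));   // 1.25 V for a 2-bit ADC
    }
}
```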

Figure 2.6: Bit length and their number of levels and step size for a 5V reference range

Source: Gudino, 2018

2.3.3 Query by Humming

In music identification, there is a method called "Query by Humming" (QbH),
first introduced by Asif Ghias of Cornell University in Ithaca, USA, in 1995 (Jaczyńska
et al., 2018). This was the first full query-by-humming system, and it included
processing the input audio signal to extract the information needed for the query. Its
database contained 183 songs taken from MIDI files. People often have a melody stuck
in their head for days that they cannot get out, and they cannot search for the melody's
metadata on the Internet using words. They would go to music store workers with deep
knowledge of music and describe the tune by humming it. This can waste people's time
and still yield the wrong song.

According to Jaczyńska et al. (2018), the song is recognized in the following
steps:

• Recording a melody hummed or whistled by the user with a microphone.

• Transforming the melody into a string of information about changes in pitch,
the so-called "melodic contour", using a pitch-tracking algorithm.

• Comparing the melodic contour with the contours saved in the database.

• Displaying the list of the most similar objects.
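A very rough sketch of the melodic contour idea, assuming the pitch values have already been extracted by a pitch tracker (the Parsons-style U/D/S encoding below is an illustration, not the exact algorithm from the cited paper):

```java
import java.util.List;

public class MelodicContour {
    // Encode a pitch sequence (e.g., MIDI note numbers) as a string of
    // U (up), D (down) and S (same) characters relative to the previous note.
    static String contour(List<Double> pitches) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < pitches.size(); i++) {
            double diff = pitches.get(i) - pitches.get(i - 1);
            sb.append(diff > 0 ? 'U' : diff < 0 ? 'D' : 'S');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Twinkle Twinkle" opening (C C G G A A G) yields the contour SUSUSD.
        System.out.println(contour(List.of(60.0, 60.0, 67.0, 67.0, 69.0, 69.0, 67.0)));
    }
}
```

A real QbH system would then compare such contour strings approximately (for example with an edit distance) rather than requiring an exact match.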

Also, different types of algorithms are used to detect the sound, divided into:

- Time-based, which detect and count characteristics related to the
repeatability of a signal over time, or look for similarities between a
signal and its delayed version;

- Frequency-based, which convert the signal to spectral form, helping to
determine the first harmonic, the greatest common divisor of all harmonics,
or other determinants of the period;

- Hybrid, a combination of both at the same time.

Musical signals often exhibit richness, indeterminacy and a complex dynamic
structure. Mathematical transforms such as the Fourier Transform (FT) and the Wavelet
Transform (WT) can capture this structural information (Nagavi & Bhajantri, 2018).
However, they are not adept at capturing a music signal's non-stationarity and
dynamism. Another transform, the Hilbert-Huang Transform (HHT) in the form of
Empirical Mode Decomposition (EMD), is better at dimension reduction and at
quantifying the complex structures and dynamism in music signals. Figure 2.7 shows
the proposed QbH system.

Figure 2.7: Proposed Query by Humming (QbH) System

Source: Nagavi & Bhajantri (2018)

2.4 Related Works

2.4.1 Shazam
Released in 2002, Shazam is one of the biggest music recognition services ever
developed. Before it was released as a standalone system, users had to call the Shazam
service centre from their mobile phone and play 15 seconds of audio from the music.
The identification was then made on the sample at Shazam's server, and the track title
and artist were sent back via SMS text message (Hussain et al., 2017). Users can register
and log in with a mobile phone number and password on the website to retrieve the
information. On a desktop or smartphone, they may view their tagged track list and buy
the CD. A tagged track is downloadable as a ringtone if available, and a 30-second clip
of the tagged song can also be sent to friends.

Shazam recognizes the exact track using preprocessing that creates fingerprints
(Xiao, 2019). From a spectrogram of the signal, it determines relative peaks and plots
them as a cleaner version of the spectrogram. A spectrogram is a plot of frequency over
time; the exact definition of "relative peaks" is not specified. Figure 2.8 illustrates the
music recognition process used by Shazam.

Figure 2.8: The preprocessing that creates fingerprints

Source: Xiao, 2019

Next, Shazam chooses a large set of anchor points throughout the song and
pairs each anchor point with a set of neighbouring points. The neighbouring points are
all of the points within an area following the anchor point, within some range of time
and within some range of notes above and below the anchor point. All of the pairs for
a song are then each made into a hash, or fingerprint, and saved in the database along
with their absolute time of occurrence for later use. The average number of pairs per
anchor point is referred to as the fan-out factor.
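The anchor/pair hashing described above might be sketched like this (the bit layout of the hash is a guess at the general idea, not Shazam's actual format):

```java
import java.util.ArrayList;
import java.util.List;

public class PairHashes {
    // A spectrogram peak: a frequency bin at a time offset (in chunks).
    record Peak(int freqBin, int time) {}

    // Pack (anchor frequency, neighbour frequency, time delta) into one hash.
    // Real systems pack these into fewer bits; this layout is only illustrative.
    static long hash(Peak anchor, Peak neighbour) {
        long dt = neighbour.time() - anchor.time();
        return ((long) anchor.freqBin() << 32) | ((long) neighbour.freqBin() << 16) | dt;
    }

    // Pair each anchor with every peak inside the target zone that follows it.
    static List<Long> fingerprints(List<Peak> peaks, int maxDt) {
        List<Long> hashes = new ArrayList<>();
        for (Peak anchor : peaks) {
            for (Peak p : peaks) {
                int dt = p.time() - anchor.time();
                if (dt > 0 && dt <= maxDt) {
                    hashes.add(hash(anchor, p));
                }
            }
        }
        return hashes;
    }

    public static void main(String[] args) {
        List<Peak> peaks = List.of(new Peak(40, 0), new Peak(80, 2), new Peak(120, 5));
        // Only pairs no more than 4 chunks apart qualify: (40,80) and (80,120).
        System.out.println(fingerprints(peaks, 4).size());
    }
}
```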

During the searching process, the same technique is used to generate hashes for
the query. Then, for each song containing a matching hash, the matching hashes are
graphed: query time on the vertical axis, song time on the horizontal axis, and a dot
plotted for each hash. If a diagonal line of dots appears, meaning that the hashes of the
query coincide with the song sequentially during that time period, the song is considered
a result. For a non-result, only a scattering of matching hashes occurs.

Shazam's system only works for real tracks and audio. As long as the audio
comes from an artist's or producer's recording, the desired result is retrieved. Shazam's
goal is therefore not the same as Stario's, which is to take a person's humming and turn
it into the details of the searched song.

2.4.2 SoundHound

Another music recognition application available nowadays is SoundHound.
Founded in 2005, SoundHound has developed many applications involving voice
recognition. SoundHound uses the "Speech-to-Meaning" approach, an advanced voice
AI platform developed by Houndify. SoundHound can be used to identify a song from
humming or singing, which is similar to Stario's purpose.

2.4.3 Musixmatch

Musixmatch is an application that merges songs and lyrics. It gives listeners the
ability to read lyrics on the screen of their device, whether Android or PC. If a song is
playing and the lyrics feature is on, an image of the band or song appears with the lyrics.
Musixmatch is a global lyrics provider offering its service in 20 different languages.
The company has developed top-of-the-line applications for mobile phones, desktops
and tablets, and has created a powerful application programming interface (API) that
can be used with any website or application. It is the world's largest official lyrics
catalogue, allowing developers and music fans everywhere to tap into the power of
online lyrics quickly and easily, and letting anyone easily plug in and distribute
authorized lyrics. It is fully compatible with many music apps for Android such as
Winamp, Google Music, WIMP, iTunes, Archos Music Player and Spotify; users can
listen to music through these applications and still display the lyrics from Musixmatch.

2.4.4 Discussion of related works

Many applications for music identification have been developed, and each has
its own significant functions and usage procedures. Different algorithms are used to
ensure the applications work according to their purpose. The related works discussed
here serve as a benchmark for developing Stario, which is hoped to succeed in
functionality and effectiveness.

CHAPTER 3

3 PROJECT METHODOLOGY

This chapter explains the project methodology, providing an easy-to-understand
and clearer view of how the project was developed. The activities of each phase of the
chosen methodology are provided as guidance in planning and constructing the overall
project.

3.1 Project Methodology

Figure 3.1: SDLC Illustration

Every system or software development effort follows its own development
methodology. A Software Development Life Cycle (SDLC) model is a method by
which software can be developed systematically, providing a framework that shows the
sequence of activities and a description of each phase. Figure 3.1 shows the common
phases of the SDLC.

There are many SDLC models that can be used as a guide to develop software
or a system. For this project, the implemented methodology is based on one of the agile
methods, the Extreme Programming (XP) methodology. XP is a software development
methodology that is part of what is collectively known as the agile methodologies.

XP is built upon values and principles, and its goal is to allow small to mid-sized
teams to produce high-quality software and adapt to evolving and changing
requirements. It is a lightweight methodology that has gained increasing acceptance
and popularity in the software community. XP promotes a discipline of software
development based on the principles of simplicity, communication, feedback and
courage (Kircher et al., 2001).

XP was chosen for this project because there will be cases where testing of the
system reveals errors and undesired outcomes; the coding steps are then repeated to
check and improve based on those errors.

This way of working suits this project because the development of the system
requires repeated testing to reach a reliable result. For example, if the humming tests
are not accurate against the QbH dataset, the coding and maintenance need to be well
supervised with regard to the frequency analysis and the analog-to-digital conversion
method. Figure 3.2 illustrates how work is done in XP.

Figure 3.2: Extreme Programming Phases

3.2 Project Framework

The project framework is based on the XP model; the five phases consist of
planning, designing, coding, testing and listening. Figure 3.3 shows the framework for
this project, and Figure 3.4 summarizes the details of each phase of the Stario mobile
application based on the XP model.

Figure 3.3: XP Model Phases for Stario

Figure 3.4: XP Phases for Stario development

3.3 Planning

As the first step of the project development, planning is essential to meet the
goals. All the information taken from the online survey is made into user stories, which
are then turned into iterations. The project is divided into iterations, and iteration
planning initiates each iteration. The next step is gathering the information needed to
complete the project development, collected through journals, articles and other
sources, resulting in the literature review. In the literature review, all information
including the techniques and software used in the development is studied in order to
fully understand the functional specification of each module that will exist in the Stario
mobile application. During this step, the related works and existing applications are
also used as a benchmark to identify the use case diagram for this project.

3.3.1 Hardware requirements

Hardware requirements are the physical components used for the system
development. In this project, the hardware component used is a personal computer.
Table 3.1 shows the hardware requirements for this project.

Table 3.1: Hardware requirements

No | Hardware            | Description
1  | Lenovo Ideapad 320  | OS: Windows 10 Pro (64-bit); Processor: AMD A12-9720P
                           Radeon R7 (12 compute cores, 4C+8G), 2.70 GHz; RAM: 8 GB

3.3.2 Software requirements
To complete the project development, software is required to ensure the
hardware component can perform its functions efficiently during development. Table
3.2 shows the software requirements.

Table 3.2: Software requirements

No | Software              | Description
1  | Flutter               | Used to create the framework and interface for the application
2  | Java Development Kit  | The kit used to develop Java applications and applets during development
3  | Draw.io               | Open software used to create the Use Case Diagram, Flowchart and
                             Entity Relationship Diagram
4  | NoSQL database        | The database used to store user and song details

3.3.3 Survey analysis


A survey was distributed to 30 public respondents. From the survey
conducted, two problems can be identified. The first concerns melodies getting stuck
in respondents' minds, while the second concerns the difficulty of searching for the
melody.

Figure 3.5: People that have melody stuck in their head

Figure 3.5 shows that 68.8% of public respondents have a problem with a
melody stuck in their head, while the remaining 31.3% do not commonly have this
problem. The problem needs to be addressed to reduce the percentage to at least below
50%.

Figure 3.6: Time of melody ringing

Figure 3.6 shows that 51.6% of the public respondents have a melody stuck in
their head for at least 5-10 seconds. The second highest, at 25.8%, is less than 5 seconds,
and the last, at 22.6%, is more than 10 seconds. From these percentages we can deduce
that the project scope must allow at least 10 seconds of humming when the user wants
to search for a song.

Figure 3.7: The difficulty to search the melody

Figure 3.7 shows that 78.1% of the public respondents have difficulty searching
for a melody on the Internet, while the remaining 21.9% may be assumed to have a
glimpse of the song title. This difficulty may stem from not remembering the song's
lyrics.

Figure 3.8: Sing/hum and search lyrics preference

Figure 3.8 shows that 75% of the public respondents prefer to sing/hum to search
for a song, versus 25% who would rather search by lyrics. This percentage supports the
development of Stario.

3.3.4 Use Case Diagram

Figure 3.9: Use Case Diagram for Stario's user

The use case is analyzed to view the interaction between the user and the
application. Based on Figure 3.9, the user signs up, or logs in if they have an existing
account. They also have the ability to view and edit their profile in the application. The
most important function of the application is searching for a song by humming or
singing. Lastly, users can look at their search history to see previously searched songs.

Table 3.3: Table of use case description

Actors | Description
User   | - A user who has already registered with the Stario application
       | - Users can be anyone in general, or simply anyone who has trouble
         finding a song

3.4 Designing

The application design phase is the next step after requirements analysis; it
illustrates more details about the system and how its flow works. This phase is
important to ensure a better understanding and view of how Stario is developed. In this
phase, the Flowchart Diagram, the Entity Relationship Diagram (ERD) and the
graphical interface are illustrated and explained.

3.4.1 Flowchart

A flowchart is used to show how the system and its processes work, tracing the
flow from beginning to end for the user. There are two modules for the user: the
Register Module and the Search Song Module.

Figure 3.10: Flowchart of login/registration of user



Figure 3.10 shows the flowchart for the user when they open the application. If
the user already has an account, he/she does not need to register a new account and can
proceed with sign-in; otherwise, registration is needed for the new user. After signing
in, the user enters the homepage.

Figure 3.11: Flowchart of user to search song

Figure 3.11 shows the flowchart for the user when they want to search for a song.
The user is required to hum or sing into the microphone of their mobile phone. The
system then converts the audio into a digital signal and matches it against the dataset.
The resulting song (output) plays for 10 seconds. If the song is incorrect, the user may
try humming/singing again.
3.4.2 Entity Relationship Diagram

Figure 3.12: Entity Relationship Diagram for Stario mobile application

Next is designing the Entity Relationship Diagram, where every entity of the
application system and the relationships between them in the database are shown. In
an ERD, the relationships between entities consist of one-to-one, one-to-many, many-
to-one and many-to-many, depending on the condition and relation between the entities.
In Figure 3.12, there are three tables: User, Song and Song_History.

The User table has userID as its primary key, which has a one-to-many
relationship with the Song and Song_History tables. Through this table, the user has
the ability to view, edit and delete their profile.

The Song table stores the details of each song: the title, artist, album and genre.
The user gets their desired output from this table after humming to search.

The Song_History table acts as a bridge between the Song and User tables, as
different users will each have many searched songs. Song_History stores foreign keys
referencing the primary keys of the User and Song tables, enabling it to keep a record
of recently searched songs.

3.4.3 Interface Design

Table 3.4: Interface design with description

Interface Description
This is the first interface shown when the
user installs Stario on their mobile phone.
The user needs to enter a username and
password to use Stario, or sign up if they
have no existing account.

This page pops up when the user clicks
sign up. They have to enter their
username, password, re-entered
password and email.

This is the homepage of Stario. Here the
user taps the middle button to hum/sing
to search for a song.

After the user hums/sings into the
microphone, the system searches for the
most similar song in the dataset. A
sample of the output is shown here; the
user can click the Try Again button if
the song is incorrect.

The search history stores the previous
results of the user's searched songs. The
user may clear the history if necessary.

3.4.4 System Architecture

Figure 3.13: Logical diagram for Stario mobile application

Figure 3.13 shows that the initial input is taken as a recording. The input is
then captured and sampled to be converted into a digital signal. Fingerprinting is then
conducted to match the sound with songs in the database. The ISP acts as an
intermediary providing the Internet connection needed to store the song data and the
user details.

3.5 Coding

For this project, the coding focuses on applying ADC to sample the audio from
the user's humming/singing.

Recording devices mimic this process quite closely, using the pressure of the
sound wave to convert it into an electrical signal. The real sound wave in the air is a
continuous pressure signal. In a microphone, the first electrical component that
encounters this signal converts it to an analog voltage signal, still a continuous one.
This continuous signal is not very useful in the digital world, so before it can be
processed, it must be translated into a discrete signal that can be stored in digital form.
This is done by acquiring numerical values representing the amplitude of the signal.

The conversion involves quantizing the input, and it is bound to introduce small
errors. Therefore, instead of a single conversion, an analog-to-digital converter
performs many conversions on very small parts of the signal, a process known as
sampling.

Figure 3.14: Illustration of analog-to-digital conversion

The Nyquist-Shannon theorem tells us what sampling rate is necessary to
capture a given frequency in a continuous signal. In particular, to capture all the
frequencies that a human can hear in an audio signal, we must sample the signal at a
frequency twice that of the human hearing range. The human ear can detect frequencies
roughly between 20 Hz and 20,000 Hz. As a result, audio is most commonly recorded
at a sampling rate of 44,100 Hz. This is the sampling rate of Compact Discs, and it is
also the most commonly used rate with MPEG-1 audio (VCD, SVCD, MP3). (This
particular rate was originally selected by Sony because it could be recorded on modified
video equipment running at either 25 frames per second (PAL) or 30 frames per second
(using an NTSC monochrome video recorder) and cover the 20,000 Hz bandwidth
thought necessary to match the professional analog recording equipment of the time.)
So, when choosing the sampling frequency to record at, you will probably want to go
with 44,100 Hz.

3.5.1 Recording – capturing the sound

The user's humming/singing is captured with the code shown in Figure 3.15.
We use the Java programming language and set the frequency of the sample, the number
of channels (mono/stereo) and the sample size (e.g., 16-bit samples). We then open a
line from the sound card and write the data to a byte array.

Figure 3.15: Sample coding for getFormat(), to change the audio into digital
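Since the figure only reproduces a screenshot, here is a minimal sketch of what such a getFormat() routine typically looks like with the javax.sound.sampled API (the 44.1 kHz, 16-bit, mono parameter values are assumptions, not taken from the figure):

```java
import javax.sound.sampled.AudioFormat;

public class Recorder {
    // Describe the capture format: 44.1 kHz, 16-bit, mono, signed, big-endian.
    static AudioFormat getFormat() {
        float sampleRate = 44100;
        int sampleSizeInBits = 16;
        int channels = 1;            // mono
        boolean signed = true;
        boolean bigEndian = true;
        return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
    }

    public static void main(String[] args) {
        AudioFormat f = getFormat();
        System.out.println(f.getSampleRate() + " Hz, " + f.getSampleSizeInBits() + "-bit");
    }
}
```

This AudioFormat would then be passed to AudioSystem when opening the TargetDataLine that delivers the raw bytes.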

The data is read from a TargetDataLine. In Figure 3.16, the running flag is a
global variable which is stopped by another thread, for example when the user clicks
the search icon to stop recording.

Figure 3.16: Sample coding for song match-making

3.5.2 Time-Domain and Frequency Domain

What we have in this byte array is a signal recorded in the time domain. The
time-domain signal represents the change in the signal's amplitude over time.

Jean-Baptiste Joseph Fourier made the extraordinary discovery in the early
1800s that any signal in the time domain is equivalent to the sum of some (potentially
infinite) number of simple sinusoidal signals, given that each component sinusoid has
a certain frequency, amplitude and phase. The collection of sinusoids that together
make up the original time-domain signal is known as its Fourier series.

In other words, any time-domain signal may be represented by simply providing
the frequencies, amplitudes and phases of each of the sinusoids that make up the signal.
This representation of the signal is called the frequency domain. The frequency domain
provides a static representation of a dynamic signal and functions, in certain respects,
as a kind of fingerprint or signature for the time-domain signal.

Figure 3.17: The illustration of time-domain and frequency-domain

Many things are much simpler when analyzing a signal in the frequency
domain. In digital signal processing it is more practical because the engineer can
examine the spectrum, the representation of the signal in the frequency domain, and
ascertain which frequencies are present and which are absent. Given the frequencies,
one can then filter, alter some of them, or simply determine the precise tone.

3.5.3 The Discrete Fourier Transform

To move our signal from the time domain to the frequency domain, we need a
technique to do so. The Discrete Fourier Transform (DFT) is used in this situation. The
DFT is a mathematical methodology for performing Fourier analysis on a discrete
(sampled) signal. It transforms a finite list of evenly spaced function samples into a list
of coefficients of a finite combination of complex sinusoids, ordered by their
frequencies, by assuming that each sinusoid has been sampled at the same rate.

The Fast Fourier Transform (FFT) is one of the most widely used numerical
methods for DFT calculation. The Cooley-Tukey method is by far the most commonly
used FFT variant. This algorithm recursively splits a DFT into numerous smaller DFTs
in a divide-and-conquer strategy. Using a Cooley-Tukey FFT, the identical result can
be calculated in O(n log n) operations, as opposed to the O(n²) operations needed to
evaluate a DFT directly. Figure 3.18 shows the FFT function.
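Since Figure 3.18 is only a screenshot, here is a compact sketch of such an FFT: a textbook recursive radix-2 Cooley-Tukey, written for clarity rather than performance (the input length must be a power of two):

```java
public class Fft {
    // re/im hold the real and imaginary parts of the samples;
    // returns {Re, Im} of the DFT. Recursion copies arrays for clarity.
    static double[][] fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 1) return new double[][] { { re[0] }, { im[0] } };

        // Split into even- and odd-indexed samples.
        double[] evenRe = new double[n / 2], evenIm = new double[n / 2];
        double[] oddRe = new double[n / 2], oddIm = new double[n / 2];
        for (int i = 0; i < n / 2; i++) {
            evenRe[i] = re[2 * i];     evenIm[i] = im[2 * i];
            oddRe[i]  = re[2 * i + 1]; oddIm[i]  = im[2 * i + 1];
        }
        double[][] e = fft(evenRe, evenIm);
        double[][] o = fft(oddRe, oddIm);

        // Combine: X[k] = E[k] + w^k * O[k], X[k + n/2] = E[k] - w^k * O[k].
        double[] outRe = new double[n], outIm = new double[n];
        for (int k = 0; k < n / 2; k++) {
            double ang = -2 * Math.PI * k / n;
            double wr = Math.cos(ang), wi = Math.sin(ang);
            double tr = wr * o[0][k] - wi * o[1][k];
            double ti = wr * o[1][k] + wi * o[0][k];
            outRe[k] = e[0][k] + tr;          outIm[k] = e[1][k] + ti;
            outRe[k + n / 2] = e[0][k] - tr;  outIm[k + n / 2] = e[1][k] - ti;
        }
        return new double[][] { outRe, outIm };
    }

    public static void main(String[] args) {
        // A constant signal concentrates all of its energy in frequency bin 0.
        double[][] spec = fft(new double[] {1, 1, 1, 1}, new double[4]);
        System.out.println(spec[0][0]);
    }
}
```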

Figure 3.18: The example of FFT function

Figure 3.19: The example of before and after FFT analysis

3.5.4 Music Recognition: Fingerprinting a Song

A negative side effect of the FFT is that we lose a lot of information regarding
timing. (Though theoretically this can be avoided, doing so carries significant
performance overheads.) We can see all the frequencies and their magnitudes for a
three-minute song, but we are unsure of their exact placement in the song. Yet this is
the crucial detail that gives the song its identity; we must somehow know when each
frequency first arose.

To transform just a portion of the data at a time, we use a sliding window, i.e.,
chunks of data. There are several methods for determining the size of each chunk. For
instance, one second of sound recorded in stereo with 16-bit samples at 44,100 Hz
amounts to 44,100 samples × 2 bytes × 2 channels ≈ 176 kB. If we choose a chunk size
of 4 kB, 44 chunks of data will be available for analysis in every second of the music.
That density is adequate for the thorough analysis required for audio identification.

Figure 3.20: Sample coding of fingerprinting

In the inner loop, we put the time-domain data (the samples) into complex
numbers with imaginary part 0. In the outer loop, we iterate through all the chunks and
perform an FFT analysis on each.
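The chunking loop just described might look roughly as follows. This is an illustrative sketch, assuming 16-bit mono PCM samples and a 4,096-sample chunk size; the class layout and the embedded recursive FFT helper are our own, not the project's actual code:

```java
// Split a raw PCM signal into fixed-size chunks and run an FFT on each one.
public class Chunker {
    static final int CHUNK_SIZE = 4096; // samples per chunk (a power of two)

    // Returns one spectrum per chunk:
    // spectra[c][0] = real parts, spectra[c][1] = imaginary parts.
    public static double[][][] analyze(short[] samples) {
        int chunks = samples.length / CHUNK_SIZE;
        double[][][] spectra = new double[chunks][2][CHUNK_SIZE];
        for (int c = 0; c < chunks; c++) {          // outer loop: one chunk at a time
            for (int i = 0; i < CHUNK_SIZE; i++) {  // inner loop: sample -> complex number
                spectra[c][0][i] = samples[c * CHUNK_SIZE + i]; // imaginary part stays 0
            }
            fft(spectra[c][0], spectra[c][1]);      // frequency content of this chunk
        }
        return spectra;
    }

    // Minimal recursive radix-2 Cooley-Tukey FFT on separate re/im arrays.
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 1) return;
        double[] er = new double[n / 2], ei = new double[n / 2];
        double[] or = new double[n / 2], oi = new double[n / 2];
        for (int i = 0; i < n / 2; i++) {
            er[i] = re[2 * i];     ei[i] = im[2 * i];
            or[i] = re[2 * i + 1]; oi[i] = im[2 * i + 1];
        }
        fft(er, ei);
        fft(or, oi);
        for (int k = 0; k < n / 2; k++) {
            double a = -2 * Math.PI * k / n;
            double tr = Math.cos(a) * or[k] - Math.sin(a) * oi[k];
            double ti = Math.cos(a) * oi[k] + Math.sin(a) * or[k];
            re[k] = er[k] + tr;           im[k] = ei[k] + ti;
            re[k + n / 2] = er[k] - tr;   im[k + n / 2] = ei[k] - ti;
        }
    }
}
```

Each returned spectrum carries a timestamp implicitly through its chunk index, which restores the timing information the FFT alone discards.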

Once we know the frequency composition of the signal, we can begin creating
the song's digital fingerprint. This is the most crucial step in the Stario audio
recognition system. The key difficulty is deciding which frequencies, out of the sea of
frequencies collected, are the most significant. Intuitively, we look for the frequencies
with the largest magnitude (commonly called peaks).

However, in a single song, the strong frequency range can run from low C, C1
(32.70 Hz), to high C, C8 (4,186.01 Hz), which is a very large range to cover at once.
Therefore, rather than examining the entire frequency range in one pass, we can select
a number of smaller intervals and examine each one separately. These intervals can be
chosen based on the common frequencies of significant musical elements. For instance,
we could use the intervals adopted in a well-known Java implementation of a
Shazam-like algorithm: for the low tones (covering, for example, bass guitar), 30 Hz to
40 Hz, 40 Hz to 80 Hz, and 80 Hz to 120 Hz; for the middle and high tones (covering
vocals and most other instruments), 120 Hz to 180 Hz and 180 Hz to 300 Hz.

Now, within each interval, we can simply identify the frequency with the highest
magnitude. This information forms a signature for the chunk, and the signature becomes
part of the fingerprint of the song as a whole.
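The per-band peak selection can be sketched as follows, using the band edges quoted above (40, 80, 120, 180, 300 Hz). The class and method names are illustrative, not taken from the project:

```java
// For each chunk's spectrum, keep only the strongest frequency in each band.
public class PeakExtractor {
    // Upper edges of the five analysis bands, in Hz-equivalent bins.
    static final int[] RANGE = {40, 80, 120, 180, 300};

    // Which band does this frequency bin fall into?
    static int bandOf(int freq) {
        int i = 0;
        while (i < RANGE.length - 1 && RANGE[i] < freq) i++;
        return i;
    }

    // Returns, for one chunk, the peak frequency bin in each band.
    static int[] peaks(double[] re, double[] im) {
        int[] peak = new int[RANGE.length];
        double[] best = new double[RANGE.length];
        for (int freq = 30; freq < 300; freq++) {
            double mag = Math.hypot(re[freq], im[freq]); // magnitude of this bin
            int band = bandOf(freq);
            if (mag > best[band]) {
                best[band] = mag;
                peak[band] = freq; // strongest frequency seen so far in this band
            }
        }
        return peak;
    }
}
```

The five peak values from one chunk are exactly the "signature" the text refers to; a sequence of such signatures over all chunks fingerprints the whole song.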

Figure 3.21: Sample coding of fingerprinting

Since the recording will generally not be made under ideal conditions (i.e., in a
"deaf" room), we must add a fuzz factor to the equation. Fuzz factor analysis needs to
be taken seriously, and in a practical system the program should offer the ability to
adjust this parameter according to the recording conditions.

This signature serves as the key in a hash table, so that audio searches can be
performed quickly. The corresponding value records when this particular set of
frequencies first appeared in the song, along with the song ID (song title and artist). An
illustration of how these records would look in the database is shown below.
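One common way to build such a key, seen in public Shazam-style implementations, is to round each band peak by the fuzz factor and pack the peaks into a single number. The constants below are illustrative and would be tuned in a real system:

```java
// Combine four band peaks into one hash-table key, tolerating small
// frequency deviations by rounding each peak down to a FUZ_FACTOR multiple.
public class Fingerprint {
    static final long FUZ_FACTOR = 2; // illustrative; larger = noisier recordings

    static long hash(long p1, long p2, long p3, long p4) {
        return (p4 - (p4 % FUZ_FACTOR)) * 100_000_000
             + (p3 - (p3 % FUZ_FACTOR)) * 100_000
             + (p2 - (p2 % FUZ_FACTOR)) * 100
             + (p1 - (p1 % FUZ_FACTOR));
    }
}
```

Because nearby frequencies round to the same value, a slightly out-of-tune hum can still produce the same key as the studio recording.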

Figure 3.23: Example of records found in database

3.5.5 The Music Algorithm: Song Identification

To identify the melody stuck in our head, we hum or sing into the phone and run
the recording through the same audio fingerprinting process as above. Then we can
search the database for matching hashes.

As it happens, many of the hashes will match the fingerprints of multiple songs.
For example, some part of song A may sound exactly like some part of song B. Each
time we match a hash, the number of possible candidates shrinks, but it is likely that
this information alone will not narrow the match down to a single song. So there is one
more thing our music recognition algorithm needs to check: the timing.

We cannot simply compare the timestamp of a matched hash with the timestamp
of our sample, since the hummed or sung sample could come from any point in the
song. Instead, when there are several matched hashes, we can examine the relative
timing of the matches.

As you can see in Figure 3.22 above, the hash 30 51 99 121 195 is associated
with both Song A and Song E. If, one second later in our recording, we also match the
hash 34 57 95 111 200 one second later in Song A, then both the hashes and the time
differences line up, and we can be much more confident in the match.
Figure 3.24: Example of coding for music identification
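The timing-aware matching step can be sketched as follows. Everything here is hypothetical (the `DataPoint` class, the shape of the database map, and the simple voting scheme are not the project's actual code); the idea is that a true match produces many hash hits sharing one constant offset between song time and sample time:

```java
import java.util.*;

public class Matcher {
    // One occurrence of a fingerprint hash in a stored song.
    static class DataPoint {
        final int songId, time;
        DataPoint(int songId, int time) { this.songId = songId; this.time = time; }
    }

    // db maps fingerprint hash -> every (songId, chunk index) where it occurs.
    static int bestMatch(Map<Long, List<DataPoint>> db, long[] sampleHashes) {
        Map<String, Integer> votes = new HashMap<>(); // "songId:offset" -> vote count
        int bestSong = -1, bestVotes = 0;
        for (int t = 0; t < sampleHashes.length; t++) {
            List<DataPoint> hits =
                db.getOrDefault(sampleHashes[t], Collections.<DataPoint>emptyList());
            for (DataPoint dp : hits) {
                // A genuine match keeps a constant (songTime - sampleTime) offset.
                String key = dp.songId + ":" + (dp.time - t);
                int v = votes.merge(key, 1, Integer::sum);
                if (v > bestVotes) { bestVotes = v; bestSong = dp.songId; }
            }
        }
        return bestSong; // song with the most time-consistent hash matches, or -1
    }
}
```

Counting votes per (song, offset) pair rather than per song is what eliminates the coincidental single-hash collisions described above.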

3.6 Testing

The testing phase is a critical part of development, because this is where the system
or application is checked to verify that it functions correctly. The test conducted for this
project is an accuracy test. The aim of the test is to determine whether the project can
move to the final phase.

3.6.1 Unit Testing

As developers, we need to conduct unit testing to make sure the algorithm
functions well. Each piece of functionality will be tested to check for and measure
errors. If an error is found in the code, the coding phase is repeated until the error is
fixed.

3.6.2 Acceptance Testing

Acceptance testing is a quality assurance (QA) process that determines to what
degree an application meets end users' approval. Depending on the organization,
acceptance testing might take the form of beta testing, application testing, field testing,
or end-user testing. For this project, the supervisor will act as the end user to confirm
the functionality of the program.

3.7 Listening

Listening is all about constant communication with, and feedback from, the customer
(in this project, the supervisor). The customer and the project developer work together
to describe the business logic and the value that is expected. This communication
determines the outcome of the project.

3.8 Summary

The project framework is crucial because it illustrates the development process in
detail at each step. To achieve the project's objectives and goals, every justification and
piece of information pertaining to the development is presented here. Each stage of the
project has its own flow and direction. The Stario framework consists of five steps:
planning, designing, coding, testing, and listening. This chapter mainly demonstrates
an understanding of the methods and techniques used, and how they are implemented
to complete the Stario mobile application by following the framework. These methods
and techniques were gathered during the first phase, through an initial survey, in order
to establish the problem statement and objectives of the project and to create an
application that is meaningful within the scope decided in Chapter One.

In the design phase, the graphical user interface of the application was designed
around the chosen method, which is sound recording. The interface was completed
using Flutter. After the design was done, project development began; this phase is
where all the methods and techniques were implemented. The code was written in the
Java programming language. Several algorithms are used, such as the Discrete Fourier
Transform and the Fast Fourier Transform, which are responsible for converting the
sound recording into digital signals for song matching. The result is compared against
the existing database, which is stored in NoSQL. While coding the program, several
unit tests are conducted to minimize errors in the algorithm. Then the customer, in this
case the supervisor, performs acceptance testing to confirm that the application's
functionality is satisfactory. The phase is considered complete when no outstanding
errors remain and the application can be released to the server. This clearly shows how
the phases of the framework relate to one another and how important a framework is
when developing a system.

