FERC/BERC 1
This application is submitted to obtain approval to conduct research involving human subjects.
Please attach a copy of the Research Proposal.
Does the research require approval from an external Research Ethics Committee (e.g. MREC)?
☐ No
Page 1 of 15
F/BERC 1 (2021)
Part B1
☐ Interviews ☐ Case study
Part B2
1. Background:
A brief explanation of the problem to be studied, with a supporting literature review (citations).
According to Brown (2015), 92 percent of the population suffers from earworms. An earworm is a situation in which a melody gets stuck in one's head; if the melody persists for a long time, it can cause intrusive thoughts associated with anxiety and depression.

Voice recognition has helped much of the population ease their daily routines and keep their work on track. It is therefore a reasonable opportunity to develop this system and help people find songs using only their voices.
1) Having a rough time remembering the lyrics or the song title of an artist

Sometimes it can be a hassle for a user to recall the lyrics of a track that played on the radio or at random from a phone, as can happen with short-term memory loss (Zimmermann, 2017). It is also quite common to forget the lyrics of a song that played unexpectedly, or of a newly released song that popped up on the car radio.

Sometimes a melody can get stuck in our head for a long period of time. This phenomenon is called involuntary musical imagery (INMI), more widely known as "earworms". According to a Durham University study led by Jakubowski K., earworms are an extremely common phenomenon and an example of spontaneous cognition: about 40 percent of our day is spent thinking about random things, and researchers are beginning to try to understand the brain mechanism behind thoughts unrelated to the current task.

Searching for a song can feel impossible for some people. With only the melody stuck in their head as a clue, users tend to search by a few words of cryptic lyric, by genre, or perhaps by nationality. Using this system, users can get a result that is about 90 percent precise without having to enter clueless queries into the internet.
References:
Only include references cited in this document. Do not paste all the references from your main proposal.
Jones, N. (2015, January 14). Nielson Study Reveals Rock Prevails As Most Popular Genre In The US. Hypebot. Retrieved May 1, 2022, from https://www.hypebot.com/hypebot/2015/01/nielson-study-reveals-rock-prevails-as-most-popular-genre-in-the-us.html
Brown, H. (2015, November 1). How Do You Solve a Problem Like an Earworm? Scientific American. Retrieved May 1, 2022, from https://www.scientificamerican.com/article/how-do-you-solve-a-problem-like-an-earworm/
Welch, A. (2016, November 3). Psychologists identify why certain songs get stuck in your head. CBS News. Retrieved May 1, 2022, from https://www.cbsnews.com/news/psychologists-identify-why-certain-songs-get-stuck-in-your-head/
1. To develop a song finder application that uses voice recognition, specifically humming or singing.
2. To evaluate the accuracy and effectiveness of the system in helping people find a song.

The system may solve the earworm problem for people who have a melody stuck in their head. It will also help users find songs more easily in the future, simply by humming into the system.
February 2023
6. Location of research:
i) Planning
This phase involves gathering information from an online survey, case studies from journals, articles, and other literature-review sources. All hardware and software requirements are discussed here.
ii) Designing
The details from the requirement analysis are used to design the system. A flowchart and modules are designed to organise the system flow, and a mock interface is designed to picture the system interface.
iii) Coding
iv) Testing
Unit Testing is run here to find any errors, which are then fixed back in the Coding phase.
v) Listening
This phase covers the Acceptance Test once the system is complete; users give their feedback from using the system.
Quantitative analysis
Data collection method: Questionnaires
Respondents: Students of UiTM Melaka
Data analysis: Google Forms
8. Inclusion and exclusion criteria:
Inclusion criteria:
● Students aged 18 and above
● Have access to a mobile phone
Exclusion criteria:
● Students under 18 years old
● Lack access to a mobile phone
9. Sample size:
State the sampling method and the minimum sample size calculated.
Calculation:
Population size N = 40; minimum sample size S = 36
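The figures above (N = 40, S = 36) are consistent with the Krejcie and Morgan (1970) sample-size formula; the form itself does not name the method, so the sketch below is under that assumption.

```python
# Minimum sample size via the Krejcie & Morgan (1970) formula.
# Assumption: the form's figures (N = 40, S = 36) were derived this way;
# the form does not state the method explicitly.

def krejcie_morgan(N, chi2=3.841, P=0.5, d=0.05):
    """Return the minimum sample size s for a population of size N.

    chi2: chi-square value for 1 degree of freedom at 95% confidence
    P:    assumed population proportion (0.5 maximises the sample size)
    d:    acceptable margin of error
    """
    s = (chi2 * N * P * (1 - P)) / (d * d * (N - 1) + chi2 * P * (1 - P))
    return round(s)

print(krejcie_morgan(40))  # → 36
```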
Phase Activities

Planning
- Gathered all the necessary information
- Planned the application features, including the method to be used in application development

Designing
- Designed the flowchart, entity relationship diagram and mock interface

Coding
- Develop the mobile application user interface
- Develop the application modules and functions
- Plug the music information retrieval (MIR) API into the application
- Fix errors found during Unit Testing
- Integrate all the functions into a single application

Testing
- Conduct Unit Testing to identify any errors
- Verify the accuracy of the humming from the audio-to-digital conversion function against a query-by-humming dataset to find songs
- Determine whether the MIR API is functioning

Listening
- Record the Acceptance Test from the users
11. Statistical analysis:
Briefly describe the data analysis and the statistical analysis software that will be utilised.
Descriptive statistics:
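As an illustration of the descriptive statistics named above, the sketch below computes the mean, standard deviation and agreement percentage for one questionnaire item. The item and responses are invented for illustration; the form only states that questionnaire data from Google Forms will be analysed descriptively.

```python
# Descriptive statistics for questionnaire responses (illustrative sketch).
# Assumption: 5-point Likert items exported from Google Forms as integers;
# the response data below is hypothetical.

responses = [5, 4, 4, 3, 5, 4, 2, 5, 4, 4]  # hypothetical answers to one item

n = len(responses)
mean = sum(responses) / n
variance = sum((x - mean) ** 2 for x in responses) / (n - 1)  # sample variance
std_dev = variance ** 0.5
agree_pct = 100 * sum(1 for x in responses if x >= 4) / n  # % agree or strongly agree

print(f"n={n} mean={mean:.2f} sd={std_dev:.2f} agree={agree_pct:.0f}%")
```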
5. Others: -NA-
1. Student
Student ID: 2021125721
Mobile phone: 011-69894818
Email: 2021125721@student.uitm.edu.my
2. Supervisor
Staff ID: 311841
Mobile phone: 013-8218885
Email: albin1841@uitm.edu.my
Signature:
3. Co-Researcher
Name: -NA-
Staff ID/Student ID: -NA-
Affiliation: -NA-
Mobile phone: -NA-
Email: -NA-
Signature: -NA- Date: -NA-
1. Please ensure that the named research team members have signed the
application.
2. Please ensure that the application has been signed and endorsed by the
Department or Postgraduate Research Committee.
3. All required documents must be submitted at least two (2) months before data collection.
4. Any data collection instruments that require completion by
respondents/participants shall be prepared in the Malay and English languages, and
other language(s) understood by the participants.
5. Please ensure that you have obtained the necessary permission or paid the
stipulated fee for use of survey questionnaires and/or statistical analysis software, if
and when necessary.
ITEM YES NO
1 Have you presented your proposal at the Department or Postgraduate Research Committee? /
2 Have you completed the F/BERC 1 form? /
3 Have you completed the F/BERC 2 and/or F/BERC 3 form? /
4 Has your supervisor checked your application form? /
5 Has the form been signed by all researchers? /
11/11/2022
11/11/2022
Comment if any:
F/BERC 2 (2021)
FERC/BERC 2
Research Title

Introduction of Research
This research focuses on the development of a mobile application that can be used to search for a song using the human voice, specifically humming, to help users find a melody that may be stuck in their head.
Purpose of Research
1. To develop a song finder application that uses voice recognition, specifically humming or singing.
2. To evaluate the accuracy and effectiveness of the system in helping people find a song.
Research Procedure
Upon completion of this application, individuals will be asked to provide feedback through an evaluation.
Participation in Research
Your participation in this research is entirely voluntary. You may refuse to take part in the
study or you may withdraw yourself from participation in the research at any time without
penalty.
Benefit of Research
Information obtained from this research will benefit the individuals, researchers, institution and
community for the advancement of knowledge and future practice.
Research Risk
The research poses no risk to the participants and the participants are free to withdraw from
the experiment.
Confidentiality
Your information will be kept confidential by the investigators and will not be made public
unless disclosure is required by law. By signing this consent form**, you will authorize the
review of records, analysis and use of the data arising from this research.
If you have any question about this research or your rights, please contact Muhammad Shafiq
Haqime bin Mohd Isa at 011-69894818
**If you are using an online survey form (obtaining signature of participants are not feasible), please
include these statements at the beginning of the survey document:
Consent Form1
To become a participant in the research, you or your legal guardian are required to sign this
Consent Form.
I herewith confirm that I have met the requirement of age and am capable of acting on behalf
of myself / as2 a legal guardian as follows:
______________________________________________________________________
Name of Participant/Legally authorized representative (LAR) Signature
______________________________________________________________________
I.C No Date
______________________________________________________________________
Name of Witness3 Signature
______________________________________________________________________
I.C No Date
______________________________________________________________________
Name of Consent Taker Signature
______________________________________________________________________
I.C No Date
1 Original signed copy is to be retained by the Principal Investigator.
2 Delete whichever is not applicable.
3 A witness is only required for oral consent.
DEVELOPMENT OF MUSIC IDENTIFICATION BY USING HUMAN VOICE RECOGNITION
Universiti Teknologi MARA
March 2022
SUPERVISOR APPROVAL
By
2021125721
The thesis was prepared under the supervision of the project supervisor, Mr. Ts. Albin
Lemuel Kushan. It was submitted to the Faculty of Computer and Mathematical
Sciences and was accepted in partial fulfilment of the requirements for the degree of
Bachelor of Computer Science (Hons.) Netcentric Computing.
Approved by
…………………………….
15 JULY 2022
STUDENT DECLARATION
I certify that this thesis and the project to which it refers are the product of my own work, and that any idea or quotation from the work of other people, published or otherwise, is fully acknowledged in accordance with the standard referencing practices of the discipline.
…………………….
15 JULY, 2022
ACKNOWLEDGEMENT
Alhamdulillah, praises and thanks to Allah, for by His Almighty grace and utmost blessings I was able to finish this final year project report within the time given. Firstly, my special thanks go to my supervisor for giving me the opportunity to embark on this project. Thank you for the constant support and the time spent on constructive comments and long discussions that led to the completion of this work.

Special appreciation also goes to my beloved parents. I would like to take this opportunity to express my gratitude and indebtedness to them for their unconditional love and support throughout this whole process.

Last but not least, I would like to give my gratitude to my dearest friends for helping me search for references and for giving moral support during the development of this project.
TABLE OF CONTENTS
CONTENTS
SUPERVISOR APPROVAL i
STUDENT DECLARATION ii
ACKNOWLEDGEMENT iii
LIST OF FIGURES vii
LIST OF TABLES viii
LIST OF ABBREVIATION ix
CHAPTER 1 1
INTRODUCTION 1
1.1 Project Background 1
1.2 Problem Statement 3
1.2.1 Having a rough time to remember the lyrics or the song title of an artist 3
1.2.2 Song or melody that stuck into their head 3
1.2.3 Always get the non-similar result when searching for a song 4
1.3 Project Aim 4
1.4 Project Objective 4
1.5 Project Scope 5
1.6 Project Significance 5
1.6.1 The system may solve the earworms problem 5
1.6.2 Easier to find song in the future 5
1.6.3 Give benefit to the music industry 5
1.7 Summary 6
CHAPTER 2 7
LITERATURE REVIEW 7
2.1 Music Services 7
2.1.1 Origin of Music Services 7
2.1.2 Rising of Music Services in Internet 8
2.1.3 Music Website VS Music Application 9
2.2 Mobile Application 10
2.2.1 Web Mobile Application Development 10
2.2.2 Native Mobile Application Development 12
2.2.3 Hybrid Mobile Application Development 12
2.2.4 Comparison of Web, Native, and Hybrid Mobile Application 13
2.3 Human Voice Recognition 13
2.3.1 Definition of Human Voice Recognition 14
2.3.2 Analog-to-Digital Conversion 16
2.3.3 Query by Humming 22
2.4 Related Works 24
2.4.1 Shazam 24
2.4.2 SoundHound 25
2.4.3 Musixmatch 25
2.4.4 Discussion of related works 26
CHAPTER 3 27
PROJECT METHODOLOGY 27
3.1 Project Methodology 27
3.2 Project Framework 29
3.3 Planning 31
3.3.1 Hardware requirements 31
3.3.2 Software requirements 32
3.3.3 Survey analysis 32
3.3.4 Use Case Diagram 35
3.4 Designing 36
3.4.1 Flowchart 36
3.4.2 Entity Relationship Diagram 38
3.4.3 Interface Design 39
3.4.4 System Architecture 42
3.5 Coding 43
3.5.1 Recording – capturing the sound 44
3.5.2 Time-Domain and Frequency Domain 45
3.5.3 The Discrete Fourier Transform 46
3.5.4 Music Recognition: Fingerprinting a Song 48
3.5.5 The Music Algorithm: Song Identification 51
3.6 Testing 52
3.6.1 Unit Testing 52
3.6.2 Acceptance Testing 52
3.7 Listening 53
3.8 Summary 53
References 54
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATION
VCD Video Compact Disc
W3C World Wide Web Consortium
WT Wavelet Transform
WWW World Wide Web
XP Extreme Programming
Y2K Year 2000 Problem
CHAPTER 1
INTRODUCTION
This chapter concentrates on the features of the project. It provides the details of the Stario application, which uses a voice recognition system and Query by Humming (QbH). The chapter also covers the project background, problem statement, aim, objectives, and significance that led to this application's development.
1.1 Project Background

Music is an art produced by arranging sounds using the elements of melody, harmony, rhythm and timbre (Wikipedia). It has been a way to express feelings and to expand people's creativity since its appearance in the Paleolithic Age. From musicians of past centuries such as Beethoven, Bach and Mozart to modern artists such as Frank Sinatra, Michael Jackson and Jimi Hendrix, music has given listeners pleasure and broadened our creativity. As many gifted musicians were born into the world, music divided into many branches, known nowadays as genres. The most popular genres listened to these days are rock, pop, RnB, and hip-hop (Nielson Music). These genres have in turn been divided into many sub-types, with more still to be tracked. With so many songs now being produced, there are also people who struggle to recognise music that is stuck in their head. This problem is called earworms, formally a benign form of rumination: the monotonous, intrusive thoughts associated with anxiety and depression. According to Brown H., 92 percent of the population suffers from it. Therefore, the main purpose of developing the Stario mobile application is to reduce this problem and help people find their desired song using only their voice, specifically humming and singing.
Since 2008, many people have been fascinated with J.A.R.V.I.S., the technology of Marvel's superhero Iron Man (Tony Stark). It began as a computer interface and was later upgraded into an enhanced artificial intelligence (A.I.) system that could communicate regularly with Tony Stark and act as his personal assistant, including in business and global security. As people living in the 21st century, users see A.I. as an improvement to many systems developed over the years and decades.
Query by Humming (QbH) is a music retrieval approach that branches off the original classification systems of title, artist, composer and genre. It normally applies to songs or other music with a distinct single theme or melody. Typically, the user hums a piece of tune into a microphone connected to a mobile phone or computer. The system then searches a database of tunes to find a list of melodies similar to the user's "query". The result is produced, and the user checks whether the song matches the hummed input. There will be cases where the returned song is not what the user meant: the hum was off tune, the database does not contain that tune, or the system is not intelligent enough to tell whether two tunes sound similar. QbH is a particular case of "Query by Content" in multimedia databases. Most research into query by humming in the multimedia research community uses the notion of "contour" information: the melodic contour is the sequence of relative differences in pitch between successive notes, and it has been shown to be a method listeners use to determine similarities between melodies. However, the inherent difficulty the contour method encounters is that there is no known algorithm that can reliably transcribe the user's humming into discrete notes.
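The contour notion described above can be sketched in a few lines. This is not the thesis's implementation: the tiny song database, the pitch values and the edit-distance matching below are illustrative assumptions only.

```python
# Query-by-humming via melodic contour (sketch of the "contour" notion
# described above). Pitches are reduced to U (up), D (down), S (same) and
# queries are ranked by edit distance. All song data here is made up.

def contour(pitches):
    """Reduce a pitch sequence to its melodic contour string."""
    out = []
    for a, b in zip(pitches, pitches[1:]):
        out.append("U" if b > a else "D" if b < a else "S")
    return "".join(out)

def edit_distance(a, b):
    """Classic Levenshtein distance between two contour strings."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

# Tiny illustrative "database": song title -> MIDI-like pitch sequence
songs = {
    "Song A": [60, 62, 64, 62, 60],
    "Song B": [67, 67, 65, 64, 62],
}

hummed = [59, 61, 65, 60, 58]   # off-key hum with the same up/down shape as Song A
query = contour(hummed)         # "UUDD"
ranked = sorted(songs, key=lambda s: edit_distance(query, contour(songs[s])))
print(ranked[0])  # → Song A
```

Note that the sketch starts from discrete pitches; as stated above, reliably transcribing a hum into such pitches is the genuinely hard part of a real QbH system.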
1.2 Problem Statement

Based on observation, research and a public survey, three problems can be identified that make songs hard to find. Below are the problems users face.
1.2.1 Having a rough time to remember the lyrics or the song title of an artist

Sometimes it can be a hassle for people to recall the lyrics of a track that played on the radio or at random from a phone, as can happen with short-term memory loss (Zimmermann, 2017). It is quite common to forget the lyrics of a song that played unexpectedly, or of a newly released song that popped up on the car radio. Furthermore, most people can only remember the melody and harmony of the song; the melody can be utterly catchy owing to the song's repetitive instrumentals. Recognising the artist can also be difficult, for example when a popular artist on a long hiatus makes a comeback and releases a new track. When people type what they remember of the lyrics into a search, the result may not be the expected one because some words are wrong, and finding the song may take a very long time.
1.2.2 Song or melody that is stuck in their head

Earworms usually come with pop music, followed by top-chart songs such as "Bad Romance" by Lady Gaga, "Don't Stop Believing" by Journey, and "Can't Get You Out of My Head" by Kylie Minogue, a study shows. Even though earworms are not considered dangerous, they may cause stress or obsession in patients with mental health issues, and in rare cases are experienced during migraines and epileptic attacks.
1.2.3 Always getting dissimilar results when searching for a song

Searching for a song can feel impossible for some people. With only the melody stuck in their head as a clue, users tend to search by a few words of cryptic lyric, by genre, or perhaps by nationality. Using Stario, people can get a result that is about 90 percent precise without having to enter clueless queries into the internet.
1.5 Project Scope

The target users of this mobile application consist of three people with different voices. From the survey conducted, 74% of respondents listen to pop music, while 64.5% listen to rock music. Therefore, this mobile application will focus on retrieving pop and rock music from humming. Ten songs of different genres, mainly pop and rock, will be tested during the evaluation. To use the application, the three people will hum or sing into the microphone of a device on which the app is already installed. The hum or singing acts as the input, which is then searched against the dataset to find a song that sounds similar. The result is produced with 10 seconds of playback, plus the name of the song, the artist and the genre.
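The evaluation described above (ten songs, three users) amounts to measuring top-1 retrieval accuracy. A minimal sketch, with made-up trial data; the thesis does not specify this exact bookkeeping:

```python
# Top-1 retrieval accuracy for the evaluation described above (sketch).
# Assumption: each trial records the expected song and the song the system
# returned; the trial data below is invented for illustration.

trials = [
    ("Bad Romance", "Bad Romance"),
    ("Don't Stop Believing", "Don't Stop Believing"),
    ("Can't Get You Out of My Head", "Toxic"),   # a miss
    ("Bohemian Rhapsody", "Bohemian Rhapsody"),
]

hits = sum(1 for expected, returned in trials if expected == returned)
accuracy = 100 * hits / len(trials)
print(f"top-1 accuracy: {accuracy:.0f}%")  # → top-1 accuracy: 75%
```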
1.7 Summary

This chapter provided all the information and defined the objectives, problem statement, scope and limitations, and significance of the project. The first objective is to design the Stario mobile application: using an analog-to-digital conversion method to change humming into digital frequencies, with song matching developed using the query-by-humming method to match and retrieve the song from the user's input. The other objective is to evaluate the accuracy of the analog-to-digital conversion and query-by-humming matching, and the effectiveness of the system in helping people find a song.

The problem statement was extracted from a public survey of 30 people and from research across several materials, journals and studies on the difficulty of finding desired songs. The scope of this project involves three different people, each of whom may spend 10 seconds humming or singing the song.
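The analog-to-digital conversion step summarised above ultimately extracts a dominant frequency from the hummed audio. Below is a minimal pure-Python sketch using a naive discrete Fourier transform; real systems use an FFT, and the sampling rate and simulated tone are assumed values, not the thesis's configuration.

```python
# Turning a hummed tone into a "digital frequency" (sketch of the
# frequency-analysis step mentioned above) with a naive DFT.
import math

SR = 8000            # sampling rate in Hz (assumed)
N = 800              # number of samples analysed (0.1 s window)
FREQ = 440.0         # simulated hum pitch: A4

# "Record" a pure tone as the digitised hum
samples = [math.sin(2 * math.pi * FREQ * n / SR) for n in range(N)]

def dft_magnitude(x, k):
    """Magnitude of the k-th DFT bin of signal x."""
    re = sum(v * math.cos(2 * math.pi * k * n / len(x)) for n, v in enumerate(x))
    im = -sum(v * math.sin(2 * math.pi * k * n / len(x)) for n, v in enumerate(x))
    return math.hypot(re, im)

# Search bins up to the Nyquist limit for the strongest frequency component;
# bin k corresponds to frequency k * SR / N.
peak_bin = max(range(1, N // 2), key=lambda k: dft_magnitude(samples, k))
print(f"dominant frequency: {peak_bin * SR / N:.0f} Hz")  # → 440 Hz
```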
CHAPTER 2
LITERATURE REVIEW
Chapter two contains the literature review of articles and information, collected using various methods, that informed the development of the Stario mobile application. Data and information in this chapter were gathered from books, journals, articles and other sources, and were used to help elaborate and explain the methods and techniques to be implemented in the development. This chapter discusses general information and an overview of music services, mobile applications, human voice recognition and related works.
2.1 Music Services

Since the beginning of human civilisation, music has been essential for all human beings, from infants to the elderly. The importance of music in our society has led to the creation of an industry that includes all the concepts inherent to this theme, such as its organisation, distribution, and profitability (Barata & Coelho, 2021). The music industry has produced many records, hence monopolising the production and consumption of music.
Nowadays, in the modern world, many music streaming platforms have been developed for the public to listen to music. Back in 1999, when the Y2K bug had everyone gripped in a vise of fear, a peer-to-peer music sharing website by the name of Napster started gaining traction amongst American college students, who used the online service to share MP3 files of songs with one another for free.
One of the most notable features of Napster was that it provided a platform for music lovers not only to download albums for free, but also to gain access to rare live versions, alternate cuts, and demo versions from their favorite artists. Regardless of its popularity, many issues were raised against Napster. Only four months after the website started, the Recording Industry Association of America filed a lawsuit against Napster, and musical giants Metallica and Dr. Dre both filed lawsuits against the website after unfinished versions of their tracks were leaked onto Napster in 2000 (Brewster, 2021).
The demise of Napster showed that, while peer-to-peer music sharing was an extremely contentious practice within the music industry, online music sharing was a direction worth exploring. New concepts of digital music distribution have since been established, e.g., Music as a Service (MaaS), in which the content is not transferred, differentiating it from the well-known download and promoting full-time access instead of physical ownership.
There are many music service platforms, either offline or online. One of the offline platforms is radio. Radio plays a significant role in informing, educating and enlightening everyday public life, and it also entertains through music, drama, talk shows, live sports and other soft angles that appeal to its societies (Rahman Ullah & Khan, 2017). Individuals of the modern generation still switch on a radio, whether in the car, on a mobile phone, or on a radio device itself. Another form of offline music is the hardware player: Meier & Manzerolle (2019) state that music can be published on vinyl, compact disc or cassette, although the hardware method is increasingly delinked amid the rise of virtual formats. Music-related industries are increasingly focused on offering provisional access to catalogues of recorded music via streaming services rather than ownership of recordings (Meier & Manzerolle, 2019). The online platforms, meanwhile, offer numerous streaming services: users can listen on sites such as Apple Music, Spotify or YouTube, which have become popular and provide new features suited to urban settings, available to everyone for free (Komulainen et al., 2010).
The Internet has become an integral part of modern society and of economies around the world (Kusumawati et al., 2019). It is a medium for business people to communicate and sell their products, and the music industry took this opportunity to introduce and sell its music products, both songs and video clips complete with lyrics, through the Internet. Music can be published either on a website or in an application. Throughout the years, the website-versus-application discussion has had a huge impact on developers; the advantages and disadvantages of both methods are frequently discussed and compared. The table below shows the comparison between a music website and a music application:

Downloadable: music website No, music application Yes

Based on the table, the music application has greater advantages over the music website. Even so, the music application may have a number of flaws compared with the website. Therefore, the music application is the best choice to develop.
2.2.1 Web Mobile Application Development

A web mobile application runs on a smartphone or any device that can access a web browser. Sometimes a problem occurs when users try to access a mobile application in the Play Store or App Store; this is where a web mobile application can act as a substitute for the mobile app. Besides that, a web mobile application can be launched on any platform as long as a browser is installed on the device, since it is platform-independent.
2.2.2 Native Mobile Application Development

Native applications are built using the native language of the platform or device they are intended for (Mohammadi & Jahid, 2016). For instance, if a mobile application is developed for the Android platform, other operating systems such as iOS cannot run it. There are different SDKs for each platform, and different tools, APIs and devices with different functionalities on each platform (Pinto & Coutinho, 2018). Native development usually depends on how much the developer wants to link the application to the target operating system, because some capabilities that exist in one OS do not exist in others. Although native apps only work on a particular mobile operating system, they still have advantages: they perform faster on the device they were built for, thanks to the device's built-in features, much like an Apple product. Since this project's development targets the Android platform only, choosing native mobile application development is the best option: it matches the scope, and the application's performance can be further improved. Further discussion of why native development was chosen appears in Section 2.2.4.
2.2.3 Hybrid Mobile Application Development
Hybrid development combines the best of both the native and HTML5 worlds (Mohammadi & Jahid, 2016). Hybrid development uses traditional web programming languages such as HTML5, JavaScript and CSS. According to Kaczmarczyk (2021), a hybrid application can run on many platforms, which eliminates the need to implement more than one version of an application. This differs from the native approach, which supports only the one platform it was developed for and forces a rewrite of the application for each operating system (Pinto & Coutinho, 2018). Moreover, a hybrid application can access features built into the platform, such as the GPS, camera and location, just like a native app. Although hybrid sounds more flexible than the other two types of development, its scale is too big and does not fit the scope of this project, which focuses on a single-platform development. Therefore, hybrid was not chosen for this project.
2.2.4 Comparison of Web, Native, and Hybrid Mobile Application
Each mobile application type has its own advantages and disadvantages. Developers need to take many aspects into consideration before choosing which type of development to pursue, so a clearer understanding of all three approaches is needed. Table 2.2 shows the differences between the three mobile application types for easier comparison.
2.3.1 Definition of Human Voice Recognition
A voice recognition system converts the human voice into a signal that can be understood by machines (Hansen et al., 2017). The history of speech recognition started in the 1940s: the "Audrey" system, designed by Bell Laboratories, was the first speech recognition system, and it could only understand digits. Over time, such devices were enhanced to recognize spoken words, leading to ASR (Automatic Speech Recognition).
Voice recognition systems are divided into several classes:
1) Isolated Speech
Isolated-word recognition requires a pause between two utterances. This does not mean that only a single word is accepted, but it requires one utterance at a time.
2) Connected Speech
Similar to isolated speech, but allows separate utterances with minimal pauses between them.
3) Continuous Speech
Allows the user to speak almost naturally; also called computer dictation.
4) Spontaneous Speech
Speech at a basic level that sounds natural and is not rehearsed. An ASR system with spontaneous speech ability has to handle a variety of natural speech features, such as words being run together, "ums" and "ahs", and even slight stutters.
These classes are designed for specific functions, meaning that some systems developed for voice recognition do not necessarily need a difficult, hard-coded recognition scheme.
Chandolikar et al. (2022) state that several kinds of applications have been developed to ease human activities:
1) Audio Classification
Widely known; entails assigning a sound to one of several different classes, in order to determine the sound's kind or origin. The system might map one sound to different possibilities, for example the sound of a car starting, a dog barking or a siren.
2) Music Genre Classification
Categorization of music based on the genre of the songs, for example rock, ballad, pop and hip-hop.
3) Music Tagging
Captures acoustic properties and appends them to make a music record with all of the musical elements and tones in the song.
5) Voice Recognition
Spoken words from the user can determine the speaker's gender, race, identity or name. The voice can also reveal the user's emotions.
6) Speech Recognition
Involves not only acoustic analysis but also NLP (Natural Language Processing). This process requires basic language understanding in order to distinguish separate words from voiced noises. Applications like Apple's Siri and Amazon's Alexa are among those that achieve this functionality well.
Figure 2.3: A continuous signal (analog) turning into a digital signal
The ADC's sampling rate, also known as the sampling frequency, is tied to the ADC's speed. The sampling rate is measured in "samples per second" (SPS or S/s; expressed as a sampling frequency, the unit is Hz). It simply states how many samples, or data points, the converter takes within a second. The more samples the ADC takes, the higher the frequencies it can handle. The important equation for the sample rate is:
fs = 1/T
Where,
fs = sample rate/frequency
T = period of one sample
For example, in Figure 2.3, T appears to be 50 ms, while fs appears to be 20 S/s (or 20 Hz). The sampling rate is very slow, but the signal is output just like the original analog signal. This is because the frequency of the original signal is as slow as 1 Hz; that is, the sampling rate was sufficient to reconstruct a similar signal.
There will be cases where the sampling rate is considerably too slow. Knowing the ADC sample rate is important because it determines whether aliasing occurs. Aliasing means that when a digital image/signal is reconstructed, it differs significantly from the original image/signal that was sampled. If the sample rate is slow and the frequency of the signal is high, the ADC will not be able to reconstruct the original analog signal, causing the system to read incorrect data. A good example is shown in Figure 2.4.
In this example, we can see where the sampling takes place on the analog input signal. The digital output never approaches the original signal because the sampling rate is not high enough to keep up with the analog signal. This causes aliasing, and the digital system lacks the full picture of the analog signal.
A rule of thumb for determining whether aliasing will occur is the Nyquist theorem. According to the theorem, the sample rate/frequency must be at least twice the maximum frequency of the signal to restore the original analog signal. The following equation gives the Nyquist rate:
fNyquist = 2fMax
Where,
fNyquist = minimum required sample rate
fMax = maximum frequency component of the input signal
For example, if the maximum frequency of the signal input to the digital system is 100 kHz, the ADC sample rate should be at least 200 kS/s. This allows the original signal to be successfully reconstructed.
Also note that the signal may be corrupted when external noise introduces an unexpectedly high frequency into the analog signal and the sample rate cannot handle the added noise frequency. It is therefore recommended to add an anti-aliasing filter (a low-pass filter) before the ADC and sampling stage. This prevents unexpected high frequencies from entering the system.
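The two relations above, fs = 1/T and the Nyquist criterion, can be expressed directly in code as a quick sanity check. The following is a minimal sketch; the class and method names are illustrative and not part of the project's code:

```java
public class Sampling {
    // Sample rate from the sampling period: fs = 1/T (T in seconds).
    static double sampleRate(double periodSeconds) {
        return 1.0 / periodSeconds;
    }

    // Nyquist criterion: fs must be at least twice the highest frequency
    // present in the signal, otherwise reconstruction will alias.
    static boolean aliases(double fs, double fMax) {
        return fs < 2 * fMax;
    }

    public static void main(String[] args) {
        System.out.println(sampleRate(0.05));          // the Figure 2.3 example: T = 50 ms
        System.out.println(aliases(200_000, 100_000)); // 200 kS/s suffices for 100 kHz
    }
}
```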
The resolution of an ADC relates to its accuracy and is determined by its bit length. Figure 2.5 shows a simple example of how the bit length helps output a more accurate digital signal. A converter with a short bit length has only a few "levels"; increasing the bit length raises the number of levels and makes the signal more faithful to the original analog signal.
Figure 2.5: Example on how resolution affects the digital signal
The number of levels is given by:
N = 2^n
Where,
N = number of levels
n = bit size
For example, suppose we need to read a sine wave with a voltage range of 5 V, and the bit size of the ADC is 12 bits. Substituting n = 12 into the equation, N becomes 4096. With the voltage reference set to 5 V, the step size is 5 V / 4096 ≈ 0.00122 V (or 1.22 mV). This is accurate, because the digital system can detect voltage changes as small as 1.22 mV. If the ADC has a very short bit length (for example, only 2 bits), the step size grows to 1.25 V. This gives very bad results, as the system only has four voltage levels (0 V, 1.25 V, 2.5 V and 3.75 V). Figure 2.6 shows common bit lengths, their number of levels, and the step size for a 5 V reference. We can see how much more accurate the conversion gets as the bit length increases.
Figure 2.6: Bit length and their number of levels and step size for a 5V reference range
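The worked example above can be reproduced with a short sketch; again, the names here are illustrative rather than taken from the project:

```java
public class AdcResolution {
    // Number of quantization levels for an n-bit ADC: N = 2^n.
    static long levels(int bits) {
        return 1L << bits;
    }

    // Step size (smallest detectable voltage change) for a reference voltage.
    static double stepSize(double vref, int bits) {
        return vref / levels(bits);
    }

    public static void main(String[] args) {
        System.out.println(levels(12));        // 4096 levels for 12 bits
        System.out.println(stepSize(5.0, 12)); // about 0.00122 V, the example in the text
        System.out.println(stepSize(5.0, 2));  // 1.25 V: only four coarse levels
    }
}
```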
2.3.3 Query by Humming
Different types of algorithms are used to detect the sound.
Musical signals often exhibit richness, indeterminacy and a complex dynamic structure. Mathematical transforms such as the Fourier Transform (FT) and Wavelet Transform (WT) can capture this structural information (Nagavi & Bhajantri, 2018). However, they are not adept at capturing a music signal's non-stationarity and dynamism. Another transform, the Hilbert-Huang Transform (HHT) in the form of Empirical Mode Decomposition (EMD), is better at dimension reduction and at quantifying the complex structure and dynamism of music signals. Figure 2.7 shows the illustration of the proposed QbH system.
2.4 Related Works
2.4.1 Shazam
Released in 2002, Shazam is one of the biggest music recognition services ever developed. Before it was released as a standalone system, users had to call the Shazam service centre from their mobile phone and play 15 seconds of audio from the music being played. The identification was then made on the sample at the Shazam server, and the track title and artist were sent back via SMS text message (Hussain et al., 2017). Users can register and log in with a mobile phone number and password on the website to retrieve the information. On a desktop or smartphone, they may view their tagged track list and buy the CD. A tagged track is downloadable as a ringtone where available, and a 30-second clip of the tagged song can also be sent to friends.
To recognize the exact track, Shazam uses preprocessing that creates fingerprints (Xiao, 2019). From the spectrogram of a signal, it determines relative peaks and plots the peaks as a cleaner version of the spectrogram. A spectrogram is a plot of frequency over time; what exactly counts as a "relative peak" is not specified. Figure 2.8 shows the illustration of the music recognition process used by Shazam.
Next, Shazam chooses a large set of anchor points throughout the song and creates a pair from each anchor point and each of a set of neighbouring points. The neighbouring points are all of the points within an area following the anchor point, within some range of time and within some range of notes above and below the anchor point. All of the pairs for a song are then each made into a hash, or fingerprint, and saved in the database along with their absolute time of occurrence for later use. The average number of pairs per anchor point is referred to as the fan-out factor.
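The pairing scheme described above can be sketched in code. This is an illustrative reconstruction, not Shazam's actual implementation: the peak representation, target-zone size and hash packing are all assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class AnchorPairs {
    // peaks: spectrogram peaks as {time, frequency}, sorted by time.
    // Pair each anchor with up to fanOut later peaks within maxDt time steps.
    static List<long[]> pairs(int[][] peaks, int fanOut, int maxDt) {
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < peaks.length; i++) {
            int made = 0;
            for (int j = i + 1; j < peaks.length && made < fanOut; j++) {
                int dt = peaks[j][0] - peaks[i][0];
                if (dt > maxDt) break; // peaks are time-sorted, no later ones qualify
                // Hash combines anchor frequency, target frequency and time delta.
                long hash = ((long) peaks[i][1] << 32) | ((long) peaks[j][1] << 16) | dt;
                out.add(new long[]{hash, peaks[i][0]}); // keep absolute anchor time
                made++;
            }
        }
        return out;
    }
}
```

With three peaks and a fan-out of 2, the first anchor produces two pairs, the second one, and the last none, so the average number of pairs per anchor is the fan-out factor described in the text.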
During the search process, the same technique is used to generate hashes for the query. Then, for each song containing a matching hash, the matching hashes are graphed: query time on the vertical axis, song time on the horizontal axis, and a dot plotted for each hash. If a diagonal line of dots appears, meaning that the hashes of the query coincide with the song sequentially during that time period, the song is considered a result. For a non-result, only a scattering of matching hashes occurs.
Shazam's system only works on real tracks and audio: as long as the audio comes from an artist or producer, the desired result is retrieved. Shazam's goal is therefore not the same as Stario's, which is to take a person's humming and turn it into the details of the searched song.
2.4.2 SoundHound
2.4.3 Musixmatch
Musixmatch is an application that merges songs and lyrics. It gives listeners the ability to read lyrics from the screen of their device, whether Android or PC. If a song is playing and lyrics are available, an image of the band or song appears along with the lyrics. Musixmatch is a global lyrics provider offering its service in 20 different languages. Moreover, it has developed top-of-the-line applications for mobile phones, desktops and tablets, and it has created a powerful application programming interface (API) that can be used with any website or application. It is the world's largest official lyrics catalogue, allowing developers and music fans everywhere to tap into the power of online lyrics quickly and easily, and letting anyone plug in and distribute authorized lyrics. It is fully compatible with many Android music apps such as Winamp, Google Music, WiMP, iTunes, Archos Music Player and Spotify: we can listen to music through these applications and still follow the lyrics from Musixmatch.
2.4.4 Discussion of related works
CHAPTER 3
3 PROJECT METHODOLOGY
This chapter explains the project methodology, which provides an easy-to-understand and clearer view of how the project was developed. The activities of each phase of the chosen methodology are provided as guidance in planning and constructing the overall project.
There are many SDLC models that can be used as a guide to develop software or a system. For this project, the methodology implemented is based on one of the agile methods, Extreme Programming (XP). XP is a software development methodology that is part of what is collectively known as agile methodologies.
XP is built upon values and principles, and its goal is to allow small to mid-sized teams to produce high-quality software and adapt to evolving and changing requirements. It is a lightweight methodology that has gained increasing acceptance and popularity in the software community. XP promotes a discipline of software development based on the principles of simplicity, communication, feedback, and courage (Kircher et al., 2001).
This way of working suits this project, since the development of the system requires repeated testing to reach an accurate result. For example, if the humming test is not accurate against the QbH dataset, the coding and maintenance need to be supervised well with regard to the frequency analysis and the analog-to-digital conversion method. Figure 3.2 illustrates how work is done in XP.
Figure 3.2: Extreme Programming Phases
3.2 Project Framework
This project framework is based on the XP model, and its five phases are planning, designing, coding, testing and listening. Figure 3.3 shows the framework for this project, and Figure 3.4 summarizes the details of the phases of the Stario mobile application based on the XP model.
Figure 3.4: XP Phases for Stario development
3.3 Planning
As the first step of project development, planning is essential to meet the goals. All the information taken from the online survey is made into user stories, which are then turned into iterations. The project is divided into iterations, and iteration planning initiates each iteration. The next step is the gathering of the information used to complete the project, collected from journals, articles and other sources, resulting in the literature review. In the literature review, all information, including the techniques and software used in the development, is studied in order to fully understand the functional specification of each module that will exist in the Stario mobile application. During this step, related work and existing applications are also used as a benchmark to identify the use case diagram for this project.
3.3.1 Hardware requirements
Hardware requirements are the physical components used for system development. In this project, the hardware used is a personal computer. Table 3.1 shows the hardware requirements for this project.
No Hardware Description
1 Lenovo Ideapad 320 OS: Windows 10 Pro (64-bit)
Processor: AMD A12-9720P RADEON R7, 12 Compute Cores 4C+8G, 2.70 GHz
RAM: 8 GB
3.3.2 Software requirements
To complete the project development, software is required to ensure that the hardware can perform and function efficiently during development. Table 3.2 shows the software requirements.
No Software Description
1 Flutter Used to create the framework and interface for the application
2 Java Development Kit The kit used to develop the Java application and applets during development
3 Draw.io An open-source tool used to create the Use Case Diagram, Flowchart and Entity Relationship Diagram
4 NoSQL A database used to store user and song details
Figure 3.5 shows that 68.8% of the public respondents have a problem with melodies stuck in their head, while the other 31.3% do not commonly have this problem. The problem needs to be addressed to bring the percentage down, at least below 50%.
Figure 3.6 shows that 51.6% of the public respondents have a melody stuck in their head for 5-10 seconds. The second highest group, at 25.8%, reports less than 5 seconds, and the last, at 22.6%, more than 10 seconds. From these percentages we can deduce that the scope of the project must allow at least 10 seconds of humming when the user wants to search for a song.
Figure 3.7 shows that 78.1% of the public respondents have difficulty searching for a melody on the Internet, while the remaining 21.9% may be assumed to have at least a glimpse of the song title. This difficulty may stem from not remembering the lyrics of the song.
Figure 3.8 shows that 75% of the public respondents prefer to sing/hum to search for a song, while 25% prefer to search by lyrics. This percentage supports the development of Stario.
3.3.4 Use Case Diagram
Figure 3.9: Use Case Diagram for Stario
The use case is analyzed to view the interaction between the user and the application. Based on Figure 3.9, the user can sign up, or log in if they already have an account. They can also view and edit their profile in the application. The most important use case of the application is searching for a song by humming or singing. Lastly, the user can look at their search history to see previously searched songs.
Actors Description
User - A user who has already registered with the Stario application
- A user can be anyone who has a problem searching for a song
3.4 Designing
The application design phase is the next step after requirements analysis, where more details about the system and how its flow works are illustrated. This phase is important to ensure a better understanding and view of how Stario was developed. In this phase, the Flowchart Diagram, Entity Relationship Diagram (ERD) and graphical interface are illustrated and explained.
3.4.1 Flowchart
A flowchart shows how the system and its processes work, from the beginning of a flow until it is ended by the user. There are two modules for the user: the Register Module and the Search Song Module.
Figure 3.10 shows the user's flow when opening the application. If the user already has an account, he or she does not need to register a new account and can proceed to sign in; registration is needed only for new users. After signing in, the user enters the homepage.
Figure 3.11 shows the user's flow when searching for a song. The user is required to hum or sing into the mobile phone's microphone. The system then converts the audio into a digital signal and matches it against the dataset. The matched song (the output) plays for 10 seconds; if the song is incorrect, the user may hum/sing again.
3.4.2 Entity Relationship Diagram
Next is designing the Entity Relationship Diagram, which shows each entity of the application system and the relationships between them in the database. In an ERD, the relationship between entities can be one-to-one, one-to-many, many-to-one or many-to-many, depending on the condition and relation between the entities. In Figure 3.12, there are three tables: User, Song and Song_History.
The User table has userID as its primary key and a one-to-many relationship to the Song and Song_History tables. Through this table the user can view, edit and delete their profile.
The Song table stores the details of a song: its title, artist, album and genre. The user receives their desired output from this table after humming to search.
The Song_History table acts as a bridge between the Song and User tables, as each user will have many searched songs. Song_History stores the primary keys of the User and Song tables as foreign keys, so that it can hold the recently searched songs.
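The three entities can be sketched as plain Java classes. The field names below are illustrative and mirror the ERD description; they are not the project's actual schema:

```java
// Hypothetical entity sketches mirroring the ERD described above.
class User {
    String userId;      // primary key
    String username;
    String password;
}

class Song {
    String songId;      // primary key
    String title;
    String artist;
    String album;
    String genre;
}

class SongHistory {     // bridge table for recent searches
    String userId;      // foreign key -> User
    String songId;      // foreign key -> Song
    long searchedAt;    // illustrative: when the search happened
}
```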
3.4.3 Interface Design
Interface Description
This is the first interface shown when the user installs Stario on their mobile phone. The user needs to enter a username and password to use Stario, or sign up if no account exists yet.
This is the homepage of Stario. Here the user taps the middle button to hum/sing and search for a song.
The search history stores the previous results of the user's searched songs. The user may clear the history if necessary.
3.4.4 System Architecture
Figure 3.13 shows that the initial input is taken as a recording. The input is then captured and sampled to be converted into a digital signal. Later, fingerprinting is conducted to match the sound with the songs in the database. The ISP acts as an intermediary providing the Internet connection needed to store the song data and user details.
3.5 Coding
For this project, the coding focuses on applying ADC to sample the audio from the user's humming/singing.
Recording devices mimic the hearing process quite closely, using the pressure of the sound wave to convert it into an electrical signal. The real sound wave in the air is a continuous pressure signal. In a microphone, the first electrical component that encounters this signal converts it into an analog voltage signal, which is still continuous. A continuous signal is not very useful in the digital world, so before it can be processed it must be translated into a discrete signal that can be stored in digital form. This is done by acquiring numerical values representing the amplitude of the signal.
The conversion involves quantizing the input, which is bound to introduce small errors. Therefore, instead of a single conversion, an analog-to-digital converter performs many conversions on very small parts of the signal, a process known as sampling.
The most commonly used sampling rate is 44,100 Hz. This is the sampling rate of Compact Discs, and it is likewise the most commonly used rate for MPEG-1 audio (VCD, SVCD, MP3). (This particular rate was originally chosen by Sony because it could be recorded on modified video equipment running at either 25 frames per second (PAL) or 30 frames per second (using an NTSC monochrome video recorder) and could cover the 20,000 Hz bandwidth thought essential to match the professional analog recording equipment of the time.) So, when choosing the frequency of the samples to be recorded, you will probably want to go with 44,100 Hz.
The user's humming/singing is captured with the coding shown in Figure 3.14. We use the Java programming language and set the frequency of the samples, the number of channels (mono/stereo) and the sample size (e.g., 16-bit samples). Then we open a line from the sound card and write the audio to a byte array.
Figure 3.15: Sample coding for getFormat(), to change the audio into digital
The data is read from a TargetDataLine. In Figure 3.16, the running flag is a global variable that is cleared by another thread, for example when the user clicks the search icon to stop recording.
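The capture steps just described can be sketched with the standard javax.sound.sampled API. This is a minimal reconstruction under the assumptions stated in the text (44.1 kHz, 16-bit, mono), not the project's exact code from Figures 3.14-3.16:

```java
import javax.sound.sampled.*;
import java.io.ByteArrayOutputStream;

public class Recorder {
    // The capture format described in the text; the concrete values
    // (44.1 kHz, 16-bit, mono, signed, big-endian) are assumptions.
    static AudioFormat getFormat() {
        float sampleRate = 44100;
        int sampleSizeInBits = 16;
        int channels = 1;        // mono
        boolean signed = true;
        boolean bigEndian = true;
        return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
    }

    static volatile boolean running = true; // cleared by another thread to stop

    // Open the line from the sound card and write the samples to a byte array.
    static byte[] record() throws LineUnavailableException {
        AudioFormat format = getFormat();
        TargetDataLine line = AudioSystem.getTargetDataLine(format);
        line.open(format);
        line.start();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (running) {
            int count = line.read(buffer, 0, buffer.length);
            if (count > 0) out.write(buffer, 0, count);
        }
        line.stop();
        line.close();
        return out.toByteArray();
    }
}
```

Note that record() requires an actual microphone line; the format setup alone can be verified without hardware.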
Figure 3.16: Sample coding for song match-making
What we have in this byte array is a signal recorded in the time domain. The time-domain signal represents the change of the signal's amplitude over time.
In the early 1800s, Jean-Baptiste Joseph Fourier made the extraordinary discovery that any signal in the time domain is equivalent to the sum of some (potentially infinite) number of simple sinusoidal signals, each component sinusoid having a certain frequency, amplitude and phase. The collection of sinusoids that together form the original time-domain signal is known as its Fourier series.
In other words, any time-domain signal may be represented by simply listing the frequencies, amplitudes and phases of each of the sinusoids that make it up. This representation of the signal is called the frequency domain. The frequency domain provides a static representation of a dynamic signal and functions, in certain respects, as a kind of fingerprint or signature for the time-domain signal.
Figure 3.17: The illustration of time-domain and frequency-domain
Many things are much simpler when a signal is analyzed in the frequency domain. In digital signal processing this is more practical, because the engineer can examine the spectrum, the representation of the signal in the frequency domain, and ascertain which frequencies are present and which are absent. One can then filter, alter some of the frequencies, or determine the precise tone from the given frequencies.
To move our signal from the time domain to the frequency domain, we need a technique to do so, and this is where the Discrete Fourier Transform (DFT) comes in. The DFT is a mathematical methodology for performing Fourier analysis on a discrete (sampled) signal. Assuming that each sinusoid has been sampled at the same rate, it transforms a finite list of evenly spaced function samples into a list of coefficients for a finite combination of complex sinusoids, ordered by their frequencies.
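For reference, the transform just described can be written in the same style as the earlier equations:
X[k] = Σ (from n = 0 to N−1) x[n] · e^(−i·2πkn/N), for k = 0, 1, …, N−1
Where,
x[n] = the n-th time-domain sample
X[k] = the complex coefficient (amplitude and phase) of the k-th frequency component
N = the number of samples in the chunk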
The Fast Fourier Transform (FFT) is one of the most widely used numerical methods for DFT calculation. The Cooley-Tukey method is by far the most commonly used FFT variant. This algorithm recursively splits a DFT into numerous smaller DFTs in a divide-and-conquer strategy. Using a Cooley-Tukey FFT, the identical result can be calculated in O(n log n) operations, as opposed to the O(n²) operations required to evaluate the DFT directly. Figure 3.18 shows the FFT function.
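In the spirit of the FFT function referenced above, here is a textbook recursive Cooley-Tukey sketch (a generic implementation, not the project's exact code from Figure 3.18). It returns the real and imaginary parts of the spectrum as a pair of arrays; the input length must be a power of two:

```java
public class FFT {
    // Recursive radix-2 Cooley-Tukey FFT.
    // re/im: real and imaginary parts of the input; length must be a power of 2.
    // Returns {outputRe, outputIm}.
    public static double[][] fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 1) return new double[][]{{re[0]}, {im[0]}};

        // Split into even-indexed and odd-indexed samples.
        double[] evenRe = new double[n / 2], evenIm = new double[n / 2];
        double[] oddRe  = new double[n / 2], oddIm  = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            evenRe[k] = re[2 * k];     evenIm[k] = im[2 * k];
            oddRe[k]  = re[2 * k + 1]; oddIm[k]  = im[2 * k + 1];
        }
        double[][] even = fft(evenRe, evenIm);
        double[][] odd  = fft(oddRe, oddIm);

        // Combine with the twiddle factors e^(-i*2*pi*k/n).
        double[] outRe = new double[n], outIm = new double[n];
        for (int k = 0; k < n / 2; k++) {
            double ang = -2 * Math.PI * k / n;
            double wr = Math.cos(ang), wi = Math.sin(ang);
            double tr = wr * odd[0][k] - wi * odd[1][k];
            double ti = wr * odd[1][k] + wi * odd[0][k];
            outRe[k]         = even[0][k] + tr;  outIm[k]         = even[1][k] + ti;
            outRe[k + n / 2] = even[0][k] - tr;  outIm[k + n / 2] = even[1][k] - ti;
        }
        return new double[][]{outRe, outIm};
    }
}
```

For example, transforming the constant signal {1, 1, 1, 1} concentrates all energy in the DC bin: X[0] = 4 and all other coefficients are 0.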
Figure 3.19: The example of before and after FFT analysis
To transform just a portion of the data at a time, we use a sliding window, or data chunk. There are several methods for determining the size of each chunk. For instance, one second of sound recorded in stereo with 16-bit samples at 44,100 Hz amounts to 44,100 samples × 2 bytes × 2 channels ≈ 176 kB. If we choose a chunk size of 4 kB, 44 chunks of data are available for analysis in every second of the music. That density is adequate for the thorough analysis required for audio identification.
Figure 3.20: Sample coding of fingerprinting
In the inner loop, we put the time-domain data (the samples) into complex numbers with an imaginary part of 0. In the outer loop, we iterate through all the chunks and perform FFT analysis on each.
Once we know the frequency composition of the signal, we may begin creating the digital fingerprint of the song. This is the most crucial step in the Stario audio recognition system. The key difficulty is figuring out which frequencies, out of the sea of frequencies collected, are the most significant. Intuitively, we look for the frequencies with the largest magnitude (commonly called peaks).
However, in a single song the range of strong frequencies could run from low C, C1 (32.70 Hz), to high C, C8 (4,186.01 Hz). This is a huge range to cover. Therefore, rather than examining the entire frequency range at once, we can select a number of smaller intervals and examine each one separately. These intervals can be selected based on the common frequencies of significant musical elements. For instance, we could use the intervals from a well-known write-up of the Shazam algorithm: for the low tones (covering, for instance, bass guitar), 30 Hz to 40 Hz, 40 Hz to 80 Hz and 80 Hz to 120 Hz; for the middle and high tones (covering vocals and most other instruments), 120 Hz to 180 Hz and 180 Hz to 300 Hz.
Within each interval, we can then simply identify the frequency with the highest magnitude. This information forms a signature for the chunk of the song, and this signature becomes part of the fingerprint of the song as a whole.
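The per-interval peak picking can be sketched as follows. The band limits come from the intervals listed above; the fuzz factor and hash packing are assumptions for illustration, not the project's actual values:

```java
public class Fingerprint {
    static final int[] BANDS = {40, 80, 120, 180, 300}; // upper limit of each interval (FFT bin)
    static final int FUZZ = 2; // assumed fuzz factor: drop the least-significant bits of each peak

    // Index of the interval that contains this bin.
    static int bandIndex(int bin) {
        int i = 0;
        while (i < BANDS.length - 1 && BANDS[i] < bin) i++;
        return i;
    }

    // For one chunk's magnitude spectrum, find the strongest bin in each
    // band and pack the (fuzzed) peaks into a single hash value.
    static long hash(double[] magnitude) {
        int[] peak = new int[BANDS.length];
        double[] best = new double[BANDS.length];
        for (int bin = 30; bin < 300; bin++) {   // ignore bins below 30 Hz
            int b = bandIndex(bin);
            if (magnitude[bin] > best[b]) { best[b] = magnitude[bin]; peak[b] = bin; }
        }
        long h = 0;
        for (int b = 0; b < BANDS.length; b++) h = h * 1000 + (peak[b] - peak[b] % FUZZ);
        return h;
    }
}
```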
Figure 3.21: Sample coding of fingerprinting
Assuming that the recording was not done in ideal circumstances (i.e., in a "deaf room"), we must add a fuzz factor to the equation. Fuzz factor analysis needs to be taken seriously, and in a practical system the program should offer the opportunity to adjust this parameter according to the recording conditions.
This signature serves as the key into a hash table, so that audio searches can be done with ease. The matching value stores the song ID (song title and artist) along with the time at which this particular set of frequencies first appeared in the song. An illustration of how these records would look in the database is shown here.
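The record shape just described might be sketched as a map from each fingerprint hash to the songs (and time offsets) in which it occurred; the class and field names are illustrative:

```java
import java.util.*;

// One occurrence of a fingerprint hash inside a song.
class DataPoint {
    final int songId;
    final int time;     // chunk index in the song where the hash occurred
    DataPoint(int songId, int time) { this.songId = songId; this.time = time; }
}

public class FingerprintDb {
    final Map<Long, List<DataPoint>> index = new HashMap<>();

    void add(long hash, int songId, int time) {
        index.computeIfAbsent(hash, h -> new ArrayList<>()).add(new DataPoint(songId, time));
    }

    List<DataPoint> lookup(long hash) {
        return index.getOrDefault(hash, Collections.emptyList());
    }
}
```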
Figure 3.23: Example of records found in database
To identify the melody stuck in our head, we hum/sing into the phone and run the recording through the same audio fingerprinting process as above. Then we can start searching the database for matching hashes.
As it happens, many of the hashes will correspond to the identifiers of multiple songs; for example, some piece of song A may sound exactly like some piece of another song. Each time we match a hash, the number of possible matches gets smaller, but it is likely that this information alone will not narrow the match down to a single song. So there is one more thing our music recognition algorithm needs to check: the timing.
We cannot simply compare the timestamp of a matched hash with the timestamp of our sample, since the sample we recorded by humming or singing could come from any point in the song. To strengthen our conviction, we can instead examine the relative timing of the matches when there are several matched hashes.
As can be seen in Figure 3.22 above, the hash 30 51 99 121 195 is associated with both Song A and Song E. If, one second later, we match the hash 34 57 95 111 200, that is another match for Song A, and in this case we know that not only the hashes but also the time differences match.
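The relative-timing check can be sketched as an offset histogram: for each matched hash we compute the difference between its time in the song and its time in the query; a true match piles up votes on one offset, while coincidental matches scatter. This is an illustrative reconstruction, not the project's code from Figure 3.24:

```java
import java.util.*;

public class Matcher {
    // matches: each entry is {songId, songTime, queryTime} for one matched hash.
    // Returns the songId with the most matches at a single consistent offset.
    static int bestSong(int[][] matches) {
        Map<Long, Integer> votes = new HashMap<>();
        for (int[] m : matches) {
            int offset = m[1] - m[2];                     // songTime - queryTime
            long key = ((long) m[0] << 32) | (offset + 0x10000); // (songId, biased offset)
            votes.merge(key, 1, Integer::sum);
        }
        long best = -1; int bestCount = -1;
        for (Map.Entry<Long, Integer> e : votes.entrySet())
            if (e.getValue() > bestCount) { bestCount = e.getValue(); best = e.getKey(); }
        return (int) (best >> 32);                        // songId of the winning offset
    }
}
```

For instance, three hashes matching song 1 at a constant offset outvote two hashes matching song 2 at inconsistent offsets.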
Figure 3.24: Example of coding for music identification
3.6 Testing
The testing phase is the most important part of the development, because this is where the system or application is tested to see whether it functions. The test conducted is an accuracy test, and its aim is to determine whether the project can move to the final phase.
3.7 Listening
Listening is all about constant communication and feedback from the customer (the supervisor). The customer and the project developer work together to describe the business logic and value that is expected. This communication determines the outcome of the project.
3.8 Summary
In design phase, the graphic user interface for the application was designed
based on the method use which is sound recording. The design of the application use
Flatter software to complete the interface. After the design is done, the project
development starts and this phase is where all the method and technique were
implemented. The coding example will be implemented into the project with Java
programming language. Many discover algorithm will be use such as Discrete Fourier
Transform and Fast Fourier Transform. These algorithms are responsible to change the
sound recording into the digital signals for song match-making. It will be compared
with the existing database which is stored in NoSQL. While we code the program,
several unit testing are conducted to make sure the less errors from the algorithm. Then
the customer, which is the supervisor will do acceptance test to give the satisfaction of
the application functionality. The phase are consider complete when there is no
particular error, and can be release into the server. This clearly showed how the phase
in the framework is related and how important a framework is in developing a system.
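As a small illustration of the Fourier-transform step summarized above, the naive Discrete Fourier Transform below converts a block of audio samples into per-frequency magnitudes; a real implementation would use an FFT routine for speed. The class and method names here are illustrative only, not the project's actual code.

```java
public class Dft {
    // Naive O(n^2) Discrete Fourier Transform.
    // Returns the magnitude of each frequency bin k for a real-valued signal,
    // i.e. how strongly frequency k is present in the recording block.
    static double[] magnitudes(double[] signal) {
        int n = signal.length;
        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re += signal[t] * Math.cos(angle);
                im += signal[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }
}
```

Peaks in these magnitudes are what a fingerprinting scheme combines into the hashes used for song matching.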
References
R Jones, N. (2015, January 14). Nielson Study Reveals Rock Prevails As Most Popular
Genre In The US. Hypebot. Retrieved May 1, 2022, from
https://www.hypebot.com/hypebot/2015/01/nielson-study-reveals-rock-prevails-as-
most-popular-genre-in-the-us.html
Brown, H. (2015, November 1). How Do You Solve a Problem Like an Earworm?
Scientific American. Retrieved May 1, 2022, from
https://www.scientificamerican.com/article/how-do-you-solve-a-problem-like-an-
earworm/
A. (2018, May 3). What is Query by Humming? - ACRCloud. Medium. Retrieved May
1, 2022, from https://medium.com/acrcloud/what-is-query-by-humming-
9fb8e74e738a
Zimmermann, K. A. (2017, April 24). What Is Short-Term Memory Loss?
Livescience.Com. Retrieved May 1, 2022, from https://www.livescience.com/42891-
short-term-memory-loss.html
Welch, A. (2016, November 3). Psychologists identify why certain songs get stuck in
your head. CBS News. Retrieved May 1, 2022, from
https://www.cbsnews.com/news/psychologists-identify-why-certain-songs-get-stuck-
in-your-head/
Barata, M. L., & Coelho, P. S. (2021). Music streaming services: understanding the
drivers of customer purchase and intention to recommend. Heliyon, 7(8).
https://doi.org/10.1016/j.heliyon.2021.e07783
Chandolikar, N., Joshi, C., Roy, P., Gawas, A., & Vishwakarma, M. (2022). Voice
Recognition: A Comprehensive Survey. 2022 International Mobile and
Embedded Technology Conference, MECON 2022, 45–51.
https://doi.org/10.1109/MECON53876.2022.9751903
Experiments With the Shazam Music Identification Algorithm. (2019).
Hansen, G. C., Falkenbach, K. H., & Yaghmai, I. (1988). Voice recognition system.
Radiology, 169(2), 580. https://doi.org/10.1148/radiology.169.2.3175016
Holzer, A., & Jan, O. (2009). Trends in mobile application development. Lecture
Notes of the Institute for Computer Sciences, Social-Informatics and
Telecommunications Engineering, 12 LNICST, 55–64.
https://doi.org/10.1007/978-3-642-03569-2_6
Hussain, A., Mkpojiogu, E. O. C., Almazini, H., & Almazini, H. (2017). Assessing the
usability of Shazam mobile app. AIP Conference Proceedings, 1891(October), 1–
6. https://doi.org/10.1063/1.5005390
Jaczyńska, M., Bobiński, P., & Pietrzak, A. (2018). Music Recognition Algorithms
Using Queries by Example. Proceedings of 2018 Joint Conference - Acoustics,
Acoustics 2018, 108–111. https://doi.org/10.1109/ACOUSTICS.2018.8502429
Komulainen, S., Karukka, M., & Häkkilä, J. (2010). Social music services in teenage
life - A case study. ACM International Conference Proceeding Series, April,
364–367. https://doi.org/10.1145/1952222.1952303
Kusumawati, R. D., Oswari, T., Yusnitasari, T., Dutt, H., & Shukla, V. K. (2019). A
Comparison of Service Quality on Customer Satisfaction towards Music Product
Website in Indonesia and India. 2018 International Conference on Sustainable
Energy, Electronics and CoMputing System, SEEMS 2018, 5–8.
https://doi.org/10.1109/SEEMS.2018.8687377
Lam, H. L., Li, W. T. V., Laher, I., & Wong, R. Y. (2020). Effects of music therapy
on patients with dementia-A systematic review. Geriatrics (Switzerland), 5(3), 1–
14. https://doi.org/10.3390/GERIATRICS5040062
Mazumder, T. A., Student, M. S., Light, F., Networking, S., & Players, V. (2018).
Mobile Application and Its Global Impact 1. 06, 72–78.
Meier, L. M., & Manzerolle, V. R. (2019). Rising tides? Data capture, platform
accumulation, and new monopolies in the digital music economy. New Media
and Society, 21(3), 543–561. https://doi.org/10.1177/1461444818800998
Mohammadi, F., & Jahid, J. (2016). Comparing Native and Hybrid Applications with
focus on Features. 49.
Nagavi, T. C., & Bhajantri, N. U. (2018). A new approach to query by humming
based on modulated frequency features. Proceedings of the 2017 International
Conference on Wireless Communications, Signal Processing and Networking,
WiSPNET 2017, 2018-Janua, 1675–1679.
https://doi.org/10.1109/WiSPNET.2017.8300046
Rahman Ullah, P., & Khan, A. U. (2017). Role of FM Radio in Education (A Case
Study of FM Radio in Peshawar). J. Soc. Sci, 3(3), 9–16.
Vatolkina, N., Gorbashko, E., Kamynina, N., & Fedotkina, O. (2020). E-service
quality from attributes to outcomes: The similarity and difference between digital
and hybrid services. Journal of Open Innovation: Technology, Market, and
Complexity, 6(4), 1–21. https://doi.org/10.3390/joitmc6040143
Wang, Z., Cheng, B., & Chen, J. (2020). Enabling Ordinary Users Mobile
Development with Web Components. IEEE Access, 8, 1767–1776.
https://doi.org/10.1109/ACCESS.2019.2962393