MUSIC GENRE CLASSIFICATION USING MACHINE
LEARNING
Bachelor of Technology
in
Computer Science & Engineering
By
November, 2023
CERTIFICATE
It is certified that the work contained in the project report titled “MUSIC GENRE CLASSIFICATION USING MACHINE LEARNING” by G. Yoganandha Reddy (20UECS0314), K. Veera Babu (20UECS0448), and B. Jayakrishna (20UECS0098) has been carried out under my supervision and that
this work has not been submitted elsewhere for a degree.
Signature of Supervisor
Dr. V. Kalpana, M.E., Ph.D.,
Associate Professor
Computer Science & Engineering
School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D
Institute of Science & Technology
November, 2023
DECLARATION
We declare that this written submission represents our ideas in our own words and where others' ideas
or words have been included, we have adequately cited and referenced the original sources. We
also declare that we have adhered to all principles of academic honesty and integrity and have not
misrepresented or fabricated or falsified any idea/data/fact/source in our submission. We understand
that any violation of the above will be cause for disciplinary action by the Institute and can also
evoke penal action from the sources which have thus not been properly cited or from whom proper
permission has not been taken when needed.
G.YOGANANDHA REDDY
Date: / /
K.VEERA BABU
Date: / /
B.JAYAKRISHNA
Date: / /
APPROVAL SHEET
This project report entitled “MUSIC GENRE CLASSIFICATION USING MACHINE LEARNING” by
G. Yoganandha Reddy (20UECS0314), K. Veera Babu (20UECS0448), B. Jayakrishna (20UECS0098)
is approved for the degree of B.Tech in Computer Science & Engineering.
Examiners Supervisor
Date: / /
Place:
ACKNOWLEDGEMENT
We express our deepest gratitude to our respected Founder Chancellor and President, Col. Prof. Dr. R. RANGARAJAN, B.E. (EEE), B.E. (MECH), M.S. (AUTO), D.Sc., and to our Foundress President, Dr. R. SAGUNTHALA RANGARAJAN, M.B.B.S., Chairperson, Managing Trustee and Vice President.
We are very much grateful to our beloved Vice Chancellor Prof. S. SALIVAHANAN, for providing us with an environment to complete our project successfully.
We record our indebtedness to our Professor & Dean, Department of Computer Science &
Engineering, School of Computing, Dr. V. SRINIVASA RAO, M.Tech., Ph.D., for immense care
and encouragement towards us throughout the course of this project.
We are thankful to our Head of the Department, Department of Computer Science & Engineering, Dr. M.S. MURALI DHAR, M.E., Ph.D., for providing immense support in all our
endeavors.
We also take this opportunity to express a deep sense of gratitude to our Internal Supervisor Dr. V. KALPANA, M.E., Ph.D., for her cordial support and valuable guidance, which helped us in completing this project through its various stages.
A special thanks to our Project Coordinators Mr. V. ASHOK KUMAR, M.Tech., Ms. C.
SHYAMALA KUMARI, M.E., for their valuable guidance and support throughout the course of the
project.
We thank our department faculty, supporting staff and friends for their help and guidance to com-
plete this project.
ABSTRACT
Music plays a very important role in people's lives. The volume of music available on the Internet is growing rapidly, and these audio files must be properly indexed if we want convenient access to them. The search engines available in the market also find it challenging to classify and retrieve the audio files relevant to a user's interest. In the proposed system, a machine learning approach is used to classify tracks into different genres (Classical, Hip Hop, Country, Rock, Metal, Blues, Pop, Jazz, and Disco). The application uses a K-Nearest Neighbours (KNN) model to perform the classification. Mel-frequency features are extracted from each track of the GTZAN dataset, and a piece of software is implemented that classifies a database of 1000 audio files into their respective genres. An extension of this work would be to consider larger datasets and tracks in other formats (MP3, AU, etc.). The performance of the system is evaluated on the Kaggle-hosted dataset; the proposed KNN-based approach gives an accuracy of 80.00%, which compares favourably with other models.
LIST OF FIGURES
LIST OF ACRONYMS AND
ABBREVIATIONS
TABLE OF CONTENTS
Page No.
ABSTRACT v
LIST OF FIGURES vi
1 INTRODUCTION 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Aim of the Project . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Project Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Scope of the Project . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 LITERATURE REVIEW 4
3 PROJECT DESCRIPTION 7
3.1 Existing System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Proposed System . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.3 Feasibility Study . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.3.1 Economic Feasibility . . . . . . . . . . . . . . . . . . . . . 7
3.3.2 Technical Feasibility . . . . . . . . . . . . . . . . . . . . . 8
3.3.3 Social Feasibility . . . . . . . . . . . . . . . . . . . . . . . 8
3.4 System Specification . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.4.1 Hardware Specification . . . . . . . . . . . . . . . . . . . . 9
3.4.2 Software Specification . . . . . . . . . . . . . . . . . . . . 9
3.4.3 Standards and Policies . . . . . . . . . . . . . . . . . . . . 9
4 METHODOLOGY 10
4.1 General Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2 Design Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.2.1 Data Flow Diagram . . . . . . . . . . . . . . . . . . . . . . 12
4.2.2 Use Case Diagram . . . . . . . . . . . . . . . . . . . . . . 13
4.2.3 Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . 14
4.2.4 Activity Diagram . . . . . . . . . . . . . . . . . . . . . . . 15
4.3 Algorithm & Pseudo Code . . . . . . . . . . . . . . . . . . . . . . 16
4.3.1 Algorithm: K-Nearest Neighbours . . . . . . . . . . . . . . 16
4.3.2 Pseudo Code . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.4 Module Description . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.4.1 Module1:Import required libraries . . . . . . . . . . . . . . 17
4.4.2 Module2:Processing of data . . . . . . . . . . . . . . . . . 18
4.4.3 Module3:Apply Machine Learning Algorithms(KNN) . . . 19
4.5 Steps to execute/run/implement the project . . . . . . . . . . . . . . 19
4.5.1 Step1:Requirements . . . . . . . . . . . . . . . . . . . . . 19
4.5.2 Step2:Collection of Data set . . . . . . . . . . . . . . . . . 19
4.5.3 Step3:Modules . . . . . . . . . . . . . . . . . . . . . . . . 19
4.5.4 Step4:Output . . . . . . . . . . . . . . . . . . . . . . . . . 20
7 PLAGIARISM REPORT 30
References 35
Chapter 1
INTRODUCTION
1.1 Introduction
In today's era, the software industry is gradually moving toward machine intelligence, and machine learning has become essential in every sector for building intelligent systems. Put simply, machine learning is a set of algorithms that parse data, learn from it, and apply what they have learned to make intelligent decisions.
With the growth of digital music, the demand for music recommendation systems has increased significantly. Music genre
classification is a fundamental task in music information retrieval, which is essential
for building effective music recommendation systems and music search engines. In
recent years, machine learning-based approaches have become popular for music
genre classification tasks due to their high accuracy and scalability. Music is an in-
tegral part of our lives, and with the advent of digital music streaming services, the
availability of music has increased manifold. With such a vast collection of music
available, it becomes crucial to categorize it into different genres to provide better
recommendations and improve user experience. Music genre classification is also
useful in various other applications such as music information retrieval, music rec-
ommendation systems, and music search engines.
The main aim is to create a machine learning model that classifies music samples into different genres. It predicts the genre (Classical, Hip Hop, Country, Rock, Metal, Blues, Pop, Jazz, and Disco) from an audio signal given as input. The KNN algorithm is used, and the objective of automating music classification is to make the selection of songs quick and less cumbersome.
Machine learning is the scientific study of algorithms and statistical models that
computer systems use to perform a specific task without using explicit instructions,
relying on patterns and inference instead. It is seen as a subset of artificial intelli-
gence. Machine learning algorithms build a mathematical model based on sample
data, known as ”training data”, in order to make predictions or decisions without
being explicitly programmed to perform the task. Machine learning algorithms are
used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop a conventional algorithm for effectively performing the task.
Data mining is a field of study within machine learning that focuses on exploratory data analysis through unsupervised learning. When applied across business problems, machine learning is also referred to as predictive analytics.
The scope of the project is to predict the genre of a particular piece of music supplied as audio. The classifier also allows the accuracy of different machine learning models to be compared.
Chapter 2
LITERATURE REVIEW
Seokjin Kim et al. [4] proposed an ensemble-based approach for music genre classification using deep neural networks. They train multiple neural networks with different architectures and combine their predictions by voting or averaging, achieving state-of-the-art results on several benchmark datasets and demonstrating the effectiveness of their approach. Music has also been divided into genres and sub-genres not only on the basis of the music itself but also of the lyrics, which makes music genre classification difficult.
Xi Xiong et al. [5] used a novel deep learning architecture for music genre classification, which incorporates temporal attention mechanisms to capture the temporal dynamics of music. They evaluate their approach on several benchmark datasets and demonstrate its superiority over several alternatives, including traditional machine learning algorithms and other deep learning architectures.
A musical genre is distinguished from musical form and musical style, and music can be divided into different genres in many different ways. The popular music genres are Pop, Hip-Hop, Rock, Jazz, Blues, Country and Metal. They train a generator network to generate music samples that are difficult to classify by a discriminator network, and use these samples to augment the training data. They achieve state-of-the-art results on several benchmark datasets.
Chapter 3
PROJECT DESCRIPTION
In the existing system, the K-means clustering algorithm computes centroids and repeats until the optimal centroids are found. The number of clusters is assumed to be known in advance, and the method is also known as the flat clustering algorithm. Data points are assigned to clusters in such a way that the sum of the squared distances between the data points and their centroid is as small as possible. It is essential to note that reduced diversity within a cluster means the data points within it are more alike. K-means implements the Expectation-Maximization strategy: the Expectation step assigns data points to the nearest cluster, and the Maximization step computes the centroid of each cluster.
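The Expectation/Maximization loop just described can be sketched in a few lines. The snippet below is a minimal illustration on synthetic two-dimensional points, not the project's code; all names and data are invented:

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means: the Expectation step assigns each point to its
    nearest centroid; the Maximization step recomputes each centroid
    as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # E-step: distance from every point to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-step: move each centroid to the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs: each should end up in its own cluster.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels, cents = kmeans(pts, k=2)
```

With well-separated data the loop converges in a few iterations regardless of which points are chosen as the initial centroids.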
The advent of huge music collections has presented the challenge of how to retrieve, browse, and recommend the items they contain. One approach to facilitating access to huge music collections is to maintain label annotations for all music resources. Labels can be added either manually or automatically; however, because of the high human effort needed for manual labelling, automatic labelling is more cost-effective.

To address this, a k-NN classifier is used to classify the genres on the GTZAN dataset. From the results, k-NN was found to give more accurate outcomes.
A feasibility study is carried out to determine the viability, cost, and benefits associated with a project before financial resources are allocated. This project is completely user friendly, and the resources needed for it are free. Everyone can use our source code as a resource for their own project's implementation.
The economic feasibility step of project development is the period during which a break-even financial model of the project venture is developed, based on all costs associated with taking the product from idea to market and achieving sales sufficient to satisfy requirements.
Social feasibility is a detailed study on how one interacts with others within a
system or an organization. Social impact analysis is an exercise aimed at identifying
and analyzing such impacts in order to understand the scale and reach of the project’s
social impacts.
Our project has broad scope in the music sector: genre classification is a valuable system to implement, and people who love music of a particular genre will take great interest in it, so it will be useful to society.
3.4 System Specification
• Windows 8/9/10
• Visual Studio Code with Python
• Jupyter Notebook (latest version)
• Required Python packages: numpy, pandas, tempfile, os, pickle, random, operator, math
Chapter 4
METHODOLOGY
Figure 4.1 shows the overall architecture, which defines the process of music genre classification as follows:
Data collection: The first step is to collect a dataset of audio recordings that have
been labeled with their corresponding genres. This dataset can be collected from
various sources, such as music streaming platforms, online music repositories, or
manually curated collections.
Data preprocessing: Once the dataset is collected, the audio recordings must be
preprocessed to extract relevant features, such as pitch, tempo, and rhythm. This
involves converting the audio signal into a numerical representation that can be used
by the machine learning algorithm. Preprocessing may also include data cleaning
and normalization.
Feature extraction: The preprocessed audio data is then analyzed to extract fea-
tures that are relevant to the task of music genre classification. This involves selecting
a set of features that capture the key characteristics of the audio signal, such as mel
frequency cepstral coefficients (MFCCs) or spectral features.
Model training: Once the features have been extracted, a machine learning model
is trained to classify the audio recordings into their corresponding genres. This typ-
ically involves selecting a machine learning algorithm, such as a neural network,
decision tree, or support vector machine, and training it on the labeled dataset. The
training process involves optimizing the model parameters to minimize the classifi-
cation error.
Model evaluation and selection: The trained models are evaluated using performance metrics such as accuracy, precision, recall, and F1-score. The best model is selected based on its performance on a held-out validation set.

Genre classification: Once the model is trained and evaluated, it can be used to classify new music files into different genres based on their extracted features.
Testing: Finally, the selected model can be used to classify new audio files into different music genres.
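The stages above, from data collection through evaluation, can be illustrated end to end with a deliberately tiny sketch: noisy sine waves stand in for audio recordings, two toy features stand in for MFCCs, and a 1-nearest-neighbour rule stands in for the trained model. Everything here is synthetic and for illustration only:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_clip(freq, n=1000):
    """Stand-in for an audio recording: a noisy sine wave at `freq` Hz
    (sampled at 1 kHz for illustration)."""
    t = np.arange(n) / 1000.0
    return np.sin(2 * np.pi * freq * t) + 0.1 * rng.standard_normal(n)

def extract_features(sig):
    """Toy feature vector: signal energy and zero-crossing rate."""
    zcr = np.mean(np.abs(np.diff(np.sign(sig))) > 0)
    return np.array([np.mean(sig ** 2), zcr])

# "Data collection": 20 clips each of two synthetic genres that differ
# in their base frequency.
clips = [(make_clip(base + rng.uniform(-5, 5)), label)
         for label, base in [(0, 50), (1, 200)] for _ in range(20)]

# "Feature extraction".
X = np.array([extract_features(sig) for sig, _ in clips])
y = np.array([label for _, label in clips])

# "Train/test split", then "training" (1-NN is lazy: it just stores the
# data) and "evaluation" by accuracy.
train_X, test_X, train_y, test_y = X[::2], X[1::2], y[::2], y[1::2]
pred = np.array([train_y[np.argmin(np.linalg.norm(train_X - x, axis=1))]
                 for x in test_X])
accuracy = np.mean(pred == test_y)
```

Because the two synthetic "genres" differ sharply in zero-crossing rate, even this toy pipeline classifies them almost perfectly; the real pipeline swaps in MFCC statistics and the KNN classifier described later.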
4.2 Design Phase
Figure 4.2 shows the data flow diagram, whose components are described below:
Sources of Data: The first component of the DFD is the source of data, which in-
cludes the audio data set, the genre labels associated with the audio files, and any
additional data needed for preprocessing.
Processes: The second component of the DFD is the processes that are involved in
the classification process. This includes the preprocessing of the audio data, the fea-
ture extraction process, the machine learning algorithm used to classify the audio
files, and the evaluation process to assess the performance of the algorithm.
Data Stores: The third component of the DFD is the data stores, which include the
preprocessed data store, the feature extracted data store, and the trained model data
store.
Outputs: The final component of the DFD is the output, which includes the classification labels for the audio files, the evaluation results, and any recommendations or playlists generated based on the classification results.
Figure 4.3 shows the use case diagram for music genre classification using machine learning, described below:

File Explorer: a software application used to browse and manage files and folders on a computer. It provides a graphical user interface that allows users to access files, folders, and other storage devices such as hard drives, flash drives, and network drives.

File Converter: a software application or online tool used to convert files from one format to another. It can convert various types of files, such as documents, images, videos, and audio files. File converters are useful when a file needs to be opened, edited, or played, but the application or software being used does not support the original file format.
Figure 4.4 shows the class diagram for music genre classification using machine learning. A class diagram is a type of diagram in software engineering that illustrates the relationships and structure of classes in an object-oriented program; in this context, it represents the structure of the program and its various classes.

The class diagram is a visual representation of the software system's objects, classes, and their relationships. It provides a clear overview of the different components of the system and how they interact with each other. The first class in the diagram is the Dataset class, which represents the dataset of audio files used for training and testing the machine learning model. This class contains methods to load and preprocess the audio files, as well as methods to split the dataset into training, validation, and testing sets.
Figure 4.6 shows the activity diagram for music genre classification using machine learning. An activity diagram illustrates the flow of activities in a system; in the context of music classification, it represents the various steps involved in the classification process. The “Genre Classification” activity involves predicting the genre of a new piece of music using the trained model.
# Import required libraries:
from python_speech_features import mfcc
import scipy.io.wavfile as wav
import numpy as np
from tempfile import TemporaryFile
import os
import pickle
import random
import operator
import math

# Define a function to get the distance between feature vectors and find neighbours:
def getNeighbors(trainingSet, instance, k):
    ...

# Define a function for model evaluation:
def getAccuracy(testSet, predictions):
    ...

# Extract features from the dataset and dump these features into a binary file my.dat:
directory = "path_to_dataset"
f = open("my.dat", "wb")
i = 0

# Train and test split on the dataset:
dataset = []
def loadDataset(filename, split, trSet, teSet):
    ...

# Test the classifier with a new audio file:
for folder in os.listdir("./musics/wav_genres/"):
    results[i] = folder
    i += 1
(rate, sig) = wav.read("path_to_new_audio_file")
NumPy: NumPy is a Python library for scientific computing that provides support
for large, multi-dimensional arrays and matrices. It is commonly used for numerical
calculations and data analysis.
Pandas: Pandas is a Python library for data manipulation and analysis. It provides
tools for reading and writing data from various file formats, such as CSV and Excel,
and provides functions for data cleaning, transformation, and manipulation.
Wavfiles: Waveform Audio File Format is an audio file format standard, devel-
oped by IBM and Microsoft, for storing an audio bitstream on personal computers.
It is the main format used on Microsoft Windows systems for uncompressed audio.
The usual bitstream encoding is the linear pulse-code modulation format.
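As an aside, this kind of 16-bit linear PCM file can be produced with Python's built-in wave module; the sketch below (self-contained, not part of the project code) writes a one-second 440 Hz tone into an in-memory buffer and reads the header back:

```python
import io
import math
import struct
import wave

rate = 44100
buf = io.BytesIO()

# Write one second of a 440 Hz tone as mono 16-bit linear PCM,
# the usual WAV bitstream encoding described above.
with wave.open(buf, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 2 bytes per sample = 16-bit PCM
    w.setframerate(rate)
    w.writeframes(b"".join(
        struct.pack("<h", int(20000 * math.sin(2 * math.pi * 440 * n / rate)))
        for n in range(rate)))

# Read the header back to confirm what was written.
buf.seek(0)
with wave.open(buf, "rb") as r:
    params = (r.getnchannels(), r.getsampwidth(),
              r.getframerate(), r.getnframes())
```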
4.4.2 Module2:Processing of data
Read in the audio files: The first step is to read in the audio files in a suitable format.
Commonly used formats for music files include WAV and MP3. Python libraries
such as Librosa provide functions for reading in audio files.
Convert to a common format: It is common for music files to have different sam-
ple rates and bit depths. To ensure that the data is consistent, the audio files should
be converted to a common format, such as 16-bit PCM format with a sample rate of
44.1 kHz.
Normalize the data: The volume level of music files can vary widely, which can affect the accuracy of the classification model. Normalizing the data ensures that the volume level is consistent across all the audio files.

Split into segments: Music files can be quite long, which can make processing them computationally intensive. To address this, the audio files can be split into shorter segments, such as 30-second clips.
Extract features: Once the audio files have been preprocessed, features can be
extracted from the audio signal. Commonly used features for music genre classifi-
cation include Mel-Frequency Cepstral Coefficients (MFCCs), spectral features, and
rhythm features.
Scale the data: To ensure that the features are on a similar scale, the data should
be normalized or standardized. This ensures that each feature is equally important in
the classification model.
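The normalization, segmentation, and scaling steps above can be sketched with NumPy alone; the helper names below are hypothetical, a synthetic signal stands in for a decoded audio file, and the real MFCC extraction is replaced by trivial per-segment statistics:

```python
import numpy as np

def normalize(sig):
    """Peak-normalize so volume differences between files don't dominate."""
    peak = np.max(np.abs(sig))
    return sig / peak if peak > 0 else sig

def segment(sig, seg_len):
    """Split a long signal into fixed-length segments, dropping the tail."""
    n = len(sig) // seg_len
    return sig[: n * seg_len].reshape(n, seg_len)

def standardize(features):
    """Zero-mean, unit-variance scaling so each feature weighs equally."""
    return (features - features.mean(axis=0)) / features.std(axis=0)

# A quiet synthetic clip stands in for a decoded audio file.
sig = 0.25 * np.sin(np.linspace(0.0, 100.0, 4410))
nsig = normalize(sig)
segs = segment(nsig, 1024)

# Per-segment stand-in features (real code would compute MFCCs here).
feats = np.column_stack([segs.mean(axis=1), segs.std(axis=1)])
scaled = standardize(feats)
```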
4.4.3 Module3:Apply Machine Learning Algorithms(KNN)
One of them, K-Nearest Neighbours (KNN), is a technique that has reportedly been successful in categorizing music into different genres.
A supervised machine learning algorithm, the K-Nearest Neighbour technique is
used to find solutions for classification and regression problems. Relying on labeled
input data to process unlabeled data in the future, this ML technique is used in music
genre classification.
Step 1: Select the value of K neighbors.
Step 2: Find the K nearest data point for our new data point based on Euclidean
distance.
Step 3: Among these K data points count the data points in each category.
Step 4: Assign the new data point to the category that has the most neighbors of the
new data point.
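The four steps above can be written directly as a small function. This is a generic from-scratch sketch with made-up training points, not the covariance-based classifier listed later in the report:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    # Step 1: the value of K is chosen by the caller.
    # Step 2: Euclidean distance from x to every training point.
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # Step 3: count the labels among the K nearest neighbours.
    votes = Counter(train_y[i] for i in nearest)
    # Step 4: assign the majority label.
    return votes.most_common(1)[0][0]

# Made-up 2-D training points with two genre labels.
train_X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                    [3.0, 3.0], [3.2, 2.9], [2.9, 3.1]])
train_y = ["blues", "blues", "blues", "metal", "metal", "metal"]
label = knn_predict(train_X, train_y, np.array([2.8, 3.0]), k=3)
```

A query near the second cluster yields "metal"; the real pipeline applies the same voting rule to MFCC-based distances instead of raw Euclidean distance.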
4.5.1 Step 1: Requirements
• Data set.
• Sample audio files.
• Editor, and download the required packages in Python.

4.5.2 Step 2: Collection of Data Set
• Download the GTZAN data set from the Internet, extract all the files, and save them in a folder. The URL is https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification
4.5.3 Step 3: Modules
• Download the required packages in Python, such as numpy, wavfile, pickle, and math.
• Using the KNN algorithm, calculate the distances and the accuracy on the files.
• Download sample audio files and save them. Then, using the directory, test the wave files and finally show which genre each audio file belongs to.
4.5.4 Step 4: Output
• Finally, the output for the particular test file is shown, with its accuracy and the genre it belongs to.
Chapter 5
5.1.1 Input Design

Input design for our project involves a module that takes audio files as the test input for genre prediction. The user should supply only audio files; otherwise an error pops up until the test file is a music file. Many music files can be tested and taken as input at a time.
Figure 5.1 shows the dataset. Music genre classification is a popular application of machine learning, where the goal is to automatically assign a genre label to a given piece of music. In order to develop an effective machine learning model, it is crucial to carefully design the input data, which typically consists of audio signals or features extracted from them.
5.1.2 Output Design
Figure 5.2 shows the output design: the predicted genre is displayed for the given test files. Users can employ these classified genres for personal use, and larger applications can use the same approach to organize collections of music files into genre categories with good accuracy.
5.2 Testing
The first step is to test the music genre classification on the Kaggle data using the machine learning technique. Unit testing can help ensure that the data preprocessing
step is producing accurate and consistent results. Once the data is preprocessed, the
next step is to extract features from the data that can be used as inputs to the machine
learning model. Unit testing can help verify that the feature extraction process is
producing accurate and meaningful features.After the features have been extracted,
the next step is to train the machine learning model. Unit testing can help verify that
the model is being trained correctly and is producing accurate results.
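As a sketch of what such a unit test might look like, the snippet below checks the shape, symmetry, and finiteness of a toy feature extractor; the helper is hypothetical and merely mirrors the (mean, covariance) features used elsewhere in the report:

```python
import numpy as np

def extract_feature(sig, n_coeffs=13):
    """Hypothetical stand-in for the feature-extraction step: a mean
    vector and covariance matrix over fixed-length frames, mirroring
    the (mean, covariance) features used with the MFCCs."""
    frames = sig[: len(sig) // n_coeffs * n_coeffs].reshape(-1, n_coeffs)
    return frames.mean(axis=0), np.cov(frames, rowvar=False)

# Unit-test style checks: correct shapes, a symmetric covariance, and
# no NaNs or infinities in the output.
sig = np.random.default_rng(1).standard_normal(13 * 100)
mean_vec, cov = extract_feature(sig)
checks = (
    mean_vec.shape == (13,),
    cov.shape == (13, 13),
    bool(np.allclose(cov, cov.T)),
    bool(np.isfinite(mean_vec).all()),
)
```

The same pattern of assertions on shapes and invariants applies to the real MFCC pipeline before the model is ever trained.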
5.3.2 Integration testing
Integration tests: come after unit tests. The main purpose of integration tests is
to find out any irregularity between the interactions of different components of the
software.
After unit tests, it’s useful to test how components work together. For that, we use
integration testing. Integration testing doesn’t necessarily mean testing the whole
ML project altogether but one logical part of the project as a single unit.
5.3.4 Test Result
Figure 5.3 shows the output when a test audio file is supplied: the predicted genre is ’Hiphop’.
Chapter 6
The proposed system is based on the K-Nearest Neighbour algorithm. K-Nearest Neighbour uses a distance-based method to find the K most similar neighbours of a new data point, and the class in which the majority of those neighbours lie is returned as the output.
The dataset we use is the GTZAN genre collection, a very popular audio collection dataset. It contains approximately 1000 audio files belonging to 10 classes, each in .wav format. The classes are Blues, Hip-hop, Classical, Pop, Disco, Country, Metal, Jazz, Reggae, and Rock. The extracted features are saved in a file. Finally, some audio test files are included, and the genre to which each belongs is produced as the output.
Existing system:
In the existing system, the K-means clustering algorithm computes centroids and repeats until the optimal centroids are found. The number of clusters, denoted by the letter ‘K’, is assumed to be known in advance; the method is also known as the flat clustering algorithm. Data points are assigned to clusters so that the sum of the squared distances between the data points and their centroid is as small as possible, and reduced diversity within a cluster means its data points are more alike. K-means implements the Expectation-Maximization strategy: the Expectation step assigns data points to the nearest cluster, and the Maximization step computes the centroid of each cluster.
Proposed system:
The K-NN algorithm assumes similarity between the new data and the available cases and puts the new case into the category most similar to the available categories. K-NN stores all the available data and classifies a new data point based on similarity, which means that when new data appears it can easily be assigned to a well-suited category. K-NN can be used for regression as well as classification, but it is mostly used for classification problems. It is a non-parametric algorithm, meaning it makes no assumption about the underlying data. It is also called a lazy learner because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at classification time. At the training phase KNN just stores the dataset, and when it gets new data it classifies that data into the category most similar to the new data.
1 from p y t h o n s p e e c h f e a t u r e s i m p o r t mfcc
2 i m p o r t s c i p y . i o . w a v f i l e a s wav
3 i m p o r t numpy a s np
4
5 from t e m p f i l e i m p o r t T e m p o r a r y F i l e
6 import os
7 import pickle
8 i m p o r t random
9 import operator
10
11 i m p o r t math
12 i m p o r t numpy a s np
def getNeighbors(trainingSet, instance, k):
    # Rank training instances by symmetric distance and keep the k closest.
    distances = []
    for x in range(len(trainingSet)):
        dist = distance(trainingSet[x], instance, k) + distance(instance, trainingSet[x], k)
        distances.append((trainingSet[x][2], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors

def nearestClass(neighbors):
    # Majority vote over the neighbors' class labels.
    classVote = {}
    for x in range(len(neighbors)):
        response = neighbors[x]
        if response in classVote:
            classVote[response] += 1
        else:
            classVote[response] = 1
    sorter = sorted(classVote.items(), key=operator.itemgetter(1), reverse=True)
    return sorter[0][0]

def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return 1.0 * correct / len(testSet)

# Extract (mean, covariance, label) MFCC features for each GTZAN clip.
directory = "/machine_learning_musicgenre/Data/genres_original/"
f = open("my.dat", "wb")
i = 0
for folder in os.listdir(directory):
    i += 1
    if i == 11:
        break
    for file in os.listdir(directory + folder):
        (rate, sig) = wav.read(directory + folder + "/" + file)
        mfcc_feat = mfcc(sig, rate, winlen=0.020, appendEnergy=False)
        covariance = np.cov(np.matrix.transpose(mfcc_feat))
        mean_matrix = mfcc_feat.mean(0)
        feature = (mean_matrix, covariance, i)
        pickle.dump(feature, f)
f.close()

dataset = []
def loadDataset(filename, split, trSet, teSet):
    # Load the pickled features and split them into training and test sets.
    with open("my.dat", "rb") as f:
        while True:
            try:
                dataset.append(pickle.load(f))
            except EOFError:
                f.close()
                break
    for x in range(len(dataset)):
        if random.random() < split:
            trSet.append(dataset[x])
        else:
            teSet.append(dataset[x])

trainingSet = []
testSet = []
loadDataset("my.dat", 0.66, trainingSet, testSet)

def distance(instance1, instance2, k):
    # Asymmetric distance between two Gaussian feature models.
    mm1 = instance1[0]
    cm1 = instance1[1]
    mm2 = instance2[0]
    cm2 = instance2[1]
    distance = np.trace(np.dot(np.linalg.inv(cm2), cm1))
    distance += np.dot(np.dot((mm2 - mm1).transpose(), np.linalg.inv(cm2)), mm2 - mm1)
    distance += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    distance -= k
    return distance

length = len(testSet)
predictions = []
for x in range(length):
    predictions.append(nearestClass(getNeighbors(trainingSet, testSet[x], 5)))

accuracy1 = getAccuracy(testSet, predictions)
print(accuracy1)

from collections import defaultdict
results = defaultdict(int)

directory = "../input/gtzan-dataset-music-genre-classification/Data/genres_original"

i = 1
for folder in os.listdir(directory):
    results[i] = folder
    i += 1
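For intuition, the Gaussian distance used above can be checked on toy inputs: with identity covariances the trace term equals the dimension, the log-determinant terms vanish, and what remains is the squared Euclidean distance between the means minus k. The vectors below are illustrative stand-ins, not real MFCC statistics.

```python
import numpy as np

def distance(instance1, instance2, k):
    # Asymmetric distance between (mean, covariance) pairs,
    # mirroring the function in the listing above.
    mm1, cm1 = instance1[0], instance1[1]
    mm2, cm2 = instance2[0], instance2[1]
    dist = np.trace(np.dot(np.linalg.inv(cm2), cm1))
    dist += np.dot(np.dot((mm2 - mm1).transpose(), np.linalg.inv(cm2)), mm2 - mm1)
    dist += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    dist -= k
    return dist

# Identity covariances: trace term = 2, Mahalanobis term = 3**2 + 4**2 = 25,
# log-determinant terms are 0, then subtract k = 2.
a = (np.array([0.0, 0.0]), np.eye(2))
b = (np.array([3.0, 4.0]), np.eye(2))
print(distance(a, b, 2))  # 25.0
```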
Output
Figure 6.2 displays the output, which shows the correct genre whenever the test audio is changed; the genre shown in the output is 'jazz', which is correct.
CONCLUSION AND FUTURE ENHANCEMENTS
6.4 Conclusion
The proposed KNN model has shown improved performance in music genre classification, music similarity, and music recommendation compared to previous studies. Owing to its simplicity, KNN is an appropriate model to employ in music streaming applications for music similarity and recommendation. When the performance results are examined, some similar music genres, such as jazz and classical, are prone to misclassification. To improve the current results, we plan to design more comprehensive deep neural network models and to add extra data modalities as input in addition to the spectrogram. Big data processing techniques and tools can also be utilized for feature extraction and model creation in music genre recommendation systems. The proposed KNN-based system achieves an accuracy of approximately 70.00%.
Chapter 7
PLAGIARISM REPORT
Chapter 8
SOURCE CODE & POSTER PRESENTATION
8.1 Source Code
from python_speech_features import mfcc
import scipy.io.wavfile as wav
import numpy as np
from tempfile import TemporaryFile
import os
import pickle
import random
import operator
import math

def getNeighbors(trainingSet, instance, k):
    # Rank training instances by symmetric distance and keep the k closest.
    distances = []
    for x in range(len(trainingSet)):
        dist = distance(trainingSet[x], instance, k) + distance(instance, trainingSet[x], k)
        distances.append((trainingSet[x][2], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors

def nearestClass(neighbors):
    # Majority vote over the neighbors' class labels.
    classVote = {}
    for x in range(len(neighbors)):
        response = neighbors[x]
        if response in classVote:
            classVote[response] += 1
        else:
            classVote[response] = 1
    sorter = sorted(classVote.items(), key=operator.itemgetter(1), reverse=True)
    return sorter[0][0]

def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return 1.0 * correct / len(testSet)

# Extract (mean, covariance, label) MFCC features for each GTZAN clip.
directory = "/machine_learning_musicgenre/Data/genres_original/"
f = open("my.dat", "wb")
i = 0
for folder in os.listdir(directory):
    i += 1
    if i == 11:
        break
    for file in os.listdir(directory + folder):
        (rate, sig) = wav.read(directory + folder + "/" + file)
        mfcc_feat = mfcc(sig, rate, winlen=0.020, appendEnergy=False)
        covariance = np.cov(np.matrix.transpose(mfcc_feat))
        mean_matrix = mfcc_feat.mean(0)
        feature = (mean_matrix, covariance, i)
        pickle.dump(feature, f)
f.close()

dataset = []
def loadDataset(filename, split, trSet, teSet):
    # Load the pickled features and split them into training and test sets.
    with open("my.dat", "rb") as f:
        while True:
            try:
                dataset.append(pickle.load(f))
            except EOFError:
                f.close()
                break
    for x in range(len(dataset)):
        if random.random() < split:
            trSet.append(dataset[x])
        else:
            teSet.append(dataset[x])

trainingSet = []
testSet = []
loadDataset("my.dat", 0.66, trainingSet, testSet)

def distance(instance1, instance2, k):
    # Asymmetric distance between two Gaussian feature models.
    mm1 = instance1[0]
    cm1 = instance1[1]
    mm2 = instance2[0]
    cm2 = instance2[1]
    distance = np.trace(np.dot(np.linalg.inv(cm2), cm1))
    distance += np.dot(np.dot((mm2 - mm1).transpose(), np.linalg.inv(cm2)), mm2 - mm1)
    distance += np.log(np.linalg.det(cm2)) - np.log(np.linalg.det(cm1))
    distance -= k
    return distance

length = len(testSet)
predictions = []
for x in range(length):
    predictions.append(nearestClass(getNeighbors(trainingSet, testSet[x], 5)))

accuracy1 = getAccuracy(testSet, predictions)
print(accuracy1)

from collections import defaultdict
results = defaultdict(int)

directory = "../input/gtzan-dataset-music-genre-classification/Data/genres_original"

i = 1
for folder in os.listdir(directory):
    results[i] = folder
    i += 1

pred = nearestClass(getNeighbors(dataset, feature, 5))
print(results[pred])
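The majority-vote step in nearestClass() can be exercised in isolation; the integer labels below are hypothetical stand-ins for genre indices, not values from the dataset:

```python
import operator

def nearestClass(neighbors):
    # Tally how often each class label occurs among the k neighbors.
    classVote = {}
    for response in neighbors:
        if response in classVote:
            classVote[response] += 1
        else:
            classVote[response] = 1
    # Sort by vote count, highest first; the winner is the prediction.
    sorter = sorted(classVote.items(), key=operator.itemgetter(1), reverse=True)
    return sorter[0][0]

print(nearestClass([3, 1, 3, 2, 3]))  # 3
```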
8.2 Poster Presentation
References
[1] M. Li, X. Liu, and Y. Zhang, "Music Genre Classification Using Convolutional Neural Networks with Multi-Scale Time-Frequency Representations," IEEE Transactions on Multimedia, vol. 23, no. 5, pp. 2116-2127, 2021.
[2] Y. Chen, Y. Xu, and C. Xu, "Music Genre Classification Using Convolutional Neural Networks with Attention Mechanism," IEEE Access, vol. 8, pp. 52749-52757, 2020.
[4] Y. Ren, L. Li, and D. Li, "Music Genre Classification Using Deep Learning with High-Level Features," IEEE Access, vol. 7, pp. 14795-14803, 2022.
[5] S. Lee and S. Lee, "Music Genre Classification Using Recurrent Neural Networks and Attention Mechanism," IEEE Access, vol. 7, pp. 165778-165787, 2020.
[6] X. Cui, Q. Wu, and J. Chen, "Music Genre Classification Using Ensemble Learning Based on Deep Belief Networks," IEEE Access, vol. 8, pp. 25814-25824, 2021.
[8] Y. Yang, Y. Yu, and S. Zhang, "Music Genre Classification Using a Hybrid Convolutional and Recurrent Neural Network," IEEE Access, vol. 9, pp. 100862-100871, 2021.
[9] Y. Wang and L. Zhang, "Music Genre Classification Based on CNN and LSTM Neural Networks," IEEE Access, vol. 8, pp. 135982-135990, 2022.
[10] L. Xia, Z. Xia, and J. Liu, "Music Genre Classification Using Convolutional Neural Networks with Transfer Learning," in Proc. 2020 IEEE International Conference on Information and Automation (ICIA), pp. 1630-1635, 2021.
[12] M. I. Mandel and D. P. W. Ellis, "Song-Level Features and Support Vector Machines for Music Classification," Queen Mary, University of London, 2020.
[15] D. Ellis, A. Berenzweig, and B. Whitman, "The uspop2020 pop music data set," 2021.
[16] B. Logan and A. Salomon, "A music similarity function based on signal analysis," in Proc. ICME 2020, Tokyo, Japan, 2020.