PLAGIARISM DETECTION
A Project report submitted in partial fulfillment of the requirements for
the award of the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE ENGINEERING
Submitted by
Sheema Patro - 317126510169
We express our thanks to Project Coordinator Dr. V. Usha Bala for her continuous
support and encouragement. We thank all the teaching faculty of the Department of CSE,
whose suggestions during reviews helped us in the accomplishment of our project. We
would like to thank S. Sajahan of the Department of CSE, ANITS, for providing great
assistance in the accomplishment of our project.
We would like to thank our parents, friends, and classmates for their encouragement
throughout our project period. Last but not least, we thank everyone who supported us
directly or indirectly in completing this project successfully.
PROJECT STUDENTS:
CERTIFICATE
This is to certify that the project report entitled “Music Recommendation System
with Plagiarism Detection”, submitted by Sheema Patro (317126510169),
P.N.V.S. Siva Dhanush (317126510157), G. Sai Mahesh (317126510141) in partial
fulfillment of the requirements for the award of the degree of Bachelor of
Technology in Computer Science Engineering of Anil Neerukonda Institute of
Technology and Sciences (A), Visakhapatnam, is a record of bonafide work carried
out under my guidance and supervision.
ABSTRACT
LIST OF SYMBOLS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
1. INTRODUCTION 1
1.1. Music Plagiarism 2
1.2. Motivational Work 3
1.3. Problem Statement 3
2. LITERATURE SURVEY 4
2.1. Introduction 4
2.2. Existing Method 5
2.2.1. Collaborative Filtering 5
2.2.2. Memory-Based Collaborative Filtering (Neighbourhood-Based) 5
2.2.3. Model-based CF 6
2.2.4. Hybrid Collaborative Filtering Techniques 6
2.2.5. Content Based Recommender System 6
2.2.6. Context Based Recommender System 7
2.2.7. A Recommender System Based on Genetic Algorithm 8
2.2.8. A Personalized Music Recommendation System Using 9
Convolutional Neural Networks Approach
2.2.9. Affective Music Recommendation System Reflecting the Mood of Input
Image 9
3. METHODOLOGIES AND ARCHITECTURE 10
3.1. Machine Learning 10
3.2. Python 13
3.3. Jupyter Notebook 18
3.4. Google Colab 19
3.5. Collaborative Filtering 19
3.6. System Architecture 20
3.7. Content-Based 21
4. UML DIAGRAMS 23
4.1. Use case Diagram 24
4.2. Sequence Diagram 26
4.3. Activity Diagram 27
5. MODULES DIVISION 29
5.1. Collaborative Filtering 29
5.2. Content Based and Plagiarism Detection 29
5.3. Mood Prediction 29
6. ALGORITHMS 30
6.1. Collaborative Filtering Algorithms 30
6.2. Plagiarism Algorithms 34
6.3. Mood Prediction Algorithms 36
7. INPUT AND OUTPUT 39
8. CODE IMPLEMENTATION 41
8.1. Collaborative Filtering 41
8.1.1. Graphs
8.2. Content and Plagiarism 50
8.3. Mood Prediction 52
9. FUNCTIONAL REQUIREMENTS 57
10. NON-FUNCTIONAL REQUIREMENTS 57
11. DATASET 58
12. CONCLUSION 59
13. FUTURE WORK 59
14. REFERENCES 60
With the explosion of networks in the past decades, the internet has become the
major source for retrieving multimedia information such as videos, books, and music.
People consider music an important aspect of their lives, and listening to music is an
activity they engage in frequently. However, the problem now is to organize and
manage the millions of music titles produced by society. A good music recommender
system should be able to automatically detect preferences and generate playlists
accordingly. The proposed system also detects music plagiarism based on music
similarity: the plagiarism module extracts features from the input music and finds
music in the database that is close to the query and that the query may have
plagiarized. Meanwhile, the development of recommender systems provides a great
opportunity for industry to aggregate the users who are interested in music. We aim
to build a music recommendation system that predicts songs tailored to each user,
using the KNN machine learning algorithm.
1. INTRODUCTION
Everyone's taste in music is unique, which means that no matter what music
you make, someone is bound to enjoy listening to it. While the music industry may
favor certain types of music more than others, it is important to understand that there
isn't a single human culture on earth that has existed without music. Music is of great
benefit to us, regardless of whether we are renowned recording artists, karaoke singers
or merely fans of music. The number of songs available exceeds the listening capacity
of a single individual.
To get an idea, there are 4 million songs on Spotify that have never been
played. In total, there must be billions just there, and Spotify itself is by no means the
limit of music. What about all the CDs and records made over the past century which
have not been digitized? What, indeed, about songs passed down the generations in
small African communities? There are trillions and trillions of songs in the world, so
many that an estimate is impossible, and potentially an infinitely greater number
which have not yet been made: a world of music for us to enjoy.
Music recommender systems are double-edged swords: they are of valuable
use both to the user and to the provider. They keep the user engaged by finding
interesting music in the form of recommendations, lessening the burden on the user
by reducing the set of choices to choose from. They give scope for the exploration
and discovery of music that the user may not know exists. Because it recommends
music, the entertainment never runs out.
The plagiarism detection pipeline consists of three modules: (1) Extraction
Module: the melody of the query music is extracted; (2) Similarity Calculation
Module: the similarity between the sequence of the input polyphonic music and those
of the music in the database is calculated; (3) Detection Module: the similar section of
the music in the database is detected. This can be done using content-based filtering.
The model only learns to recommend items of the same type that the user is already
using or, in our case, listening to. Even though this could be helpful, the value of that
recommendation is significantly less because it lacks the surprise component of
discovering something completely new.
2. LITERATURE SURVEY
2.1. Introduction
An ideal music recommender system should be able to automatically
recommend personalised music to human listeners. So far, many music discovery
websites, such as Last.fm, AllMusic, Pandora, Audiobaba, Mog, Spotify and Apple
Genius, have aggregated millions of users, and their growth is explosive. In this
section, we present the most popular approaches: metadata information retrieval,
collaborative filtering, content-based information retrieval, the emotion-based model,
context-based information retrieval and hybrid models.
the distance between songs is measured. Three typical similarity measurements are
K-means clustering with Earth-Mover's Distance, Expectation-Maximization with
Monte Carlo Sampling, and Average Feature Vectors with Euclidean Distance.
Digitization of music has led to easier access to different forms of music across
the globe. Increasing work pressure denies people the time needed to listen to and
evaluate music for the creation of a personal music library. One solution might be
developing a music search engine or recommendation system based on different
moods, building a mood classification system from lyrics by combining a wide range
of semantic and stylistic features extracted from textual lyrics.
Collaborative filtering is considered the most basic and the easiest method to
find recommendations and make predictions regarding the sales of a product. It does
have some disadvantages, which have led to the development of new methods and
techniques.
2.2.3. Model-based CF
In model-based CF, models (such as data mining or machine learning
algorithms) learn complex patterns from training data, and intelligent predictions are
then made for CF tasks on real-world data based on the learnt models. This provides
an intuitive rationale for recommendations. The main disadvantage of model-based
CF is that it can lose useful information through dimensionality reduction techniques.
2.2.4. Hybrid Collaborative Filtering Techniques
Various problems like cold-start, data sparsity and scalability can be avoided
by using a hybrid approach [40]. There are different ways of combining CF with
other recommender techniques, including the following:
• Hybrid Recommenders Incorporating CF and Content-Based Features
• Hybrid Recommenders Combining CF and Other Recommender Systems
• Hybrid Recommenders Combining CF Algorithms
2.2.6. Context Based Recommender System
Extending the user/item convention to the circumstances of the user to
incorporate the contextual information is what is achieved in context-based
recommender systems [15]. This helps to abandon the cumbersome process of making
the user fill a huge number of personal details.
The user's location data, social data, current time and weather data are taken
into consideration as the contextual data and given as input to the system. An
approximate address of the user is determined and the location is saved. Social data of
a user can be accessed by requesting permission to a social account of the user.
Contextual factors are of two types, dynamic and static, depending on whether or not
they change with time.
1. Dynamic: The contextual factors change over time and are hence unstable. They
may change through explicit user feedback. User feedback is generally used for
refining the user's profile to get better recommendation results. The biggest challenge
is that if a system is considered dynamic, then the system should be able to find out
when to switch to a different underlying context model.
2. Static: The contextual factors don't change over time and are hence stable. For
example, when buying a cell phone, the contextual factors can be the time and the
purpose of purchasing, and only these, which remain fixed while the purchasing
recommendation application runs.
1. Fully observable: Complete structure and values of contextual factors are known
explicitly, at the time when recommendations are made.
2. Partially observable: Some of the information is known explicitly about the
contextual factors.
3. Unobservable: No information about the contextual factors is explicitly available.
2.2.7. A Recommender System Based on Genetic Algorithm
2.2.7.1. Advantages
The experimental results exhibited that the average scores, which are
objectively collected by means of user evaluations, increase gradually as the
generations grow.
2.2.7.2. Limitations
It is really hard for people to come up with a good heuristic which actually
reflects what we want the algorithm to do, and it might not find the optimal solution
to the defined problem in all cases.
2.2.8. A Personalized Music Recommendation System Using
Convolutional Neural Networks Approach
In “A Personalized Music Recommendation System Using Convolutional Neural
Networks Approach”, the authors used a CNN to classify the music and generate a log
file, and used CF to provide recommendations. Their paper suggested that using
traditional classifiers such as SVM or KNN can reduce efficiency when applied to
complex data.
3. METHODOLOGIES AND ARCHITECTURE
3.1. Machine Learning
A machine learning model is the output of the training process and is defined
as the mathematical representation of the real-world process. The machine learning
algorithms find the patterns in the training dataset, which is used to approximate the
target function and is responsible for mapping the inputs to the outputs from the
available dataset. These machine learning methods depend upon the type of task and
are classified as Classification models, Regression models, Clustering, Dimensionality
Reductions, Principal Component Analysis, etc.
The algorithm then finds relationships between the parameters given,
essentially establishing a cause and effect relationship between the variables in the
dataset. At the end of the training, the algorithm has an idea of how the data works and
the relationship between the input and the output.
This solution is then deployed for use with the final dataset, which it learns
from in the same way as the training dataset. This means that supervised machine
learning algorithms will continue to improve even after being deployed, discovering
new patterns and relationships as it trains itself on new data.
In supervised learning, the labels allow the algorithm to find the exact nature
of the relationship between any two data points. However, unsupervised learning does
not have labels to work off of, resulting in the creation of hidden structures.
Relationships between data points are perceived by the algorithm in an abstract
manner, with no input required from human beings.
Based on the psychological concept of conditioning, reinforcement learning
works by putting the algorithm in a work environment with an interpreter and a reward
system. In every iteration of the algorithm, the output result is given to the interpreter,
which decides whether the outcome is favorable or not.
In case of the program finding the correct solution, the interpreter reinforces
the solution by providing a reward to the algorithm. If the outcome is not favorable,
the algorithm is forced to reiterate until it finds a better result. In most cases, the
reward system is directly tied to the effectiveness of the result.
These algorithms use both labeled and unlabeled data, where the amount of
unlabelled data is large compared to the labeled data. As this approach works with
both, sitting between supervised and unsupervised learning algorithms, it is called
semi-supervised machine learning. Systems using these models are seen to have
improved learning accuracy.
3.1.5. Classification
There is a division of classes of the inputs; the system produces a model from
training data wherein it assigns new inputs to one of these classes.
It falls under the umbrella of supervised learning. A real-life example is spam
filtering, where emails are the input and are classified as “spam” or “not spam”.
3.1.6. Regression
Regression algorithm also is a part of supervised learning, but the difference
being that the outputs are continuous variables and not discrete.
We have used unsupervised machine learning; the KNN algorithm can be used for
both classification and regression problems. The KNN algorithm uses 'feature
similarity' to predict the values of any new data points. This means that the new point
is assigned a value based on how closely it resembles the points in the training set.
3.2. Python
What exactly is Python? You may be wondering about that. You may be
referring to this book because you wish to learn programming but are not familiar
with programming languages. Alternatively, you may be familiar with programming
languages such as C, C++, C#, or Java and wish to learn more about the Python
language and how it compares to these "big word" languages.
You can skip to the next chapter if you are not interested in the how and why
of Python. In this chapter, I will try to explain why I think Python is one of the best
programming languages available and why it is such a great place to start.
Python is interactive - you can sit at the Python prompt and communicate
directly with the interpreter to write your programs.
3.2.2. Python Features
Python features include -
Easy to read - Python code is clearly defined and readable.
Broad standard library - the bulk of Python's library is very portable and
cross-platform compatible with UNIX, Windows, and Macintosh.
Portable - Python runs on a variety of hardware platforms and presents the same
interface on all of them.
GUI programming - Python supports GUI applications that can be created and
ported to many windowing systems, such as Windows MFC, Macintosh, and the
X Window system of Unix.
Scalable - Python provides better structure and support for large programs than
shell scripting.
Aside from the characteristics stated above, Python offers a long list of useful features,
some of which are described below:
2. It can be used as a scripting language or compiled into byte-code for large-scale
application development.
3. It allows dynamic type verification and provides very high-level dynamic data types.
3.2.3. Datatypes
1.Numbers
2.String
3.List
4.Tuple
5.Dictionary
3.2.7. Python Tuples
A tuple is a sequence data type that is similar to a list. A tuple consists of a
number of values separated by commas. Unlike lists, however, tuples are enclosed
within parentheses.
Lists are enclosed in brackets ([]) and their elements and size can be changed,
while tuples are enclosed in parentheses (()) and cannot be updated. Tuples can be
thought of as read-only lists.
3.2.8. Python Dictionaries
Dictionary keys can be almost any Python type, but numbers and strings are
most common. Values, on the other hand, can be any arbitrary Python object.
Dictionaries are enclosed by curly braces ({}) and values can be assigned and
accessed using square brackets ([]).
Interactive mode is a command line shell which gives immediate feedback for
each statement while running previously fed statements in active memory. The
program is evaluated both in part and as a whole as new lines are fed into the
interpreter.
3.2.10. Pandas
Pandas is an open source library in Python. It provides ready-to-use, high-
performance data structures and data analysis tools. The Pandas module runs on top
of NumPy and is popularly used for data science and data analytics. It allows us to
store and manipulate tabular data as a 2-D data structure.
Pandas is the most popular Python library used for data analysis. It provides
highly optimized performance, with back-end source code written purely in C or
Python. We can analyze data in Pandas with Series and DataFrames.
Pandas is also able to delete rows that are not relevant or contain wrong values, such
as empty or NULL values. This is called cleaning the data.
3.2.11. Numpy
NumPy is the fundamental package for scientific computing
in Python. NumPy arrays facilitate advanced mathematical and other types of
operations on large numbers of data. Typically, such operations are executed more
efficiently and with less code than is possible using Python's built-in sequences.
In Python we have lists that serve the purpose of arrays, but they are slow to
process. NumPy aims to provide an array object that is up to 50x faster than
traditional Python lists.
NumPy arrays are stored at one continuous place in memory, unlike lists, so
processes can access and manipulate them very efficiently. This behavior is called
locality of reference in computer science.
This is the main reason why NumPy is faster than lists. It is also optimized to
work with the latest CPU architectures. NumPy is a Python library written partially
in Python, but most of the parts that require fast computation are written in C or
C++.
The Pandas module mainly works with tabular data, whereas the NumPy
module works with numerical data. The NumPy library provides objects for
multi-dimensional arrays, whereas Pandas offers an in-memory 2-D table object
called a DataFrame. NumPy consumes less memory compared to Pandas.
3.3. Jupyter Notebook
The Jupyter Notebook is not included with Python, so if you want to try it
out, you will need to install Jupyter. There are many distributions of the Python
language; this section focuses on just two of them for the purposes of installing
Jupyter Notebook.
3.4. Google Colab
Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows
anybody to write and execute arbitrary python code through the browser, and is
especially well suited to machine learning, data analysis and education.
Colab also lets users create, upload, and share notebooks.
3.5. Collaborative Filtering
The aim of this algorithm is to learn a function that can predict if a user will
benefit from an item, meaning the user will likely listen to a song. This can be done
by using ratings. There are two ways to collect user ratings: explicit rating and
implicit rating. We used the K-Nearest Neighbors algorithm.
1. Explicit Rating
This means we explicitly ask the user to give a rating. This represents the
most direct feedback from users to show how much they like a song.
2. Implicit Rating
We examine whether or not a user listened to a song, for how long or how
many times, which may suggest that he/she liked that particular song.
Data cleaning is the process of detecting and correcting (or removing) corrupt
or inaccurate records from a record set, table, or database. It refers to identifying
incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing,
modifying, or deleting the dirty or coarse data.
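As a concrete sketch of this cleaning step, the snippet below drops incomplete and out-of-range listening records with pandas; the column names are illustrative, not the report's exact schema.

```python
import numpy as np
import pandas as pd

# Illustrative listening records; column names are assumptions
df = pd.DataFrame({
    "user_id": ["u1", "u2", None, "u4"],
    "song_id": ["s1", "s2", "s3", "s3"],
    "listen_count": [3, np.nan, 5, -1],
})

# Drop records with a missing user id or listen count
df = df.dropna(subset=["user_id", "listen_count"])

# Drop records with impossible values (a listen count cannot be negative)
df = df[df["listen_count"] > 0]

print(len(df))  # 1 clean record remains
```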
3.6.3. Designing, Implementing and Evaluating the Music Recommender
System Application
The user enters a song title and the system finds the songs with the most
similar features. After calculating similarity and sorting the scores in descending
order, the system finds the corresponding songs of the 5 highest similarity scores
and returns them to the user.
In this problem, we have to build a linear classifier model to predict the songs.
This process should be followed once the dataset is preprocessed: data cleaning and
data transformation. The DNN model will be built using python3 and tensorflow 1.3.0.
First, each item is described by an object representation. Second, a similarity
function is defined among these object representations which mimics what humans
understand as item-item similarity. Because we are working with text and words,
Term Frequency-Inverse Document Frequency (TF-IDF) can be used for this
matching process.
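A minimal sketch of this TF-IDF matching with scikit-learn; the toy lyrics are made up, and the real input would be each song's lyrics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

lyrics = [
    "love you baby love you",
    "love me tender love me true",
    "highway to the danger zone",
]

tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(lyrics)  # one TF-IDF row per song

sims = cosine_similarity(matrix)      # item-item similarity matrix
# the two love songs share a term, so they are more similar
# to each other than to the third song
assert sims[0, 1] > sims[0, 2]
```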
4. UML DIAGRAMS
The kind of diagram is defined by the primary graphical symbols shown
on the diagram. For example, a diagram where the primary symbols in the contents
area are classes is a class diagram; a diagram which shows use cases and actors is a
use case diagram; and a sequence diagram shows a sequence of message exchanges
between lifelines.
Structure diagrams show the static structure of the system and its parts on
different abstraction and implementation levels and how they are related to each other.
The elements in a structure diagram represent the meaningful concepts of a system,
and may include abstract, real world and implementation concepts.
4.1. Use-Case Diagram
A use-case model describes a system's functional requirements in terms of
use cases. It is a model of the system's intended functionality (use cases) and its
environment (actors). Use cases enable you to relate what you need from a system to
how the system delivers on those needs. Because it is a very powerful planning
instrument, the use-case model is generally used in all phases of the development cycle
by all team members.
An effective use case diagram can help your team discuss and represent:
1. Scenarios in which your system or application interacts with people, organizations,
or external systems.
2. Goals that your system or application helps those entities (known as actors) achieve.
3. The scope of your system.
4.1.1. Use Case Diagram for Collaborative Filtering (Fig 2)
4.2. Sequence Diagram
Sequence diagrams can be useful references for businesses and other organizations.
Try drawing a sequence diagram to:
1. Represent the details of a UML use case.
2. Model the logic of a sophisticated procedure, function, or operation.
3. See how objects and components interact with each other to complete a process.
4. Plan and understand the detailed functionality of an existing or future scenario.
4.2.1. Sequence Diagram for Collaborative Filtering (Fig 4)
4.2.2. Sequence Diagram for Content-Based (Fig 5)
4.3. Activity Diagram
An activity diagram describes the flow of control of the target system, such as
exploring complex business rules and operations, and describing the use case as well
as the business process. In the Unified Modeling Language, activity diagrams are
intended to model both computational and organizational processes (i.e. workflows).
The initiator is generally a function module that gets called when an activity starts.
4.3.1. Activity Diagram for Collaborative Filtering (Fig 6)
5. MODULES DIVISION
5.1. Collaborative Filtering
Collaborative filtering works based on the behavior of the users. The larger
the number of users, the more effective the filtering.
Once we have the similarity between the items, the prediction is computed by
taking a weighted average of the target user's ratings on these similar items. The
formula for calculating the rating is very similar to that of user-based collaborative
filtering, except that the weights are between items instead of between users, and we
use the current user's ratings for other items instead of other users' ratings for the
current item.
5.2. Content Based and Plagiarism Detection
This module calculates the similarity between tracks based on their tags and
categories, and recommends by checking the cosine similarity of given tracks. Using
the similarity scores, we can also check for plagiarism between songs.
5.3. Mood Prediction
Digitization of music has led to easier access to different forms of music across
the globe. Increasing work pressure denies people the time needed to listen to and
evaluate music for the creation of a personal music library. One solution might be
developing a music search engine or recommendation system based on different
moods, building a mood classification system from lyrics by combining a wide range
of semantic and stylistic features extracted from textual lyrics.
6. ALGORITHMS
6.1. Collaborative Filtering Algorithms
6.1.1. KNN
The aim of this algorithm is to learn a function that can predict if a user will
benefit from an item, meaning the user will likely listen to a song. This can be done
by using ratings. There are two ways to collect user ratings: explicit rating and
implicit rating. We used the K-Nearest Neighbors algorithm.
Explicit Rating
This means we explicitly ask the user to give a rating. This represents the
most direct feedback from users to show how much they like a song.
The dictionary meaning of explicit is to state clearly and in detail. Explicit feedback
data, as the name suggests, is an exact number given by a user to a product. Some
examples of explicit feedback are ratings of movies by users on Netflix and ratings of
products by users on Amazon.
Implicit Rating
We examine whether or not a user listened to a song, for how long or
how many times, which may suggest that he/she liked that particular song.
Implicit ratings include measures of interest, such as whether the user listened to a
song and, if so, how much time the user spent listening to it. The main motivation for
using implicit ratings is that they remove the cost to the evaluator of examining
and rating the item.
The interaction matrix is built from the many entries that each include a user-song
pair as well as a value that represents the user's rating for that song.
We are going to use listen_count, the number of times a user listened to a song as
an implicit rating.
Doing some exploratory analysis, we discover that a user listens to a mean of
26 songs and a median of 16 songs.
We can quickly see that not all users listen to all songs. So a lot of values in
the song x users matrix are going to be zero. Thus, we’ll be dealing with extremely
sparse data.
We will work with a scipy sparse matrix to avoid overflow and wasted
memory. For that purpose, we'll use the csr_matrix function from scipy.sparse.
First, we reshape the data using the unique values of song_id as the index
and user_id as the columns to form the axes of the resulting DataFrame.
Then, we'll use the pivot function to produce a pivot table and convert this
table to a sparse matrix. As we can observe, many of the values are equal to zero,
indicating that the user has not listened to that song.
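The reshaping described above can be sketched as follows; the toy triplets stand in for the dataset's user-song-listen_count records.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Toy user-song-listen_count triplets in the shape of the report's data
df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "song_id": ["s1", "s2", "s2", "s3"],
    "listen_count": [3, 1, 5, 2],
})

# songs as rows, users as columns; pairs never listened to become 0
pivot = df.pivot(index="song_id", columns="user_id",
                 values="listen_count").fillna(0)

sparse_matrix = csr_matrix(pivot.values)
print(pivot.shape, sparse_matrix.nnz)  # (3, 3) 4
```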
KNN is a machine learning algorithm that finds clusters of similar users based
on common song ratings, and makes predictions using the average rating of the top-k
nearest neighbors.
This method makes a prediction based on the entire data set. When we
want to predict a new value, the algorithm looks for the K instances of the set
closest to it, and then uses the output values of these K nearest neighbours to
compute the value of the variable that needs to be predicted.
The optimal K is then used to make the prediction: in the case of a regression,
the next step is to calculate the mean (or median) of the output values of the selected
K neighbours.
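A small sketch of this prediction step with scikit-learn's KNeighborsRegressor on toy 1-D data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy 1-D data: the prediction is the mean of the K nearest outputs
X = np.array([[1.0], [2.0], [3.0], [10.0]])
y = np.array([1.0, 2.0, 3.0, 10.0])

knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X, y)

# the 3 nearest neighbours of 2.1 are 2.0, 3.0 and 1.0, so the
# prediction is their mean: (1 + 2 + 3) / 3 = 2.0
print(float(knn.predict([[2.1]])[0]))  # 2.0
```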
K Nearest Neighbors considers the K nearest neighbors (data points) to predict
the class or continuous value for a new data point, as the name suggests.
1) Instance-based learning uses full training instances to predict output
for unknown data, rather than learning weights from training data to
predict output (as in model-based algorithms).
2) Lazy Learning: The model is not learned using training data before
the prediction is required on the new instance, and the learning
process is postponed until the prediction is asked.
3) Non-Parametric: In KNN, the mapping function has no specified
form.
Sparse matrices in general are collections in which the vast majority of values are
some default value, usually None or 0. CSR stands for “Compressed Sparse Row”. The
CSR notation outputs the row-column tuples where the matrix contains non-zero values,
along with those values. In the music recommendation system it gives the listen count
and its location for every entry other than 0.
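A small sketch of the CSR representation with scipy; the dense matrix is a toy song x user table.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A mostly-zero song x user listen-count table
dense = np.array([[0, 3, 0],
                  [0, 0, 1],
                  [2, 0, 0]])
m = csr_matrix(dense)

# CSR keeps only the (row, column) -> value triples of non-zero entries
rows, cols = m.nonzero()
for r, c, v in zip(rows, cols, m.data):
    print(int(r), int(c), int(v))
# prints: 0 1 3 / 1 2 1 / 2 0 2
```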
CSR Matrix Table (Table 2)
6.1.3. FuzzyWuzzy
6.2. Plagiarism Algorithms
Recommendations done using content-based recommenders can be seen as a
user-specific classification problem. This classifier learns the user's likes and dislikes
from the features of the song.
The higher the TF-IDF score, the rarer the term is in a given document, and vice versa.
We create a lyrics_matrix variable where we store the matrix containing each word and
its TF-IDF score with regard to each song's lyrics.
We want to calculate the cosine similarity of each item with every other item
in the dataset, so we just pass the lyrics_matrix as the argument. Once we get the
similarities, we'll store, in a dictionary called similarities, the names of the 50 most
similar songs for each song in our dataset.
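The construction of that dictionary can be sketched as below; a small random matrix stands in for the TF-IDF lyrics_matrix, and the top-3 is kept instead of the top-50 to keep the example small.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
lyrics_matrix = rng.random((6, 10))   # stand-in for the TF-IDF matrix
song_names = [f"song_{i}" for i in range(6)]

cosine = cosine_similarity(lyrics_matrix)

similarities = {}
k = 3  # the report keeps the 50 most similar songs per song
for i, name in enumerate(song_names):
    # sort by similarity, drop the song itself (always the most similar)
    order = cosine[i].argsort()[::-1][1:k + 1]
    similarities[name] = [song_names[j] for j in order]

print(len(similarities["song_0"]))  # 3
```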
Given two vectors of attributes, A and B, the cosine similarity cos(θ) is expressed
using their dot product and magnitudes:
cos(θ) = (A · B) / (||A|| ||B||)
Basically, the cosine similarity is the dot product of two vectors divided by the
product of the magnitudes of each vector. We divide the dot product by the magnitudes
because we are measuring only the angle difference; the dot product on its own takes
both the angle difference and the magnitudes into account. If we divide the dot product
by the product of each vector's magnitude, we normalize our data and measure only the
angle difference. The raw dot product is a better measure of similarity only when
magnitude should not be ignored.
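In code, assuming plain NumPy vectors:

```python
import numpy as np

def cosine_sim(a, b):
    # dot product divided by the product of the magnitudes
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
# scaling a vector does not change the angle, so similarity stays 1
print(round(float(cosine_sim(a, 2 * a)), 6))  # 1.0
# orthogonal vectors have similarity 0
print(round(float(cosine_sim(np.array([1.0, 0.0]), np.array([0.0, 1.0]))), 6))  # 0.0
```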
6.3. Mood Prediction Algorithms
The mood prediction module builds a mood classification system from lyrics by
combining a wide range of semantic and stylistic features extracted from textual lyrics.
6.3.2. Fit_transform()
fit_transform() is used on the training data so that we can scale the
training data and also learn the scaling parameters of that data. Here, the model built
by us will learn the mean and variance of the features of the training set. These learned
parameters are then used to scale our test data.
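A small sketch with scikit-learn's StandardScaler, one common transformer with this fit_transform/transform pair:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0]])

scaler = StandardScaler()
# fit_transform learns the mean/variance of the training data AND scales it
X_train_scaled = scaler.fit_transform(X_train)
# transform reuses those learned parameters on the test data
X_test_scaled = scaler.transform(X_test)

print(float(scaler.mean_[0]))      # 2.0
print(float(X_test_scaled[0, 0]))  # 0.0 - the test point equals the training mean
```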
6.3.4. N-gram Model
An N-gram is “a contiguous sequence of N items from a given sample of text or
speech”. Here an item can be a character, a word or a sentence, and N can be any
integer. When N is 2, we call the sequence a bigram; similarly, a sequence of 3 items
is called a trigram, and so on.
6.3.5. SVM
“Support Vector Machine” (SVM) is a supervised machine learning
algorithm which can be used for both classification and regression challenges.
However, it is mostly used in classification problems. In the SVM algorithm, we plot
each data item as a point in n-dimensional space (where n is the number of features)
with the value of each feature being the value of a particular coordinate.
Support Vector Machines are widely used for music classification and give
good results, but the features used are often restricted to MFCCs and some rhythmic
features. This method uses several successive SVMs to improve the results.
The basic principle is to separate two groups of data while maximising the
margin around the border (so the distance between two classes). It is based on the idea
that almost everything becomes linearly separable when represented in high-
dimensional spaces. So the two steps are: transforming the input into a suitable high-
dimensional space, and then finding the hyperplane that separate data while
maximising margins.
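The two steps just described, i.e. mapping the input into a high-dimensional space via a kernel and finding the maximum-margin hyperplane there, can be sketched with scikit-learn (the toy feature vectors and labels below are made up):

```python
import numpy as np
from sklearn import svm

# toy two-class data: two made-up feature vectors per class
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]])
y = np.array([0, 0, 1, 1])

# the RBF kernel implicitly maps the input into a high-dimensional space;
# fit() then finds the hyperplane there that maximises the margin
clf = svm.SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

print(clf.predict([[0.1, 0.0], [0.95, 0.9]]))  # one query point near each class
```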
7. INPUT AND OUTPUT
Input Data: Million Songs Dataset
Collaborative
https://media.githubusercontent.com/media/sheemachinnu/music-recommendation-
system/main/10000_part1.txt
https://github.com/sheemachinnu/music-recommendation-
system/blob/main/10000_part2.txt
Listen count is the number of times the user listened to the song; title is the name of the song given in the output; release is the album on which the song appeared; artist name is the name of the singer; and year is the year in which the song was published. Each song is assigned a unique song id, and each user is assigned a unique user id.
Content
https://raw.githubusercontent.com/sheemachinnu/music-recommendation-
system/main/song_data.csv
Here, artist is the name of the singer, song is the title of the song, and text is the lyrics of the song.
8. CODE IMPLEMENTATION
part1 = open("./music-recommendation-system/10000_part1.txt").readlines()
part2 = open("./music-recommendation-system/10000_part2.txt").readlines()
total = part1 + part2  # combine both parts (this assignment was missing from the listing)
f = open('./music-recommendation-system/10000.txt', 'w')
f.write(''.join(total))
f.close()
# KNN Recommender
from sklearn.neighbors import NearestNeighbors
from fuzzywuzzy import fuzz
import numpy as np
class Recommender:
    def __init__(self, metric, algorithm, k, data, decode_id_song):
        self.metric = metric
        self.algorithm = algorithm
        self.k = k
        self.data = data
        self.decode_id_song = decode_id_song
        self.model = self._recommender().fit(data)

    def make_recommendation(self, new_song, n_recommendations):
        recommended = self._recommend(new_song=new_song, n_recommendations=n_recommendations)
        print("... Done")
        return recommended

    def _recommender(self):
        return NearestNeighbors(metric=self.metric, algorithm=self.algorithm,
                                n_neighbors=self.k, n_jobs=-1)
    def _recommend(self, new_song, n_recommendations):
        recommendations = []
        recommendation_ids = self._get_recommendations(new_song=new_song,
            n_recommendations=n_recommendations)
        # return the names of the songs using a mapping dictionary
        recommendations_map = self._map_indeces_to_song_title(recommendation_ids)
        # translate these recommendations into a ranking of song titles
        for i, (idx, dist) in enumerate(recommendation_ids):
            recommendations.append(recommendations_map[idx])
        return recommendations

    def _fuzzy_matching(self, song):
        # match the query title against the known titles with fuzzy string matching
        match_tuple = []
        for title, idx in self.decode_id_song.items():
            ratio = fuzz.ratio(title.lower(), song.lower())
            if ratio >= 60:
                match_tuple.append((title, idx, ratio))
        # sort by match ratio, best match first
        match_tuple = sorted(match_tuple, key=lambda x: x[2])[::-1]
        if not match_tuple:
            print(f"The recommendation system could not find a match for {song}")
            return
        return match_tuple[0][1]
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#Read userid-songid-listen_count
song_info = pd.read_csv('./music-recommendation-system/10000.txt', sep='\t', header=None)
song_info.columns = ['user_id', 'song_id', 'listen_count']
#Merge the two dataframes above to create input dataframe for recommender systems
songs = pd.merge(left=song_info, right=song_actual, on="song_id", how="left")
songs.head()
songs.to_csv('songs.csv', index=False)
df_songs = pd.read_csv('songs.csv')
df_songs.shape
df_songs.head()
df_songs.isnull().sum()
df_songs.dtypes
#Unique songs
unique_songs = df_songs['title'].unique().shape[0]
print(f"There are {unique_songs} unique songs in the dataset")
#Unique artists
unique_artists = df_songs['artist_name'].unique().shape[0]
print(f"There are {unique_artists} unique artists in the dataset")
#Unique users
unique_users = df_songs['user_id'].unique().shape[0]
print(f"There are {unique_users} unique users in the dataset")
#count how many rows we have by song, we show only the ten more popular songs
ten_pop_songs = df_songs.groupby('title')['listen_count'].count().reset_index().sort_values(['listen_count', 'title'], ascending=[0, 1])
ten_pop_songs['percentage'] = round(ten_pop_songs['listen_count'].div(ten_pop_songs['listen_count'].sum())*100, 2)
ten_pop_songs = ten_pop_songs[:10]
ten_pop_songs
labels = ten_pop_songs['title'].tolist()
counts = ten_pop_songs['listen_count'].tolist()
plt.figure()
sns.barplot(x=counts, y=labels, palette='Set3')
sns.despine(left=True, bottom=True)
#count how many rows we have by artist name, we show only the ten more popular artists
ten_pop_artists = df_songs.groupby(['artist_name'])['listen_count'].count().reset_index().sort_values(['listen_count', 'artist_name'], ascending=[0, 1])
ten_pop_artists = ten_pop_artists[:10]
ten_pop_artists
plt.figure()
labels = ten_pop_artists['artist_name'].tolist()
counts = ten_pop_artists['listen_count'].tolist()
sns.barplot(x=counts, y=labels, palette='Set2')
sns.despine(left=True, bottom=True)
listen_counts = pd.DataFrame(df_songs.groupby('listen_count').size(), columns=['count'])
listen_counts.head()
print(f"The maximum time the same user listened to the same songs was: {listen_counts.reset_index(drop=False)['listen_count'].iloc[-1]}")
plt.figure(figsize=(20, 5))
sns.boxplot(x='listen_count', data=df_songs)
sns.despine()
# What are the most frequent number of times a user listen to the same song?
listen_counts_temp=listen_counts[listen_counts['count'] > 50].reset_index(drop=False)
plt.figure(figsize=(16, 8))
sns.barplot(x='listen_count', y='count', palette='Set3', data=listen_counts_temp)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.show();
plt.figure(figsize=(16, 8))
sns.distplot(song_user.values, color='orange')
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.show();
# Get how many values there would be if every song had been listened to by every user
values_matrix = unique_users * unique_songs
# Subtract the actual number of rows in the DataFrame from that total
zero_values_matrix = values_matrix - df_songs.shape[0]
print(f"The matrix of users x songs has {zero_values_matrix} values that are zero")
# Filter the dataset to keep only those users with more than 16 songs listened
df_song_id_more_ten = df_songs[df_songs['user_id'].isin(song_ten_id)].reset_index(drop=True)
test = pd.DataFrame(mat_songs_features)
test.head()
#mat_songs_features
df_unique_songs = df_songs.drop_duplicates(subset=['song_id']).reset_index(drop=True)[['song_id', 'title']]
decode_id_song = {
    song: i for i, song in
    enumerate(list(df_unique_songs.set_index('song_id').loc[df_songs_features.index].title))
}
new_recommendations = model.make_recommendation(new_song=song, n_recommendations=10)
print(f"The recommendations for {song} are:")
print(f"{new_recommendations}")
8.1.1. Graphs
First, we count how many times each song appears. The listen_count represents how many times one user listened to the same song. The graph shows the ten most popular songs, sorted by listen_count.
Next, we count how many times each artist appears, again aggregating the listen counts per artist.
8.1.1.3. Listen Count by User (Fig 10)
A box plot (or boxplot) is a method for graphically depicting groups of numerical data through their quartiles.
It answers several questions based on the listen_count:
1. What was the maximum number of times the same user listened to the same song?
2. How many times, on average, does the same user listen to the same song?
3. What is the most frequent number of times a user listens to the same song?
How many songs does a user listen to, on average?
8.2. Content Based (Plagiarism)
import numpy as np
import pandas as pd
songs = pd.read_csv('/content/songdata.csv')
songs.head(20)
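The listing below uses lyrics_matrix without showing how it is built; a plausible sketch, assuming a TF-IDF matrix over the text (lyrics) column (the two-song frame here is a made-up stand-in for songdata.csv):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical mini-frame standing in for songdata.csv
songs = pd.DataFrame({
    "artist": ["ABBA", "Queen"],
    "song": ["As Good As New", "We Will Rock You"],
    "text": ["love me or leave me", "we will we will rock you"],
})

# TF-IDF over the lyrics column gives one row vector per song
tfidf = TfidfVectorizer(analyzer="word", stop_words="english")
lyrics_matrix = tfidf.fit_transform(songs["text"])

# pairwise cosine similarity between every pair of songs
cosine_similarities = cosine_similarity(lyrics_matrix)
print(cosine_similarities.shape)  # one row and column per song
```

Each row of cosine_similarities can then be sorted to find the most similar songs, as the code below does.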
cosine_similarities = cosine_similarity(lyrics_matrix)
similarities = {}
for i in range(len(cosine_similarities)):
    # Sort each row of cosine_similarities and get the indexes of the most similar songs.
    similar_indices = cosine_similarities[i].argsort()[:-50:-1]
    # Store in similarities the names of the 50 most similar songs,
    # except the first one, which is the same song.
    similarities[songs['song'].iloc[i]] = [(cosine_similarities[i][x], songs['song'][x], songs['artist'][x]) for x in similar_indices][1:]
class Recommender:
    def __init__(self, matrix):
        self.matrix_similar = matrix

    def recommend(self, recommendation):
        song = recommendation['song']
        number_songs = recommendation['number_songs']
        recom_song = self.matrix_similar[song][:number_songs]
        # print each item
        self._print_message(song=song, recom_song=recom_song)
recommedations = Recommender(similarities)
print(songs['song'])
#song_idx = 2008
#inp = song_idx
sname = "As Good As New"
recommendation = {
    "song": sname,
    "number_songs": 4
}
recommedations.recommend(recommendation)
recommendation2 = {
    "song": songs['song'].iloc[2456],
    "number_songs": 4
}
recommedations.recommend(recommendation2)
8.3. Mood prediction:
df = pd.read_csv("/content/music-recommendation-system/datasets/lyrics_emotion_dataset/training.csv")
df_new = pd.read_csv("/content/music-recommendation-system/datasets/lyrics_emotion_dataset/testing.csv")
train_x = df['text_final']
valid_x = df_new['text_final']
train_y = df['mood']
valid_y = df_new['mood']
Encoder = LabelEncoder()
train_y = Encoder.fit_transform(train_y.ravel())
valid_y = Encoder.transform(valid_y.ravel())  # reuse the encoding learned on the training labels
classLabels = {}
encoderClasses = Encoder.classes_
for i in range(len(encoderClasses)):
    classLabels[i] = encoderClasses[i]
classLabels
all_texts = []
for items in train_x:
    all_texts.append(items)
for items in valid_x:
    all_texts.append(items)
print(all_texts[0])
TF-IDF Vectorizer
porter_stemmer = nltk.stem.porter.PorterStemmer()
def porter_tokenizer(text, stemmer=porter_stemmer):
    lower_txt = text.lower()
    tokens = nltk.wordpunct_tokenize(lower_txt)
    stems = [porter_stemmer.stem(t) for t in tokens]
    no_punct = [s for s in stems if re.match('^[a-zA-Z]+$', s) is not None]
    return no_punct
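To see what this tokenizer produces, a standalone check (the definition is repeated so the snippet runs on its own; the sample string is made up):

```python
import re
import nltk
from nltk.stem.porter import PorterStemmer

porter_stemmer = PorterStemmer()

def porter_tokenizer(text, stemmer=porter_stemmer):
    lower_txt = text.lower()
    tokens = nltk.wordpunct_tokenize(lower_txt)
    stems = [stemmer.stem(t) for t in tokens]
    # keep alphabetic stems only, dropping punctuation and numbers
    return [s for s in stems if re.match('^[a-zA-Z]+$', s) is not None]

print(porter_tokenizer("Singing, dancing... 24/7!"))  # ['sing', 'danc']
```

The punctuation and the digits are dropped by the alphabetic filter, and the Porter stemmer reduces inflected forms to their stems.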
tfidf_vect = TfidfVectorizer(tokenizer=porter_tokenizer)  # definition missing from the listing; assumed to use the tokenizer above
tfidf_vect.fit(all_texts)
xtrain_tfidf = tfidf_vect.transform(train_x)
xvalid_tfidf = tfidf_vect.transform(valid_x)
Model
from sklearn.metrics import f1_score
claf=svm.LinearSVC(class_weight="balanced")
claf.fit(xtrain_tfidf,train_y)
p = claf.predict(xvalid_tfidf)
def predictEmotion(lyrics):
    wt = word_tokenize(lyrics)
    tag_map = defaultdict(lambda: wn.NOUN)
    tag_map['J'] = wn.ADJ
    tag_map['V'] = wn.VERB
    tag_map['R'] = wn.ADV
    Final_words = []
    word_Lemmatized = WordNetLemmatizer()
    for word, tag in pos_tag(wt):
        if word not in stopwords.words('english') and word.isalpha():
            word_Final = word_Lemmatized.lemmatize(word, tag_map[tag[0]])
            Final_words.append(word_Final)
    result = str(Final_words)
    df9 = pd.DataFrame(columns=["lyrics"])
    df9 = df9.append({'lyrics': result}, ignore_index=True)
    testx = df9['lyrics']
    xvalid_tfidf = tfidf_vect.transform(testx)
    y = claf.predict(xvalid_tfidf)
    return classLabels[y[0]]
recommendation = {
    "song": songs['song'].iloc[song_idx],
    "number_songs": -1
}
recommnded_songs = recommedations.recommend(recommendation)
emotionOfSelectedSong = predictEmotion(songs['text'].iloc[song_idx])
emotionOfSelectedSong
reccomendedSongsByEmotion
n = len(reccomendedSongsByEmotion)
Output for Mood Prediction 2(Fig 14)
The output represents the song titles along with the similarity scores, based on the mood of the input song given.
9. FUNCTIONAL REQUIREMENTS
A Functional Requirement (FR) is a description of the service that the software must offer. It describes a software system or its component. It can be a calculation, data manipulation, business process, user interaction, or any other specific functionality which defines what function a system is likely to perform.
1. The system can add and read datasets.
2. The system performs the classifier training process and displays the music recommended from the training data.
3. The system can display a set of musical records from content-based and collaborative filtering.
4. The system displays the musical records based on the emotion of the user and displays the personalized music.
5. The system can detect and display plagiarism in the musical records.
Non-Functional Requirements:
1. The system can run in various web browsers which support the system environment.
2. The system gives a fast response.
3. The system has a user-friendly interface design.
Availability: The application is available to all the intended users, all the time
based on the Network Availability.
Implementation: This System can be easily implemented and has scope for
making future changes easily, since the system is developed by using the feature of
Modularity.
Hardware Requirements:
1. RAM: 4 GB
2. Storage: 500 GB
3. CPU: 2 GHz or faster
4. Architecture: 32-bit or 64-bit
Software Requirements:
1. Python in Jupyter Notebook is used for data pre-processing, model training and prediction.
2. Operating System: Windows 7 and above, a Linux-based OS, or macOS
11.DATASET
Link:
Content:
https://raw.githubusercontent.com/sheemachinnu/music-recommendation-
system/main/song_data.csv
Collaborative:https://media.githubusercontent.com/media/sheemachinnu/music-
recommendation-system/main/10000_part1.txt
12. CONCLUSION
The following are our conclusions based on the experimental results. First, a music recommender system should consider music genre information to increase the quality of its recommendations. The music recommender is able to recommend songs based on the song features. It is able to check plagiarism in the dataset by generating a similarity score for each recommended song. The mood of a song is predicted by examining its lyrics against all the other songs in the dataset, predicting the mood and similarity scores, and recommending songs based on that mood. Because of their complex nature, machine learning systems like the Music Recommendation System cannot have a standardized structure, since different music recommender systems work in different ways. Based on our analyses, we suggest that future research add other music features to improve the accuracy of the recommender system, such as using a tempogram to capture the local tempo at a certain time.
14. REFERENCES
1. D. Arditi. Digital Subscriptions: The Unending Consumption of Music in the Digital Era. Popular Music and Society (2017).
4. Everyone listens to music, but how we listen is changing. [online] Available at: http://www.nielsen.com/us/en/insights/news/2015/everyone-listens-to-music-but-how-we-listen-is-changing.html [Accessed 10 Oct. 2017].
8. Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web (WWW '17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 173-182. DOI: https://doi.org/10.1145/3038912.3052569.
10.Min SH., Han I. (2005) Recommender Systems Using Support Vector Machines. In:
Lowe D., Gaedke M. (eds) Web Engineering. ICWE 2005. Lecture Notes in Computer
Science, vol 3579. Springer, Berlin, Heidelberg.
Submitted by:
Mr. P. Krishnanjeneyulum, M.Tech, (Ph.D)
Assistant Professor
Computer Science and Engineering
ANITS
REFERENCE PAPER
A Survey of Music Recommendation Systems
and Future Perspectives
1 Introduction
With the explosion of networks in the past decades, the internet has become the major source for retrieving multimedia information such as video, books, and music. People consider music an important aspect of their lives, and listening to music is an activity they engage in frequently. Previous research has also indicated that participants listened to music more often than they engaged in any of the other activities studied [57] (i.e. watching television, reading books, and watching movies). Music, as a powerful means of communication and self-expression, has therefore attracted a wealth of research.
Yading is supported by the China Scholarship Council. We would like to thank Geraint A. Wiggins for his advice.
9th International Symposium on Computer Music Modelling and Retrieval (CMMR 2012)
19-22 June 2012, Queen Mary University of London
All rights remain with the authors.
2 Yading Song, Simon Dixon and Marcus Pearce
However, the problem now is to organise and manage the millions of music titles produced by society [51]. MIR techniques have been developed to solve problems such as genre classification [42, 75], artist identification [46], and instrument recognition [49]. Since 2005, an annual evaluation event called the Music Information Retrieval Evaluation eXchange (MIREX1) has been held to facilitate the development of MIR algorithms.
Additionally, a music recommender helps users filter and discover songs according to their tastes. A good music recommender system should be able to automatically detect preferences and generate playlists accordingly. Meanwhile, the development of recommender systems provides a great opportunity for industry to aggregate the users who are interested in music. More importantly, it raises challenges for us to better understand and model users' preferences in music [76].
Currently, based on users' listening behaviour and historical ratings, the collaborative filtering algorithm has been found to perform well [9]. Combined with the use of a content-based model, the user can get a list of similar songs by low-level acoustic features such as rhythm and pitch, or high-level features like genre, instrument, etc. [7].
Some music discovery websites such as Last.fm 2, Allmusic3, Pandora4 and Shazam 5 have successfully put these two approaches into practice. In the meantime, these websites provide a unique platform for retrieving rich and useful information for user studies.
Music is subjective and universal. It can not only convey emotion, but also modulate a listener's mood [23]. Tastes in music vary from person to person; therefore, the previous approaches cannot always meet users' needs. An emotion-based model and a context-based model have been proposed [18, 34]. The former recommends music based on mood, allowing the user to locate their expected perceived emotion on a 2D valence-arousal interface [22]. The latter collects other contextual information, such as comments, music reviews, or social tags, to generate the playlist. Though hybrid music recommender systems should outperform the conventional models, their development is still at a very early stage [88]. Thanks to recent studies in psychology, signal processing, machine learning and musicology, there is much room for future extension.
This paper, therefore, surveys a general music recommender framework, from user profiling, item modelling, and item-user profile matching to a series of state-of-the-art approaches. Section 2 gives a brief introduction to the components of music recommendation systems, and in Section 3 the state-of-the-art recommendation techniques are explained. At the end of this paper, we conclude and propose a new model based on users' motivation.
1 http://www.music-ir.org/mirex/wiki/MIREX HOME
2 http://www.last.fm/
3 http://www.allmusic.com/
4 http://www.pandora.com
5 http://www.shazam.com/
A Survey of Music Recommendation Systems 3
First Step - User Profile Modelling Celma [14] suggested that the user profile can be categorised into three domains: demographic, geographic, and psychographic (shown in Table 1). Based on their steadiness, psychographic data have been further divided into stable attributes, which are essential for making long-term predictions, and fluid attributes, which can change on an hour-to-hour basis [24].
supplied by the creators, such as the title of the song, artist name, and lyrics to
find the target songs [20].
Limitation Though it is fast and accurate, the drawbacks are obvious. First of all, the user has to know the editorial information for a particular music item. Secondly, it is time-consuming to maintain the ever-increasing metadata. Moreover, the recommendation results are relatively poor, since the system can only recommend music based on editorial metadata, and none of the users' information has been considered.
To recommend items via the choices of other similar users, the collaborative filtering technique has been proposed [28]. As one of the most successful approaches in recommendation systems, it assumes that if users X and Y rate n items similarly or have similar behaviour, they will rate or act on other items similarly [59].
Instead of calculating the similarity between items, a set of 'nearest neighbour' users is found for each user, whose past ratings have the strongest correlation. Scores for the unseen items are then predicted based on a combination of the scores known from the nearest neighbours [65]. Collaborative filtering is further divided into three subcategories: memory-based, model-based, and hybrid collaborative filtering [63, 68].
– Popularity bias Generally, popular music gets more ratings. Music in the long tail, however, rarely gets any. As a result, collaborative filtering mainly recommends popular music to listeners. Though recommending popular items is reliable, it is still risky, since the user is rarely pleasantly surprised.
– Cold start This is also known as the data sparsity problem. At an early stage, few ratings are provided, and due to this lack of ratings, prediction results are poor.
– Human effort A perfect recommender system should not involve too much human effort, since users are not always willing to rate. The ratings may also be skewed towards those who do rate, and thus may not be representative. Because of this absence of evenly distributed ratings, the system can give either false negative or false positive results.
Query by Humming (QBSH) Humming and singing are natural ways to express songs [31]. In the early 1990s, the query-by-humming system, based on the content-based model, was proposed [25, 80]. Early query-by-humming systems used the melodic contour, which had been seen as the most discriminative feature of songs.
It follows three steps: construction of the song database, transcription of the user's melodic query, and pattern-matching algorithms which are used to get the closest results from the collection [1]. In the past few years, besides melody, better performance has also been achieved by incorporating lyrics and enhancing the main voice [19, 77].
Limitations
Rather than using the acoustic features of the content-based model or the ratings of collaborative filtering, the context-based information retrieval model uses public opinion to discover and recommend music [18]. Along with the development of social networks such as Facebook 10, Youtube11, and Twitter 12, these websites provide us with rich human knowledge such as comments, music reviews, tags and friendship networks [36].
Context-based information retrieval, therefore, uses web/document mining techniques to filter out important information to support problems like artist similarity, genre classification, emotion detection [82], semantic spaces [39, 40], etc. Some researchers have suggested that the use of social information outperforms the content-based model [70, 81].
However, as with collaborative filtering, popular music always gets more public opinions than music in the long tail [21]. Eventually, rich music gets richer feedback, which again results in a popularity bias problem.
The hybrid model aims at combining two or more models to increase the overall performance. Burke [9] pointed out several methods for building a hybrid model, such as weighted, switching, mixed, feature combination, and cascade. There is no doubt that a proper hybrid model will outperform a single approach, since it can incorporate the advantages of both methods while inheriting the disadvantages of neither [65, 87, 88].
Playlist Generation Another issue is the sequence of the playlist [38]. Most recommender systems are not flexible, because the playlist is ordered by the similarity distance to the seed songs. Though the most similar songs are given in order, the theme and mood can change dramatically in between. This may result in dissatisfaction and discontinuation of the songs.
Research indicates that a playlist should have a main theme (mood, event, activity) that evolves with time [17]. Rather than randomly shuffling, human skipping behaviour can be considered for dynamic playlist generation [15, 54]. For example, assuming that users dislike a song when they skip it, the system then removes the songs which are similar to the one they skipped [55, 56, 78].
User Interface Design While a bad user interface design does not affect the accuracy of the system, it does influence the ratings and listening experience. A clean design always gives the user a better understanding of the system. Moreover, overall control of the system, and the reduction of the human effort required for operation, should be considered during design.
As we can see from the development of music recommenders over the past years, the results given tend to be more personalised and subjective. Considering only the music itself and human ratings is no longer sufficient. A great amount of work in recent years has been done in music perception, psychology, neuroscience and sport, studying the relationship between music and its impact on human behaviour. David Huron has also mentioned that music has sex- and drug-like qualities. Undoubtedly, music has always been an important component of our lives, and now we have greater access to it.
Research in psychology points out that music not only improves mood, increases activation, and evokes visual and auditory imagery, but also recalls associated films or music videos and relieves stress [33]. Moreover, empirical experiments in sport mention that the main benefits of listening to music include work output extension, performance enhancement, and dissociation from unpleasant feelings [71]. For example, athletes prefer uptempo, conventional, intense, rebellious, energetic, and rhythmic music rather than reflective and complex music [66]. An important fact found by psychologists is that users' preference in music is linked to their personality. It is also worth mentioning that fast, upbeat music produces a stimulative effect, whereas slow, soft music produces a sedative effect [12]. All of these highlight that a music recommender is not only a tool for relaxing, but also acts as an effective tool to meet our needs in different contexts. To our knowledge, there is little research based on these empirical results.
Designing a personalised music recommender is complicated, and it is challenging to thoroughly understand users' needs and meet their requirements. As discussed above, future research will mainly focus on user-centric music recommender systems. A survey among athletes showed that practitioners in sport and exercise environments tend to select music in a rather arbitrary manner, without full consideration of its motivational characteristics. Therefore, a future music recommender should be able to lead users to choose music reasonably. In the end, we hope that through this study we can build a bridge between isolated research in all the other disciplines.
References
1. Ricardo A. Baeza-Yates and Chris H. Perleberg. Fast and Practical Approximate
String Matching. In Combinatorial Pattern Matching, Third Annual Symposium,
pages 185–192, 1992.
2. G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender
Systems: A Survey of the State-of-the-art and Possible Extensions. IEEE Trans-
actions on Knowledge and Data Engineering, 17(6):734–749, June 2005.
3. C Anderson. The Long Tail. Why the Future of Business is selling less of more.
Hyperion Verlag, 2006.
21. Douglas Eck, Paul Lamere, T. Bertin-Mahieux, and Stephen Green. Automatic
Generation of Social Tags for Music Recommendation. Advances in neural infor-
mation processing systems, 20:385–392, 2007.
22. T. Eerola and J. K. Vuoskoski. A Comparison of the Discrete and Dimensional
Models of Emotion in Music. Psychology of Music, 39(1):18–49, August 2010.
23. Yazhong Feng and Y Zhuang. Popular Music Retrieval by Detecting Mood. In
International Society for Music Information Retrieval 2003, volume 2, pages 375–
376, 2003.
24. Alan Page Fiske, Shinobu Kitayama, Hazel Rose Markus, and R E Nisbett. The
Cultural Matrix of Social Psychology. 1998.
25. Asif Ghias, Jonathan Logan, David Chamberlin, and Brian C. Smith. Query by
Humming. Proceedings of the third ACM international conference on Multimedia
- MULTIMEDIA ’95, pages 231–236, 1995.
26. S Gosling. A Very Brief Measure of the Big-Five Personality Domains. Journal of
Research in Personality, 37(6):504–528, December 2003.
27. Jonathan L. Herlocker, Joseph a. Konstan, Loren G. Terveen, and John T. Riedl.
Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on
Information Systems, 22(1):5–53, January 2004.
28. Will Hill, Larry Stead, Mark Rosenstein, and George Furnas. Recommending and Evaluating Choices in a Virtual Community of Use. pages 5–12, 1995.
29. X Hu and J. Stephen Downie. Exploring Mood Metadata: Relationships with
Genre, Artist and Usage Metadata. In 8th International Conference on Music
Information Retrieval, 2007.
30. Yajie Hu and Mitsunori Ogihara. Nextone Player: A Music Recommendation System Based on User Behavior. In 12th International Society for Music Information Retrieval Conference, number ISMIR, pages 103–108, 2011.
31. Jyh-Shing Roger Jang and Hong-Ru Lee. A General Framework of Progressive
Filtering and Its Application to Query by Singing/Humming. IEEE Transactions
on Audio, Speech, and Language Processing, 16(2):350–358, February 2008.
32. D Jennings. Net, Blogs and Rock ’n’ Rolls: How Digital Discovery Works and What
It Means for Consumers. 2007.
33. Patrik N Juslin and Daniel Vastfjäll. Emotional Responses to Music: the Need to
Consider Underlying Mechanisms. The Behavioral and brain sciences, 31(5):559–
621, October 2008.
34. Y.E. Kim, E.M. Schmidt, Raymond Migneco, B.G. Morton, Patrick Richardson,
Jeffrey Scott, J.A. Speck, and Douglas Turnbull. Music Emotion Recognition: A
State of the Art Review. In Proc. of the 11th Intl. Society for Music Information
Retrieval (ISMIR) Conf, number Ismir, pages 255–266, 2010.
35. Fang-Fei Kuo, Meng-Fen Chiang, Man-Kwan Shan, and Suh-Yin Lee. Emotion-
based Music Recommendation by Association Discovery from Film Music. In Pro-
ceedings of the 13th annual ACM international conference on Multimedia - MUL-
TIMEDIA ’05, page 507, New York, New York, USA, 2005. ACM Press.
36. Paul Lamere. Social Tagging and Music Information Retrieval. Journal of New
Music Research, 37(2):101–114, June 2008.
37. Edith L. M. Law, Luis Von Ahn, Roger B. Dannenberg, and Mike Crawford.
Tagatune: A Game for Music and Sound Annotation. In 8th International Con-
ference on Music Information Retrieval, 2007.
38. Jin Ha Lee, Bobby Bare, and Gary Meek. How Similar is too Similar? Exploring
Users’ Perception of Similarity in Playlist Evaluation. In International Conference
on Music Information Retrieval 2011, number ISMIR, pages 109–114, 2011.
39. Mark Levy. A Semantic Space for Music Derived from Social Tags. Austrian
Compuer Society, 1:12, 2007.
40. Mark Levy and Mark Sandler. Music Information Retrieval Using Social Tags and
Audio. IEEE Transactions on Multimedia, 11(3):383–395, 2009.
41. Qing Li, Byeong Man Kim, Dong Hai Guan, and Duk Oh. A Music Recommender
Based on Audio Features. In Proceedings of the 27th annual international ACM
SIGIR conference on Research and development in information retrieval, pages
532–533, Sheffield, United Kingdom, 2004. ACM.
42. Emiru Tsunoo, George Tzanetakis, and Nobutaka Ono. Beyond Timbral Statistics: Improving Music Classification Using Percussive Patterns and Bass Lines. IEEE Transactions on Audio, Speech and Language Processing, 19(4):1003–1014, 2011.
43. Beth Logan. Music Recommendation from Song Sets. In International Conference
on Music Information Retrieval 2004, number October, pages 10–14, Barcelona,
Spain, 2004.
44. Terence Magno and Carl Sable. A Comparison of Signal of Signal-Based Music
Recommendation to Genre Labels, Collaborative Filtering, Musicological Analysis,
Human Recommendation and Random Baseline. In ISMIR 2008: proceedings of
the 9th International Conference of Music Information Retrieval, pages 161–166,
2008.
45. Chun-man Mak, Tan Lee, Suman Senapati, Yu-ting Yeung, and Wang-kong Lam.
Similarity Measures for Chinese Pop Music Based on Low-level Audio Signal At-
tributes. In 11th International Society for Music Information Retrieval Conference,
number ISMIR, pages 513–518, 2010.
46. M Mandel. Song-level Features and Support Vector Machines for Music Classifi-
cation. In Proc. International Conference on Music, 2005.
47. MI Mandel. A Web-based Game for Collecting Music Metadata. In In 8th Inter-
national Conference on Music Information Retrieval (ISMIR), 2008.
48. M Mann, TJ Cox, and FF Li. Music Mood Classification of Television Theme
Tunes. In 12th International Society for Music Information Retrieval Conference,
number Ismir, pages 735–740, 2011.
49. Janet Marques and Pedro J Moreno. A Study of Musical Instrument Classification
Using Gaussian Mixture Models and Support Vector Machines, 1999.
50. M. Ogihara, Bo Shao, Dingding Wang, and Tao Li. Music Recommendation Based
on Acoustic Features and User Access Patterns. IEEE Transactions on Audio,
Speech, and Language Processing, 17(8):1602–1611, November 2009.
51. F. Pachet and J.J. Aucouturier. Improving Timbre Similarity: How High is the
Sky? Journal of negative results in speech and audio sciences, 1(1):1–13, 2004.
52. François Pachet and Daniel Cazaly. A Taxonomy of Musical Genres. In Content-
Based Multimedia Information Retrieval Access Conference (RIAO), number April,
2000.
53. Francois Pachet. Knowledge Management and Musical Metadata. In Encyclopedia
of Knowledge Management. 2005.
54. Elias Pampalk, Tim Pohle, and Gerhard Widmer. Dynamic Playlist Generation
Based on Skipping Behavior. In Proc. of the 6th ISMIR Conference, volume 2,
pages 634–637, 2005.
55. Steffen Pauws, Berry Eggen, and Miles Davis. PATS : Realization and User Evalu-
ation of an Automatic Playlist Generator PATS : Realization and User Evaluation
of an Automatic Playlist Generator. In 3rd International Conference on Music
Information Retrieval, 2002.
56. CBE Plaza. Uncovering Affinity of Artist to Multiple Genres from Social Behavior
Data. In ISMIR 2008: proceedings of the 9th, pages 275–280, 2008.
408
A Survey of Music Recommendation Systems 15
57. Peter J. Rentfrow and Samuel D. Gosling. The Do Re Mi’s of Everyday Life: The
structure and personality correlates of music preferences. Journal of Personality
and Social Psychology, 84(6):1236–1256, 2003.
58. Peter J Rentfrow and Samuel D Gosling. Message in a Ballad: the Role of Mu-
sic Preferences in Interpersonal Perception. Psychological science, 17(3):236–42,
March 2006.
59. Paul Resnick, Hal R Varian, and Guest Editors. Recommender Systems. Commu-
nications of the ACM, 40(3):56–58, 1997.
60. J.A. Russell. A Circumplex Model of Affect. Journal of personality and social
psychology, 39(6):1161–1178, 1980.
61. Pasi Saari, Tuomas Eerola, and Olivier Lartillot. Generalizability and Simplicity as
Criteria in Feature Selection: Application to Mood Classification in Music. Audio,
Speech, and Language Processing, IEEE Transactions on, 19(99):1–1, 2011.
62. A Salomon. A Content-based Music Similarity Function. In Cambridge Research
Labs-Tech Report, number June, 2001.
63. Badrul Sarwar, George Karypis, and Joseph Konstan. Item-based Collaborative
Filtering Recommendation Algorithms. Proceedings of the 10th, pages 285–295,
2001.
64. Bo Shao, Tao Li, and M. Ogihara. Quantify Music Artist Similarity Based on
Style and Mood. In Proceeding of the 10th ACM workshop on Web Information
and Data Management, pages 119–124. ACM, 2008.
65. Yoav Shoham and Marko Balabannovic. Content-Based, Collaborative Recom-
mendation. Communications of the ACM, 40(3):66–72, 1997.
66. Stuart D Simpson and Costas I Karageorghis. The Effects of Synchronous Music on
400-m Sprint Performance. Journal of sports sciences, 24(10):1095–102, October
2006.
67. Janto Skowronek and M McKinney. Ground Truth for Automatic Music Mood
Classification. In Proc. ISMIR, pages 4–5, 2006.
68. Xiaoyuan Su and Taghi M. Khoshgoftaar. A Survey of Collaborative Filtering
Techniques. Advances in Artificial Intelligence, 2009(Section 3):1–19, 2009.
69. Neel Sundaresan. Recommender Systems at the Long Tail. In of the fifth ACM
conference on Recommender systems, number RecSys 2011, pages 1–5, 2011.
70. Panagiotis Symeonidis, Maria Ruxanda, Alexandros Nanopoulos, and Yannis
Manolopoulos. Ternary Semantic Analysis of Social Tags for Personalized Mu-
sic Recommendation. In Proc. 9th ISMIR Conf, pages 219–224. Citeseer, 2008.
71. P.C. Terry and C.I. Karageorghis. Psychophysical Effects of Music in Sport and
Exercise: An Update on Theory, Research and Application. In Proceedings of
the 2006 Joint Conference of the Australian Psychological Society and the New
Zealand Psychological Society: Psychology Bridging the Tasman: Science, Culture
and Practice, pages 415–419. Australian Psychological Society, 2006.
72. Nava Tintarev and Judith Masthoff. Effective Explanations of Eecommendations:
User-centered Design. In Proceedings of the 2007 ACM conference on Recommender
systems, pages 153–156. ACM, 2007.
73. KTG Tsoumakas and George Kalliris. Multi-Label Classification of Music into
Emotions. In ISMIR 2008: proceedings of the 9th International Conference of
Music Information Retrieval, pages 325–330, 2008.
74. Douglas Turnbull, Luke Barrington, and Gert Lanckriet. Five Approaches to Col-
lecting Tags for Music. In ISMIR 2008: proceedings of the 9th International Con-
ference of Music Information Retrieval, pages 225–230, 2008.
409
16 Yading Song, Simon Dixon and Marcus Pearce
75. George Tzanetakis, Student Member, and Perry Cook. Musical Genre Classifi-
cation of Audio Signals. IEEE Transactions on Speech and Audio Processing,
10(5):293–302, 2002.
76. Alexandra Uitdenbogerd and van Schyndel Ron. A Review of Factors Affecting
Music Recommender. In 3rd International Conference on Music Information Re-
trieval (2002), 2002.
77. Erdem Unal, Elaine Chew, Panayiotis G. Georgiou, and Shrikanth S. Narayanan.
Challenging Uncertainty in Query by Humming Systems: A Fingerprinting Ap-
proach. IEEE Transactions on Audio, Speech, and Language Processing, 16(2):359–
371, February 2008.
78. Rob van Gulik and Fabio Vignoli. Visual Playlist Generation on the Artist Map. In
5th International Conference on Music Infomation Retrieval, number ISMIR2005,
pages 520–523, 2005.
79. Fabio Vignoli. A Music Retrieval System Based on User-driven Similarity and
its Evaluation. In International Conference on Music Information Retrieval 2005,
2005.
80. C.C. Wang, J.S.R. Jang, and Wennen Wang. An Improved Query by
Singing/Humming System Using Melody and Lyrics Information. In 11th Inter-
national Society for Music Information Retrieval Conference, number Ismir, pages
45–50, 2010.
81. Dingding Wang, Tao Li, and Mitsunori Ogihara. Tags Better Than Audio Features?
The Effect of Joint use of Tags and Audio Content Features for Artistic Style Clu-
tering. In International Conference on Music Information Retrieval 2010, number
ISMIR, pages 57–62, 2010.
82. Ju-chiang Wang, Hung-shin Lee, Hsin-min Wang, and Shyh-kang Jeng. Learning
the Similarity of Audio Msuic in Bag-of-Frames Representation from Tagged Music
Data. In International Conference on Music Information Retrieval 2011, number
ISMIR, pages 85–90, 2011.
83. Jun Wang, A.P. De Vries, and M.J.T. Reinders. Unifying User-based and Item-
based Collaborative Filtering Approaches by Similarity Fusion Categories. In Pro-
ceedings of the 29th annual international ACM SIGIR conference on Research and
development in information retrieval, pages 501–508. ACM, 2006.
84. Xing Wang, Xiaoou Chen, Deshun Yang, and Yuqian Wu. Music Emotion Clas-
sification of Chinese Songs Based on Lyrics using TF*IDF and Rhyme. In 12th
International Society for Music Information Retrieval Conference, number Ismir,
pages 765–770, 2011.
85. Dan Yang and W.S. Lee. Disambiguating Music Emotion Using Software Agents.
In Proceedings of the 5th International Conference on Music Information Retrieval
(ISMIR04), pages 52–58, 2004.
86. Yi-Hsuan Yang. Music Emotion Recognition. Tayler and Francis Group, 2011.
87. Kazuyoshi Yoshii, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hi-
roshi G Okuno. Hybrid Collaborative and Content-based Music Recommendation
Using Probabilistic Model with Latent User Preferences. In Proceedings of the 7th
International Conference on Music Information Retrieval, pages 296–301, 2006.
88. Kazuyoshi Yoshii, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hi-
roshi G Okuno. Improving Efficiency and Scalability of Model-Based Music Rec-
ommender System Based on Incremental Training. In ISMIR 2007: proceedings of
the 8th International Conference of Music Information Retrieval, number ISMIR,
2007.
410
PAPER PUBLICATION
International Journal of Management, Technology And Engineering ISSN NO : 2249-7455
MUSIC RECOMMENDATION SYSTEM WITH
PLAGIARISM DETECTION
Mr. P. Krishnanjaneyulu¹, Sheema Patro², V. Jahnavi², P. N. V. S. Dhanush², G. Mahesh²
¹ Assistant Professor, Department of CSE, Anil Neerukonda Institute of Technology and Sciences (A), Visakhapatnam-531162, India
² Final-year students, Department of CSE, Anil Neerukonda Institute of Technology and Sciences (A), Visakhapatnam-531162, India
Abstract— In this paper, we present a personalized music recommendation system based on KNN and machine learning algorithms. The system combines a collaborative filtering and a content-based filtering recommendation algorithm, using a log file that stores the recommendations made to each user. The proposed system maintains log files storing each user's previous playlist history; we extract user access data from these logs and use it to make recommendations. Content-based methods give recommendations based on the similarity of two songs' contents or attributes, while for collaborative filtering we build a matrix over the different songs and apply collaborative filtering methods to it. The plagiarism module extracts features from the input music and finds songs close to the query that the query may have plagiarized. We use the Million Song Dataset to evaluate the personalized music recommendation system. Data cleaning is performed using data science techniques. Plagiarism detection is done by finding similar music, which mitigates copyright issues.
Index Terms— collaborative filtering, KNN, cosine similarity, TF-IDF, CSR matrix
3 Proposed Work

Music recommendation systems are a two-edged sword: they are advantageous to both the user and the provider. The proposed system consists of three modules:

1. Collaborative Filtering
2. Plagiarism and Content-Based Module
3. Mood Prediction

Algorithm: Collaborative Filtering

3.1 KNN
The goal of this method is to develop a function that can predict whether or not a user would profit from an item, in this case, whether or not the user would listen to a song. This can be accomplished through the use of ratings. User ratings can be collected in two ways: explicitly and implicitly. The K-Nearest Neighbors algorithm was utilized.

3.2 Explicit Rating
This means we explicitly ask the user to give a rating. This represents the most direct feedback from users, showing how much they like a song.

The fuzzy string matching pattern used is known as fuzzywuzzy. The difference between sequences is calculated using the Levenshtein distance: it compares two strings and returns a similarity index. If we provide an incorrect song title, the system compares it to the other songs, determines the similarity using the fuzzy matching function, calculates the ratio, and uses the song with the highest ratio as the input song.

3.8 Plagiarism
3.8.1 Cosine Similarity (Content-Based Model)
The following two phases must be completed by a content-based recommendation system. First, extract features from the content of the song descriptions to generate an object representation. Second, create a similarity function over these object representations that resembles the item-item similarity humans recognize. Because we are dealing with text and words, we may use Term Frequency-Inverse Document Frequency (TF-IDF) to match them.
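The two phases above (TF-IDF feature extraction, then cosine similarity between song representations) can be sketched in plain Python. The toy song descriptions and helper names below are illustrative assumptions, not the project's actual code.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a list of token lists."""
    n = len(docs)
    # document frequency: number of docs each term appears in
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf}
        vectors.append(vec)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy song descriptions (illustrative only)
songs = {
    "song_a": "soft acoustic guitar ballad love".split(),
    "song_b": "acoustic guitar ballad heartbreak love".split(),
    "song_c": "electronic dance beat club night".split(),
}
titles = list(songs)
vecs = dict(zip(titles, tfidf_vectors(list(songs.values()))))

# Compare the query song against every other song; a high cosine
# score flags a strong content match (a plagiarism candidate).
scores = {t: cosine(vecs["song_a"], vecs[t]) for t in titles if t != "song_a"}
print(scores)
```

In practice a library vectorizer (e.g. scikit-learn's TfidfVectorizer over a CSR matrix) would replace the hand-rolled weighting, but the ranking logic is the same: the song with the highest cosine score is the closest content match to the query.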
3.11 fit_transform()
The fit_transform() function is applied to the training data in order to scale it and learn its scaling parameters: the model learns the mean and variance of the training set's features. The test data is then scaled using the parameters learned from the training set.
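The contract described above, learn mean and variance on the training data, then reuse them on the test data, mirrors scikit-learn's scaler API. A minimal pure-Python sketch of that contract (the class name and toy data are illustrative):

```python
import math

class SimpleScaler:
    """Minimal sketch of the fit/transform contract: fit_transform()
    scales the training data AND learns its mean/std, while transform()
    only reuses the already-learned parameters."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.mean_ = [sum(c) / len(c) for c in cols]
        # population std per column; guard against zero variance
        self.std_ = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
                     for c, m in zip(cols, self.mean_)]
        return self

    def transform(self, rows):
        return [[(x - m) / s for x, m, s in zip(r, self.mean_, self.std_)]
                for r in rows]

    def fit_transform(self, rows):
        return self.fit(rows).transform(rows)

scaler = SimpleScaler()
train = [[1.0, 10.0], [3.0, 30.0]]
test = [[2.0, 20.0]]
train_scaled = scaler.fit_transform(train)  # learns mean/std from train
test_scaled = scaler.transform(test)        # reuses the learned parameters
print(train_scaled, test_scaled)
```

Calling fit_transform() on the test set instead would silently re-learn the parameters from the test data, which is exactly the mistake the split API is designed to prevent.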
5. Performance Analysis

Fig 4: Graph between title and listen count

Conclusion

The following conclusions are based on our experimental results. First, a music recommender system should take music genre information into account to increase the quality of its recommendations. The recommender is able to suggest songs based on song features, and it is able to check for plagiarism in the dataset by generating a similarity score for each recommended song. The mood of a song is predicted by comparing the lyrics of the given song with those of all the other songs in the dataset, computing mood and similarity scores, and recommending songs based on the mood. Because of the complex nature of machine learning systems, a music recommendation system cannot have a standardized structure, as different music recommender systems work in different ways. Based on our analysis, we suggest that future research add other music features in order to improve the accuracy of the recommender system,