BACHELOR OF ENGINEERING
IN
Submitted by:
___________________________
______________________________________
______________________________________
Oct 2022
Table of Contents
Introduction
Literature Survey
Proposed Methodology
References
Introduction
With the explosive growth of networks over the past decades, the internet has become the
major source for retrieving multimedia content such as videos, books, and music. People
consider music an important aspect of their lives, and listening to music is an activity
they engage in frequently. Previous research has also indicated that participants listened
to music more often than they engaged in any of the other activities studied (i.e.,
watching television, reading books, and watching movies). Music, as a powerful means of
communication and self-expression, has therefore attracted a wealth of research.
The rapid development of mobile devices and the internet has made it possible for us to
access different music resources freely. The number of songs available exceeds the
listening capacity of any single individual, and people sometimes find it difficult to
choose from millions of songs. Moreover, music service providers need an efficient way to
manage songs and help their customers discover music by giving quality recommendations.
Thus, there is a strong need for a good recommendation system.
With the rise of digital content distribution, people now have access to music collections
on an unprecedented scale. Commercial music libraries easily exceed 15 million songs,
which vastly exceeds the listening capability of any single person. With millions of songs
to choose from, people sometimes feel overwhelmed. Thus, an efficient music
recommender system is necessary in the interest of both music service providers and
customers. Users no longer have to struggle to decide what to listen to, while music
companies can retain their user base and attract new users by improving user
satisfaction.
Currently, there are many music streaming services, such as Pandora and Spotify, which are
working on building high-precision commercial music recommendation systems. These
companies generate revenue by helping their customers discover relevant music and
charging them for the quality of their recommendation service. Thus, there is a strong,
thriving market for good music recommendation systems. A music recommender system is a
system that learns from a user's past listening history and recommends songs that the
user would probably like to hear in the future.
In the academic field, user-centric music recommendation has long been neglected due to
the lack of publicly available, open and transparent data. The Million Song Dataset
Challenge provides open, large-scale data that facilitates academic research on
user-centric music recommender systems, an area that has not been studied much.
Literature Survey
Existing System
Over the years, recommender systems have been studied widely and are divided into
different categories according to the approach being used. The categories are
collaborative filtering (CF), content based and context based.
Collaborative filtering
Collaborative filtering uses the numerical ratings given by users and is mainly based
upon the historical data of the user available to the system. The historical data helps
to build the user profile, and the data available about the item is used to build the
item profile. Both the user profile and the item profile are used to make
recommendations. The Netflix Competition gave much popularity to collaborative filtering.
Collaborative filtering is considered the most basic and the easiest method to find
recommendations and make predictions regarding the sales of a product. It does have
some disadvantages, which have led to the development of new methods and techniques.
Recommender systems are proving to be a useful tool for addressing part of the
information overload phenomenon on the internet. Their evolution has followed the
evolution of the internet. The first generation of recommender systems used traditional
websites to gather information from the following sources:
(a) content-based data,
(b) demographic data, and
(c) memory-based data.
Proposed System
It is possible to use a cluster-based algorithm to predict songs; however, it lacks the
flexibility to add other features to the system, such as a classification predictor.
In other words, a cluster-based algorithm is one type of recommendation system (RS).
However, compared to the two other types of RS, content-based filtering and
collaborative filtering, a cluster-based algorithm lacks flexibility. In fact, both
content-based filtering and collaborative filtering can incorporate the clustering
outcome into their models, creating a hybrid recommendation system.
In the context of Spotify playlists, we use the features (loudness, tempo, etc.) of each
song in a playlist to find the average score of the whole playlist. Then, we recommend a
song that has a score similar to the playlist but is not in the playlist.
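The playlist-average idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the song names and feature values are made up, the features are assumed to be scaled to the range [0, 1] beforehand, and Euclidean distance is one possible way to measure how "similar" a song's score is to the playlist average. In the real system the features would come from the Spotify data.

```python
import math

# Hypothetical audio features per song, assumed to be scaled to [0, 1].
# The feature names follow the text (loudness, tempo); the values are made up.
SONGS = {
    "song_a": {"loudness": 0.80, "tempo": 0.70},
    "song_b": {"loudness": 0.75, "tempo": 0.65},
    "song_c": {"loudness": 0.20, "tempo": 0.30},
    "song_d": {"loudness": 0.78, "tempo": 0.72},
    "song_e": {"loudness": 0.10, "tempo": 0.90},
}
FEATURES = ("loudness", "tempo")


def playlist_profile(playlist, songs=SONGS):
    """Average each feature over the songs in the playlist."""
    return {
        f: sum(songs[s][f] for s in playlist) / len(playlist)
        for f in FEATURES
    }


def distance(a, b):
    """Euclidean distance between two feature dictionaries (an assumption)."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in FEATURES))


def recommend(playlist, songs=SONGS):
    """Return the song closest to the playlist average that is not in it."""
    profile = playlist_profile(playlist, songs)
    candidates = [s for s in songs if s not in playlist]
    return min(candidates, key=lambda s: distance(songs[s], profile))


suggestion = recommend(["song_a", "song_b"])
```

With the sample data, a playlist of song_a and song_b has an average close to song_d, so song_d is the suggestion. Songs already in the playlist are excluded before the comparison.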
Proposed Methodology
Waterfall Model
The waterfall model is a breakdown of project activities into linear sequential phases,
where each phase depends on the deliverables of the previous one and corresponds to a
specialization of tasks. The approach is typical for certain areas of engineering design.
In software development, it tends to be among the less iterative and flexible approaches,
as progress flows in largely one direction ("downwards" like a waterfall) through the
phases of conception, initiation, analysis, design, construction, testing, deployment and
maintenance.
The waterfall model was selected as the SDLC model due to the following reasons:
Machine Learning
A machine learning model is the output of the training process and is defined as the
mathematical representation of a real-world process. Machine learning algorithms find
patterns in the training dataset, which are used to approximate the target function that
maps the inputs to the outputs in the available dataset. Machine learning methods depend
upon the type of task and are classified as classification models, regression models,
clustering, dimensionality reduction (for example, principal component analysis), etc.
Like any data-driven approach, machine learning requires a good flow of organized, varied
data for a robust solution. In today’s online-first world, companies have access to a
large amount of data about their customers, often millions of records. This data, which
is large both in the number of data points and in the number of fields, is known as big
data due to the sheer amount of information it holds.
Classification
The inputs are divided into classes; the system learns a model from training data and
uses it to assign new inputs to one of these classes. Classification falls under the
umbrella of supervised learning. A real-life example is spam filtering, where emails are
the inputs and each one is classified as “spam” or “not spam”.
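The spam-filtering example can be made concrete with a minimal sketch. The report does not name a classification algorithm, so the naive Bayes classifier with add-one smoothing used here is an assumption, and the four training emails are made up for illustration.

```python
import math
from collections import Counter

# Tiny, made-up training set: (email text, label).
TRAINING_DATA = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("project report attached for review", "not spam"),
]


def train(examples):
    """Count words per class and training examples per class."""
    word_counts = {}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, class_counts


def classify(text, word_counts, class_counts):
    """Assign the class with the highest log-probability (naive Bayes)."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the class prior, then add one term per word.
        score = math.log(class_counts[label] / total)
        n_words = sum(counts.values())
        for word in text.split():
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((counts[word] + 1) / (n_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DATA)
label = classify("free prize now", *model)
```

The model is built only from labelled examples, which is what makes this supervised learning. A new email is then assigned to whichever class makes its words most probable.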
Collaborative Filtering
Collaborative filtering (CF) is a technique used by recommender systems. In the newer,
narrower sense, collaborative filtering is a method of making automatic predictions
(filtering) about the interests of a user by collecting preferences or taste information from
many users (collaborating).
Collaborative methods work with an interaction matrix, also called a rating matrix.
The aim of this algorithm is to learn a function that can predict whether a user will
benefit from an item, meaning the user will likely buy, listen to, or watch it. In this
project, that means the user will likely listen to a song. Among collaborative systems,
we encounter two types: user-item filtering and item-item filtering.
This prediction can be made by using ratings. There are two ways to collect user
ratings: explicit rating and implicit rating. Explicit ratings are given directly by the
user, whereas implicit ratings are inferred from behaviour such as how often a song is
played.
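As a rough illustration of item-item filtering on an interaction matrix, the sketch below builds a binary matrix from implicit feedback and scores unheard songs by their similarity to songs the user already has. The user names, play counts, and the choice of cosine similarity are all assumptions made for the example and are not taken from the project's data.

```python
import math

# Hypothetical implicit feedback: play counts per user and song.
PLAY_COUNTS = {
    "alice": {"song_a": 12, "song_b": 3},
    "bob": {"song_a": 5, "song_b": 8, "song_c": 2},
    "carol": {"song_b": 4, "song_c": 9},
    "dave": {"song_d": 7},
}


def interaction_matrix(play_counts):
    """Turn play counts into a binary user-item interaction matrix."""
    return {
        user: {song: 1 for song, n in songs.items() if n > 0}
        for user, songs in play_counts.items()
    }


def item_similarity(matrix, item_x, item_y):
    """Cosine similarity between the binary user columns of two items."""
    users_x = {u for u, items in matrix.items() if item_x in items}
    users_y = {u for u, items in matrix.items() if item_y in items}
    if not users_x or not users_y:
        return 0.0
    return len(users_x & users_y) / math.sqrt(len(users_x) * len(users_y))


def recommend(matrix, user, top_n=1):
    """Rank unheard songs by total similarity to the user's songs."""
    heard = set(matrix[user])
    all_items = {i for items in matrix.values() for i in items}
    scores = {
        candidate: sum(item_similarity(matrix, candidate, h) for h in heard)
        for candidate in all_items - heard
    }
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:top_n]


matrix = interaction_matrix(PLAY_COUNTS)
suggestions = recommend(matrix, "alice")
```

Here alice has played song_a and song_b. Other listeners of those songs also played song_c, while song_d shares no listeners with them, so song_c ranks first. A user-item variant would instead compare rows of the matrix (users) rather than columns (songs).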
Python
Python's large standard library, commonly cited as one of its greatest strengths, provides
tools suited to many tasks. For Internet-facing applications, many standard formats and
protocols such as MIME and HTTP are supported. It includes modules for
creating graphical user interfaces, connecting to relational databases, generating
pseudorandom numbers, arithmetic with arbitrary-precision decimals, manipulating regular
expressions, and unit testing.
Libraries such as NumPy, SciPy and Matplotlib allow the effective use of Python in
scientific computing, with specialized libraries such as Biopython and Astropy providing
domain-specific functionality. SageMath is a computer algebra system with a notebook
interface programmable in Python: its library covers many aspects of mathematics,
including algebra, combinatorics, numerical mathematics, number theory, and calculus.
OpenCV has Python bindings with a rich set of features for computer vision and image
processing.
Python is commonly used in artificial intelligence and machine learning projects
with the help of libraries like TensorFlow, Keras, PyTorch and scikit-learn. As a
scripting language with modular architecture, simple syntax and rich text processing
tools, Python is often used for natural language processing.
Python Modules
In Python, modules are simply files with the “.py” extension containing Python code that
can be imported into another Python program.
In simple terms, we can consider a module to be the same as a code library or a file that
contains a set of functions that you want to include in your application. With the help of
modules, we can organize related functions, classes, or any code block in the same file.
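A minimal sketch of how a module is created and imported is given here. The module name song_utils and its average function are hypothetical. The file is written to a temporary directory only so that the example runs on its own; in practice the file would simply be saved next to the program that imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical module, normally saved by hand as song_utils.py.
MODULE_SOURCE = '''
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)
'''

# Write the module to a temporary directory and make that directory importable.
module_dir = tempfile.mkdtemp()
Path(module_dir, "song_utils.py").write_text(MODULE_SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

# From here on, the file behaves like any other importable module.
import song_utils

mean_tempo = song_utils.average([120, 100, 140])
```

After the import, functions defined in the file are accessed through the module name, as in song_utils.average.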
Some of the Python modules used in this project are:
Keras
Keras is an open-source software library that provides a Python interface for artificial
neural networks. Keras acts as an interface for the TensorFlow library. Up until version
2.3, Keras supported multiple backends, including TensorFlow, Microsoft Cognitive
Toolkit, Theano, and PlaidML. As of version 2.4, only TensorFlow is supported. Designed
to enable fast experimentation with deep neural networks, it focuses on being
user-friendly, modular, and extensible.
Keras contains numerous implementations of commonly used neural-network building
blocks such as layers, objectives, activation functions, and optimizers, along with a
host of tools that make working with image and text data easier and reduce the amount of
code needed to write deep neural networks.
Pandas
Pandas is a software library written for the Python programming language for data
manipulation and analysis. In particular, it offers data structures and operations for
manipulating numerical tables and time series. It is free software released under the three-
clause BSD license.
Pandas is mainly used for data analysis and associated manipulation of tabular data in
DataFrames. Pandas allows importing data from various file formats such as comma-
separated values, JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
Pandas supports various data manipulation operations such as merging, reshaping, and
selecting, as well as data cleaning and data wrangling. The pandas library is built
upon another library, NumPy, which is oriented toward working efficiently with arrays
rather than with DataFrames.
TextBlob
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API
for diving into common natural language processing (NLP) tasks such as part-of-speech
tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
Spotipy
Spotipy is a lightweight Python library for the Spotify Web API. With Spotipy you get full
access to all of the music data provided by the Spotify platform. Spotipy supports all of the
features of the Spotify Web API, including access to all endpoints and support for user
authorization. For details on the capabilities, you are encouraged to review the Spotify
Web API documentation. All methods require user authorization. You will need to register
your app at My Dashboard to get the credentials necessary to make authorized calls
(a client id and client secret).
Jupyter Notebook
JupyterLab is a newer user interface for Project Jupyter, offering a flexible user interface
and more features than the classic notebook UI. The first stable release was announced
on February 20, 2018. In 2015, a joint $6 million grant from The Leona M. and Harry B.
Helmsley Charitable Trust, The Gordon and Betty Moore Foundation, and The Alfred P.
Sloan Foundation funded work that led to expanded capabilities of the core Jupyter tools,
as well as to the creation of JupyterLab.