
Spotify Playlist Recommendation System

A Project Work Synopsis

Submitted in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF ENGINEERING
IN

Computer Science Engineering

(Spl. in Big Data Analytics)

Submitted by:

___________________________

University Roll Number:

______________________________________

Under the Supervision of:


___________________________

______________________________________

CHANDIGARH UNIVERSITY, GHARUAN, MOHALI - 140413,
PUNJAB

Oct 2022
Table of Contents

Introduction

Literature Survey

Proposed Methodology

References
Introduction

With the rapid growth of networks over the past decades, the internet has become the major
source for retrieving multimedia content such as videos, books, and music. People consider
music an important aspect of their lives, and listening to music is an activity they engage
in frequently. Previous research has also indicated that participants listened to music more
often than they engaged in any of the other activities studied (i.e., watching television,
reading books, and watching movies). As a powerful means of communication and
self-expression, music has therefore attracted a wealth of research.

The rapid development of mobile devices and the internet has made it possible for us to
access different music resources freely. The number of songs available exceeds the listening
capacity of any single individual, and people sometimes find it difficult to choose from
millions of songs. Moreover, music service providers need an efficient way to manage songs
and help their customers discover music by giving quality recommendations. Thus, there is a
strong need for a good recommendation system.

With the rise of digital content distribution, people now have access to music collections
on an unprecedented scale. Commercial music libraries easily exceed 15 million songs,
which vastly exceeds the listening capability of any single person. With millions of songs
to choose from, people sometimes feel overwhelmed. Thus, an efficient music
recommender system is necessary in the interest of both music service providers and
customers. Users no longer have to struggle to decide what to listen to, while music
companies can retain their user base and attract new users by improving user
satisfaction.

Currently, many music streaming services, such as Pandora and Spotify, are working on
building high-precision commercial music recommendation systems. These companies
generate revenue by helping their customers discover relevant music and charging them
for the quality of their recommendation service. Thus, there is a strong, thriving market
for good music recommendation systems. A music recommender system is a system that
learns from a user's past listening history and recommends songs that the user would
probably like to hear in the future.

In the academic field, user-centric music recommendation has long been neglected due to
the lack of publicly available, open, and transparent data. The Million Song Dataset
Challenge provides open, large-scale data that facilitates academic research on
user-centric music recommender systems, an area that has not been studied extensively.
Literature Survey

Existing System

Over the years, recommender systems have been studied widely and are divided into
different categories according to the approach being used. The categories are
collaborative filtering (CF), content-based, and context-based.

Collaborative Filtering
Collaborative filtering uses the numerical reviews (ratings) given by users and is mainly
based upon the historical data of the user available to the system. This historical data
helps to build the user profile, and the data available about the item is used to build the
item profile. Both the user profile and the item profile are used to make recommendations.
The Netflix Competition brought much popularity to collaborative filtering. Collaborative
filtering is considered the most basic and the easiest method for finding recommendations
and making predictions regarding the sales of a product. It does have some disadvantages,
which have led to the development of new methods and techniques.

Content Based Recommender System


Content-based systems focus on the features of the products. They aim to create a user
profile based on the user's previous reviews, and an item profile based on the features the
item provides and the reviews it has received. It is observed that users' reviews usually
contain a feature of the product followed by the user's opinion about it, so features and
opinions appear in pairs. Content-based recommendation systems help overcome the sparsity
problem faced in collaborative filtering-based recommendation systems.

Context Based Recommender System


Context-based recommender systems extend the user/item convention to the circumstances of
the user in order to incorporate contextual information. This removes the cumbersome
process of making the user fill in a large number of personal details.

Recommender systems are proving to be a useful tool for addressing part of the
information overload phenomenon on the internet. Their evolution has followed the evolution
of the internet. The first generation of recommender systems used traditional websites
to gather information from the following sources:
(a) content-based data,
(b) demographic data, and
(c) memory-based data.
Proposed System

It is possible to use a cluster-based algorithm to predict songs; however, it lacks the
flexibility to add other features to the system, such as a classification predictor.
In other words, a cluster-based algorithm is one type of recommendation system (RS), but
compared to the two other types of RS, content-based filtering and collaborative filtering,
it lacks flexibility. In fact, both content-based filtering and collaborative filtering can
incorporate the clustering outcome into their models, creating a hybrid recommendation
system.

In the context of Spotify playlists, we use the features (loudness, tempo, etc.) of each
song in a playlist to find the average score of the whole playlist. Then, we recommend a
song that has a score similar to the playlist but is not in the playlist.
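
The playlist-averaging idea can be sketched in a few lines of Python. This is a minimal
illustration under stated assumptions, not the project's final implementation: the song
records, the feature names, and the feature values are invented, the features are assumed to
be already scaled to a common 0-1 range, and Euclidean distance is chosen here as one
possible measure of "similar score".

```python
import math

def playlist_profile(playlist):
    """Average each audio feature over all songs in the playlist."""
    feature_names = playlist[0]["features"].keys()
    return {
        name: sum(song["features"][name] for song in playlist) / len(playlist)
        for name in feature_names
    }

def distance(profile, features):
    """Euclidean distance between the playlist profile and one song."""
    return math.sqrt(sum((profile[k] - features[k]) ** 2 for k in profile))

def recommend(playlist, candidates, top_n=1):
    """Return IDs of the closest candidate songs that are not in the playlist."""
    profile = playlist_profile(playlist)
    already_in = {song["id"] for song in playlist}
    scored = [
        (distance(profile, song["features"]), song["id"])
        for song in candidates
        if song["id"] not in already_in
    ]
    scored.sort()  # smallest distance first
    return [song_id for _, song_id in scored[:top_n]]

# Hypothetical data: feature values are made up and assumed to be scaled to 0-1.
playlist = [
    {"id": "a", "features": {"energy": 0.8, "danceability": 0.7, "tempo": 0.6}},
    {"id": "b", "features": {"energy": 0.6, "danceability": 0.9, "tempo": 0.4}},
]
candidates = [
    {"id": "a", "features": {"energy": 0.8, "danceability": 0.7, "tempo": 0.6}},
    {"id": "c", "features": {"energy": 0.7, "danceability": 0.8, "tempo": 0.5}},
    {"id": "d", "features": {"energy": 0.1, "danceability": 0.2, "tempo": 0.9}},
]

print(recommend(playlist, candidates))
```

In practice, raw features such as loudness (in decibels) and tempo (in beats per minute) sit
on very different scales, so some normalization step would be needed before averaging.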
Proposed Methodology

Software Development Life Cycle

Waterfall Model

The waterfall model is a breakdown of project activities into linear sequential phases,
where each phase depends on the deliverables of the previous one and corresponds to a
specialization of tasks. The approach is typical for certain areas of engineering design.
In software development, it tends to be among the less iterative and flexible approaches,
as progress flows in largely one direction ("downwards" like a waterfall) through the
phases of conception, initiation, analysis, design, construction, testing, deployment and
maintenance.

The waterfall development model originated in the manufacturing and construction
industries, where the highly structured physical environments meant that design changes
became prohibitively expensive much sooner in the development process.

The following phases are followed in order:

1. System and software requirements: captured in a product requirements document
2. Analysis: resulting in models, schema, and business rules
3. Design: resulting in the software architecture
4. Coding: the development, proving, and integration of software
5. Testing: the systematic discovery and debugging of defects
6. Operations: the installation, migration, support, and maintenance of complete
systems

The waterfall model was selected as the SDLC model due to the following reasons:

• Requirements were very well documented, clear and fixed.
• Technology was adequately understood.
• Simple and easy to understand and use.
• There were no ambiguous requirements.
• Easy to manage due to the rigidity of the model. Each phase has specific
deliverables and a review process.
• Clearly defined stages.
• Well-understood milestones. Easy to arrange tasks.
Concepts

Machine Learning
A machine learning model is the output of the training process and is defined as the
mathematical representation of a real-world process. Machine learning algorithms find
patterns in the training dataset, which are used to approximate the target function that
maps the inputs to the outputs in the available dataset. Machine learning methods depend
upon the type of task and include classification models, regression models, clustering,
and dimensionality reduction techniques such as Principal Component Analysis. Like any
data-driven approach, machine learning requires a good flow of organized, varied data for
a robust solution. In today’s online-first world, companies have access to a large amount
of data about their customers, usually in the millions. This data, which is large both in
the number of data points and in the number of fields, is known as big data due to the
sheer amount of information it holds.

Classification
The inputs are divided into classes; the system produces a model from training data and
uses it to assign new inputs to one of these classes. Classification falls under the
umbrella of supervised learning. A real-life example is spam filtering, where emails are
the input and each email is classified as “spam” or “not spam”.

Collaborative Filtering
Collaborative filtering (CF) is a technique used by recommender systems. In the newer,
narrower sense, collaborative filtering is a method of making automatic predictions
(filtering) about the interests of a user by collecting preferences or taste information from
many users (collaborating).
Collaborative methods work with an interaction matrix, also called a rating matrix.
The aim of the algorithm is to learn a function that can predict whether a user will benefit
from an item, meaning that the user will likely buy, listen to, or watch the item; in this
project, that the user will likely listen to a song. Among collaborative systems, we
encounter two types: user-item filtering and item-item filtering.
Predictions are made from ratings. There are two ways to collect user ratings: explicit
rating, where the user directly scores an item, and implicit rating, where the preference is
inferred from behavior such as how often a song is played.
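
As an illustration of item-item filtering on an interaction matrix, the sketch below scores
unheard songs for a user by their cosine similarity to the songs that user has already
played. It is a simplified example under stated assumptions: the users, songs, and play
counts are invented, play counts stand in for implicit ratings, and the weighting scheme is
one simple choice among many.

```python
import math

# Hypothetical interaction matrix: user -> {song: play count (implicit rating)}.
plays = {
    "u1": {"s1": 5, "s2": 3},
    "u2": {"s1": 4, "s2": 2, "s3": 6},
    "u3": {"s2": 1, "s3": 4},
}

def item_vector(ratings, item):
    """Column of the interaction matrix: every user's rating of one item."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_scores(ratings, user):
    """Score each unheard item by its similarity to the user's heard items."""
    all_items = {item for r in ratings.values() for item in r}
    heard = ratings[user]
    scores = {}
    for candidate in all_items - set(heard):
        cand_vec = item_vector(ratings, candidate)
        scores[candidate] = sum(
            cosine(cand_vec, item_vector(ratings, item)) * rating
            for item, rating in heard.items()
        )
    return scores

def recommend(ratings, user, top_n=1):
    """Return the highest-scoring unheard items, ties broken by item name."""
    scores = predict_scores(ratings, user)
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

print(recommend(plays, "u1"))
```

A user who has already heard every song in the matrix simply receives an empty list, which is
one reason real systems combine collaborative filtering with other signals.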

Data Cleaning and Selection


Data cleaning is the process of detecting and correcting (or removing) corrupt or
inaccurate records from a record set, table, or database. It involves identifying
incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing,
modifying, or deleting the dirty or coarse data.
Data selection is defined as the process of determining the appropriate data type and
source, as well as suitable instruments to collect data. Data selection precedes the actual
practice of data collection. The process of selecting suitable data for a research project
can impact data integrity.
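
The data cleaning step described above could look like the following sketch. The record
layout, the field names, and the validity rule (tempo must be positive) are assumptions made
for illustration only; the example uses plain Python so that it runs without extra
dependencies, although pandas offers equivalent operations such as `dropna` and
`drop_duplicates`.

```python
def clean_tracks(records, required=("id", "tempo", "loudness")):
    """Drop incomplete, duplicate, and obviously invalid track records."""
    cleaned = []
    seen_ids = set()
    for record in records:
        # Incomplete: a required field is missing or None.
        if any(record.get(field) is None for field in required):
            continue
        # Duplicate: the same track ID was already kept.
        if record["id"] in seen_ids:
            continue
        # Inaccurate: a tempo of zero or less is treated as invalid (assumed rule).
        if record["tempo"] <= 0:
            continue
        seen_ids.add(record["id"])
        cleaned.append(record)
    return cleaned

# Hypothetical raw records with typical defects.
raw = [
    {"id": "t1", "tempo": 120.0, "loudness": -5.2},
    {"id": "t1", "tempo": 120.0, "loudness": -5.2},   # duplicate
    {"id": "t2", "tempo": None, "loudness": -7.0},    # missing value
    {"id": "t3", "tempo": -1.0, "loudness": -6.1},    # invalid value
    {"id": "t4", "tempo": 98.5, "loudness": -9.3},
]

print([track["id"] for track in clean_tracks(raw)])
```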
Programming Language

Python

Python is a high-level, interpreted, general-purpose programming language. Its design
philosophy emphasizes code readability with the use of significant indentation.

Python is dynamically-typed and garbage-collected. It supports multiple programming
paradigms, including structured (particularly procedural), object-oriented and functional
programming. It is often described as a "batteries included" language due to its
comprehensive standard library.

Python's large standard library, commonly cited as one of its greatest strengths, provides
tools suited to many tasks. For Internet-facing applications, many standard formats and
protocols such as MIME and HTTP are supported. It includes modules for
creating graphical user interfaces, connecting to relational databases, generating
pseudorandom numbers, arithmetic with arbitrary-precision decimals, manipulating regular
expressions, and unit testing.

Libraries such as NumPy, SciPy and Matplotlib allow the effective use of Python in
scientific computing, with specialized libraries such as Biopython and Astropy providing
domain-specific functionality. SageMath is a computer algebra system with a notebook
interface programmable in Python: its library covers many aspects of mathematics,
including algebra, combinatorics, numerical mathematics, number theory, and calculus.
OpenCV has Python bindings with a rich set of features for computer vision and image
processing.

Python is commonly used in artificial intelligence projects and machine learning projects
with the help of libraries like TensorFlow, Keras, PyTorch and scikit-learn. As a scripting
language with modular architecture, simple syntax and rich text processing tools, Python
is often used for natural language processing.

Python Modules

In Python, modules are simply files with the “.py” extension containing Python code that
can be imported inside another Python program.
In simple terms, we can consider a module to be the same as a code library or a file that
contains a set of functions that you want to include in your application. With the help of
modules, we can organize related functions, classes, or any code block in the same file.
Some of the Python modules used in this project are:
Keras

Keras is an open-source software library that provides a Python interface for artificial
neural networks. Keras acts as an interface for the TensorFlow library. Up until version
2.3, Keras supported multiple backends, including TensorFlow, Microsoft Cognitive
Toolkit, Theano, and PlaidML. As of version 2.4, only TensorFlow is supported. Designed
to enable fast experimentation with deep neural networks, it focuses on being
user-friendly, modular, and extensible.
Keras contains numerous implementations of commonly used neural-network building
blocks such as layers, objectives, activation functions, and optimizers, along with a host
of tools that make working with image and text data easier and simplify the coding
necessary for deep neural networks.

Pandas

Pandas is a software library written for the Python programming language for data
manipulation and analysis. In particular, it offers data structures and operations for
manipulating numerical tables and time series. It is free software released under the
three-clause BSD license.
Pandas is mainly used for data analysis and associated manipulation of tabular data in
DataFrames. Pandas allows importing data from various file formats such as
comma-separated values, JSON, Parquet, SQL database tables or queries, and Microsoft
Excel. Pandas supports various data manipulation operations such as merging, reshaping,
and selecting, as well as data cleaning and data wrangling features. The pandas library is
built upon another library, NumPy, which is oriented toward working efficiently with arrays
rather than with DataFrames.

Textblob

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API
for diving into common natural language processing (NLP) tasks such as part-of-speech
tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

Spotipy

Spotipy is a lightweight Python library for the Spotify Web API. With Spotipy you get full
access to all of the music data provided by the Spotify platform. Spotipy supports all of the
features of the Spotify Web API, including access to all endpoints and support for user
authorization. For details on the capabilities, you are encouraged to review the Spotify
Web API documentation. All methods require user authorization. You will need to register
your app at My Dashboard to get the credentials necessary to make authorized calls
(a client ID and client secret).
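
A possible way to read the tracks of a playlist with Spotipy is sketched below. The
credentials and the playlist ID are placeholders, the helper names `make_client` and
`track_ids_from_items` are invented for this example, and the client-credentials flow is
assumed to be sufficient for the playlist in question. The network calls are left commented
out because they need real credentials; whether the audio-features endpoint is available may
also depend on the access granted to the registered app.

```python
def make_client(client_id, client_secret):
    """Build a Spotipy client using the client-credentials flow."""
    # Imported inside the function so the helper below works without spotipy installed.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
    return spotipy.Spotify(auth_manager=auth)

def track_ids_from_items(items_response):
    """Collect track IDs from a playlist-items response dictionary."""
    ids = []
    for item in items_response.get("items", []):
        track = item.get("track")
        # Skip entries without a track or without an ID.
        if track and track.get("id"):
            ids.append(track["id"])
    return ids

# Usage (placeholders; requires network access and valid credentials):
# sp = make_client("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
# response = sp.playlist_items("YOUR_PLAYLIST_ID")
# ids = track_ids_from_items(response)
# features = sp.audio_features(ids)  # subject to the app's API access

# Offline check with a hand-written response in the assumed shape.
sample_response = {"items": [{"track": {"id": "abc"}}, {"track": None}]}
print(track_ids_from_items(sample_response))
```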
Jupyter Notebook

Jupyter Notebook (formerly IPython Notebook) is a web-based interactive computational
environment for creating notebook documents. Jupyter Notebook is built using
several open-source libraries, including IPython, ZeroMQ, Tornado, jQuery, Bootstrap,
and MathJax. A Jupyter Notebook document is a browser-based REPL containing an
ordered list of input/output cells which can contain code, text (using Markdown),
mathematics, plots and rich media. Underneath the interface, a notebook is
a JSON document, following a versioned schema, usually ending with the ".ipynb"
extension.
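
Because a notebook is a JSON document, it can be inspected with Python's standard `json`
module. The sketch below builds a minimal notebook dictionary in the version 4 layout and
counts its cells by type; the cell contents are invented for illustration, and a real
notebook would normally be read from an ".ipynb" file rather than constructed by hand.

```python
import json

# A minimal, hand-written notebook in the nbformat 4 layout (contents are made up).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Playlist analysis"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

def count_cells(nb):
    """Count the cells of a notebook dictionary by cell type."""
    counts = {}
    for cell in nb["cells"]:
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
    return counts

# Round-trip through JSON text, as would happen when saving and loading a file.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)
print(count_cells(loaded))
```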

Jupyter Notebook is similar to the notebook interface of other programs such
as Maple, Mathematica, and SageMath, a computational interface style that originated
with Mathematica in the 1980s. Jupyter interest overtook the popularity of the
Mathematica notebook interface in early 2018.

JupyterLab is a newer user interface for Project Jupyter, offering a flexible user interface
and more features than the classic notebook UI. The first stable release was announced
on February 20, 2018. In 2015, a joint $6 million grant from The Leona M. and Harry B.
Helmsley Charitable Trust, The Gordon and Betty Moore Foundation, and The Alfred P.
Sloan Foundation funded work that led to expanded capabilities of the core Jupyter tools,
as well as to the creation of JupyterLab.

JupyterHub is a multi-user server for Jupyter Notebooks. It is designed to support many
users by spawning, managing, and proxying many singular Jupyter Notebook servers.
References

1. Shefali Garg, Fangyan Sun. Music Recommender System, Journal of Indian
Institute of Technology, Kanpur, 2014.

2. Libo Zhang, Tiejian Luo, Fei Zhang and Anjum Wu. A Recommendation Model
Based on Deep Neural Network. Journal of Chinese Academy of Sciences, Beijing,
2017.

3. Keita Nakamura, Takako Fujisawa. Music recommendation system using lyric
network, Journal of 2017 IEEE 6th Global Conference on Consumer Electronics
(GCCE), 2017.

4. Yading Song, Simon Dixon, and Marcus Pearce. A Survey of Music
Recommendation Systems and Future Perspectives, Proceedings of 9th
International Symposium on Computer Music Modelling and Retrieval (CMMR),
2012.

5. Malte Ludewig, Iman Kamehkhosh, Nick Landia, Dietmar Jannach. Effective
Nearest-Neighbor Music Recommendations. Proceedings of the ACM
Recommender Systems Challenge 2018.
