
DEEP LEARNING BASED SONG RECOMMENDATION SYSTEM USING FACIAL EXPRESSION


Sheela Chinchmalatpure, Department of AIDS, VIT Pune, sheela.chinchmalatpure@vit.edu
Tanushree Prakash Kanade, Department of AIDS, VIT Pune, kanade.tanushree22@vit.edu
Purva Sarvade, Department of AIDS, VIT Pune, purva.sarvade22@vit.edu
Gaurav Zanwar, Department of AIDS, VIT Pune, gaurav.zanwar22@vit.edu
Sharvari Dhage, Department of AIDS, VIT Pune, sharvari.dhage221@vit.edu
Ashitosh Wankhade, Department of AIDS, VIT Pune, ashitosh.wankhade22@vit.edu
Tanuj Somani, Department of AIDS, VIT Pune, tanuj.somani22@vit.edu

Abstract — In today's music streaming landscape, personalized song recommendations are instrumental in enhancing user experiences. This research project introduces an innovative approach to song recommendation using deep learning, specifically facial expression recognition, implemented through a Convolutional Neural Network (CNN) model with a 92% accuracy rate. The project involves the collection and analysis of users' facial expressions to map them to emotional states and subsequently match these states with songs from an extensive music library. By utilizing emotional cues derived from facial expressions, the recommendation system ensures that recommended songs are not only contextually relevant but also emotionally resonant with the user's current state, ultimately creating a more engaging and personalized music discovery experience.

Keywords — User experience, Deep learning, Song recommendation, Convolutional Neural Network, Facial expressions.

I. INTRODUCTION

In the rapidly evolving landscape of modern music streaming, the delivery of personalized song recommendations has become an indispensable cornerstone of a captivating and highly individualized user experience. In a world where an extensive catalog of music is available at our fingertips, the challenge lies not in offering an overwhelming number of options but in intelligently guiding users toward the music that resonates most profoundly with their unique preferences and emotional states. Traditional recommendation systems rely predominantly on historical listening data and song metadata, often falling short of capturing the dynamic and nuanced nature of the human emotions intricately linked to the appreciation of music.

This project introduces an innovative approach to song recommendation that integrates deep learning and real-time facial expression recognition. Music is a profound vehicle for emotional expression and connection, so comprehending a user's real-time emotional state is essential to delivering genuinely personalized recommendations.

In our implementation, we steer away from pre-trained models and develop a custom Convolutional Neural Network (CNN). This CNN model achieves 92% accuracy in real-time emotion detection through the analysis of facial expressions. By doing so, we eliminate the one-size-fits-all approach and ensure that the model is finely tuned to capture the subtle nuances of emotion across individual differences.

Once emotions are recognized in real time, we employ an API-based approach to recommend songs that align with the user's detected emotional state. Our system interfaces with a vast music library, ensuring that the suggested songs are not only contextually relevant but also emotionally harmonized with the user's immediate state of mind.

This approach promises to significantly enhance the music listening experience, enabling users to discover and enjoy music that matches both their preferences and their ever-evolving emotional state. This paper covers the theoretical foundations of the project, the technical details of our custom-built CNN model, the real-time facial expression recognition methodology, and the workings of the API-based song recommendation system.

Furthermore, this research explores the broader implications for music streaming platforms and user engagement. By integrating artificial intelligence, music, and emotional intelligence, our project seeks to foster a deeper and more personal connection between users and their music. This endeavor, accomplished without pre-trained models, demonstrates the potential of custom-built solutions to transform music streaming recommendations and enhance the overall listening experience.

II. LITERATURE SURVEY

[1] This paper explores Pioneer Corporation's music recommendation system, which helps users discover music that suits their tastes and moods. Tonality and rhythm are among the key "Music Features" that the system extracts from CDs and uses to recommend music for selected moods such as "Bright" or "Quiet." To better accommodate personal tastes, it modifies recommendations in light of user feedback. Test results show a high level of user satisfaction with the suggested music.
In essence, the system provides tailored, automated music recommendations that evolve over time to improve the overall listening experience. Pioneer's approach makes it easier for people to find music, making their musical journey more interesting and pleasurable.

[2] This paper examines music recommendation while highlighting the strong link between music and human emotion. Its approach is novel: it uses neural networks to suggest music based on a person's present mood, as determined by an analysis of their facial expressions. A webcam or video records facial expressions, which are then processed by a neural network to determine the person's emotional state and provide mood-appropriate music selections. The shift from data-driven to emotion-based, real-time music recommendation is expected to improve the user's overall musical experience by recommending music more in line with their current emotional state.

[3] This paper examines music recommendation systems, highlighting the significant relationship between humans and music. The study examines the function of recommender systems in many digital fields, with a particular emphasis on music. It demonstrates how real-time music suggestions based on user data analysis can improve the listening experience by utilizing machine learning, collaborative filtering, and the K-means algorithm. To enhance customer satisfaction and expedite music discovery, the system incorporates data sources such as transaction history, item details, and user information.

[4] This research presents a personalized music recommendation system based on artificial intelligence and machine learning that analyzes speech using MFCC features to determine a user's emotional state. To improve accuracy, it makes use of deep learning, specifically Artificial Neural Networks. This research opens the door for AI-driven, emotion-based music suggestions, which could make discovering new music more emotionally engaging, even though consistent emotion recognition remains a hurdle.

[5] This research paper presents a music recommendation system that predicts user emotions with an 84.82% accuracy rate by analysing facial expressions using computer vision. The integrated camera eliminates the need for extra hardware, making it convenient for various applications. This approach promises more precise and effective music suggestions, with potential in personalized music recommendation and in broader applications such as music therapy and public space ambiance enhancement.

[6] This paper addresses challenges in music recommendation systems arising from network technology and online platforms, focusing on data storage, computational efficiency, and data sparsity. A hybrid recommendation system that emphasizes collaborative filtering forms the study's central component. It presents offline and real-time data channels and carefully designs the system architecture, highlighting an improved algorithm and a Hadoop framework for efficiency and scalability. The goal is to adapt music recommendations to users' personalized preferences, ultimately improving the listening experience.

[7] This paper tackles challenges in music recommendation systems, addressing data storage, computational efficiency, and the "cold start" problem. It proposes a method for making music recommendations based on user emotions by integrating social media content. The system combines a music player, a playlist of recommended songs, and a learning process, utilizing data from Facebook, Last.fm, and user music histories. It employs collaborative and content-based filtering, overcoming issues such as recommending unheard songs and catering to new users. Future enhancements are suggested, such as context-aware recommendations based on time and weather, and additional data gathering from platforms like YouTube. Altogether, this technique provides a more accurate and customized method of music recommendation based on user moods and acoustic characteristics.

[8] This paper explores the influence of digital transformation on the music industry and evolving user listening patterns, driven by electronic devices and online music services. It highlights the challenge of choosing mood-specific music from a vast library of options. Existing music apps often fall short in recommending sentiment-based music; while predefined playlists exist, they struggle to adapt to individual preferences. The paper introduces "SentiSpotMusic," a sentiment-driven music recommendation system using a Tableau dashboard and Spotify data. This approach aims to bridge the gap in current music recommendation, offering users music that aligns with their emotional state and mood and providing more personalized music experiences in the digital age.

[9] This paper explores the difficulties of choosing music in the smartphone age, when a plethora of internet music libraries is available and users struggle to choose music suited to their environment and emotional state. The study notes that, with new users and music items joining the system regularly, recommendations must be quick and tailored, and it acknowledges the business value of resource-intensive, individually tailored suggestions. The main objective is to recommend music in line with user preferences. Several models are analysed for comparison, including Graph-based Newness Research on Music Recommendation, Music Recommendation System Based on the Continuous Combination of Contextual Information, and Smart-DJ: Context-aware Personalizing for Music Recommendation on Cell Phones. The analysis is done on a Douban Music dataset with the ultimate goal of improving the music recommendation system.

[10] This paper outlines the development of a music recommendation system centering on content-based methods, particularly acoustic similarity. It examines two distinct approaches: the first involves conventional analysis of acoustic features, while the second integrates deep learning and computer vision techniques to enhance system performance. By emphasizing acoustic attributes and employing advanced technologies, the research seeks to create a more accurate music recommendation system, potentially transforming the way users discover music that aligns with their preferences and moods. This work underscores the significance of acoustic features in music recommendation and the potential of cutting-edge technologies to deliver more precise and effective music suggestions.

III. PROPOSED SYSTEM

Fig. 1. System flow process: a facial expression-driven song recommendation system using a Convolutional Neural Network (CNN) for real-time emotion-based suggestions.

A. Data Collection
We collect real-time facial expression data through an integrated webcam and train the model on the FER2013 dataset, improving the system's ability to suggest personalized music based on users' current emotions and creating a more tailored music experience.

B. Preprocessing Step
We used the flow_from_directory method for structured batch processing and ImageDataGenerator for data augmentation, making full use of the Keras library. Each training and test image was standardized as part of the data preprocessing pipeline, which included grayscale conversion and scaling to a consistent 48x48 pixel size. This preparation ensures consistency in the image data and optimizes it for subsequent analysis and model training.
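A minimal sketch of this pipeline is shown below, using the ImageDataGenerator and flow_from_directory utilities named above; the directory layout, batch size, and the horizontal-flip augmentation are our assumptions.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescaling normalizes pixel values to [0, 1]; the horizontal flip is an
# assumed augmentation choice, not specified in the paper.
train_datagen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1.0 / 255)

# Assumed layout: one subfolder per emotion class under data/train and data/test.
train_generator = train_datagen.flow_from_directory(
    "data/train", target_size=(48, 48), color_mode="grayscale",
    batch_size=64, class_mode="categorical")

validation_generator = test_datagen.flow_from_directory(
    "data/test", target_size=(48, 48), color_mode="grayscale",
    batch_size=64, class_mode="categorical")
```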
C. Feature Extraction using CNN
For emotion analysis, a Convolutional Neural Network (CNN) model is created to extract features from facial images. The architecture is made up of several essential parts. It starts with two 2D convolutional layers with rectified linear unit (ReLU) activation functions; the first layer has 32 filters and the second has 64.

Two-Dimensional Convolution Operation Formula:

O[i, j] = Σn Σm I[i + n, j + m] · K[m, n]

Rectified Linear Unit (ReLU) activation function:

f(x) = max(0, x)

These convolutional layers capture the complex patterns found in facial expressions. To reduce spatial dimensions, max-pooling layers with 2x2 pool sizes are added, and dropout layers with a dropout rate of 0.25 are included to avoid overfitting. Two additional 2D convolutional layers, each with 128 filters, together with max-pooling layers for dimensionality reduction, help the network comprehend facial expressions further.

Max Pooling Operation Formula:

O[i, j] = max(I[i·stride : i·stride + poolsize, j·stride : j·stride + poolsize])

Model resilience is improved by adding a final dropout layer with a 0.5 dropout rate. Using a softmax activation function, the output layer's seven units represent the seven distinct emotions to be categorized.

Softmax Function for Probability Calculation:

P(y = i | x) = e^(x_i) / Σj e^(x_j)

The CNN model is compiled with the Adam optimizer and the categorical cross-entropy loss function with a learning rate of 0.0001, and it is trained using the training and validation data generators. After training, the trained weights are saved in an .h5 file for further use, and the model structure is saved in a JSON file. This CNN model is the heart of the emotion analysis system, efficiently extracting features from facial images and classifying emotions using these features.

Fig. 2. Convolutional Neural Network flowchart.
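Assembling the layers described above gives the following Keras sketch. The filter counts, pool sizes, dropout rates, optimizer, learning rate, loss, and the .h5/JSON persistence follow the text; the 3x3 kernel size, the Flatten layer, and the epoch count are our assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential([
    # Two ReLU convolutional layers: 32 then 64 filters (3x3 kernels assumed).
    Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 1)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    # Two further convolutional layers with 128 filters each.
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dropout(0.5),                     # final dropout layer
    Dense(7, activation="softmax"),   # one output unit per emotion class
])

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Train with the generators from the preprocessing step (epoch count assumed),
# then persist the structure as JSON and the weights as .h5.
model.fit(train_generator, validation_data=validation_generator, epochs=50)
with open("model.json", "w") as f:
    f.write(model.to_json())
model.save_weights("model.h5")
```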
In the system's concluding phase, we leverage the CNN model to evaluate users' emotional states from their facial expressions. The emotion discerned by the CNN is a pivotal component of the song recommendation framework. We integrate an Application Programming Interface (API) to connect the recognized emotion to an extensive song library; the API serves as the conduit between the real-time detected emotion and a curated selection of songs that suit that emotional context. This procedure correlates recognized emotions with a diverse collection of songs, guaranteeing that recommendations align with the user's prevailing emotional disposition. The outcome is a personalized playlist that resonates with the user's feelings, elevating the listening experience with songs that authentically mirror their emotional state.
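A sketch of how such an API lookup might work is given below. The paper does not name a client library or a mapping scheme; the spotipy client, the emotion-to-query table, and the credential placeholders are all our assumptions.

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Hypothetical emotion-to-query mapping; the paper does not specify one.
EMOTION_QUERIES = {
    "happy": "upbeat feel-good", "sad": "mellow acoustic",
    "angry": "calming instrumental", "neutral": "popular hits",
    "surprised": "energetic pop", "fearful": "soothing ambient",
    "disgusted": "relaxing classical",
}

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET"))

def recommend_songs(emotion, limit=10):
    """Return (track, artist) pairs for the detected emotion."""
    results = sp.search(q=EMOTION_QUERIES[emotion], type="track", limit=limit)
    return [(t["name"], t["artists"][0]["name"])
            for t in results["tracks"]["items"]]
```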
D. Experimental Output
Based on the identified facial expressions, the recommendation system, which was connected to the Spotify API, provided music recommendations. Screenshots of the experimental output cover each detected emotion: Happy, Sad, Angry, Neutral, Surprised, Fearful, and Disgusted.
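An end-to-end run of the pipeline might look like the sketch below, chaining the earlier (assumed) helpers: load the saved architecture and weights, classify one face crop, and fetch recommendations. The file names and the alphabetical label order are assumptions.

```python
import numpy as np
from tensorflow.keras.models import model_from_json

# Assumed alphabetical class order, matching flow_from_directory's default
# ordering of the emotion subfolders.
EMOTIONS = ["angry", "disgusted", "fearful", "happy",
            "neutral", "sad", "surprised"]

with open("model.json") as f:
    model = model_from_json(f.read())
model.load_weights("model.h5")

roi = capture_face_roi()  # helper from the data-collection sketch
if roi is not None:
    x = roi.astype("float32").reshape(1, 48, 48, 1) / 255.0
    emotion = EMOTIONS[int(np.argmax(model.predict(x)))]
    print(emotion, recommend_songs(emotion))  # helper from the API sketch
```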
IV. RESULT

We used the FER2013 dataset, which consists of images of faces labeled with seven emotion categories: anger, disgust, fear, happiness, sadness, surprise, and neutral. For model training, hyperparameter tuning, and evaluation, the dataset was split into training (80%), validation (10%), and test (10%) subsets. The emotion detection model identified facial emotions with an accuracy of 92% on the test set. According to user feedback surveys, 90% of users were satisfied, confirming that the suggested songs matched the emotional context of the users as seen in their facial expressions.

Test-set accuracy is computed as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

TP – True Positive
TN – True Negative
FP – False Positive
FN – False Negative
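As a purely hypothetical worked example of this formula: counts of TP = 460, TN = 460, FP = 40, and FN = 40 would give Accuracy = (460 + 460) / (460 + 460 + 40 + 40) = 920 / 1000 = 92%, consistent with the test-set accuracy reported above.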

                                   CNN    DT   MF        RF        SVM
Accuracy                           85%    70%  75%       80%       78%
Computational Resources Required   High   Low  Moderate  Moderate  Moderate
Data Requirements                  Large  Low  Moderate  Low       Moderate
Versatility                        High   Low  Low       Moderate  Moderate

Fig. 3. Accuracy of different algorithms.

References
[1] Y. Kodama et al., "A music recommendation system," 2005 Digest of Technical Papers, International Conference on Consumer Electronics (ICCE), Las Vegas, NV, 2005, pp. 219-220, doi: 10.1109/ICCE.2005.1429796.
[2] V. P. Sharma, A. S. Gaded, D. Chaudhary, S. Kumar and S. Sharma, "Emotion-Based Music Recommendation System," 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 2021, pp. 1-5, doi: 10.1109/ICRITO51393.2021.9596276.
[3] R. Kumar and Rakesh, "Music Recommendation System Using Machine Learning," 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), Greater Noida, India, 2022, pp. 572-576, doi: 10.1109/ICAC3N56670.2022.10074362.
[4] R. R. Jaison C and M. Rajeswari, "Song Recommendation based on Voice Tone Analysis," 2023 Second International Conference on Electronics and Renewable Systems (ICEARS), Tuticorin, India, 2023, pp. 708-712, doi:
[5] K. Vayadande, P. Narkhede, S. Nikam, N. Punde, S. Hukare and R. Thakur, "Facial Emotion Based Song Recommendation System," 2023 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES), Greater Noida, India, 2023, pp. 240-248, doi: 10.1109/CISES58720.2023.10183606.
[6] Y. Lin, "Design and Implementation of Music Recommendation System Based on Big Data Platform," 2022 IEEE 2nd International Conference on Computer Systems (ICCS), Qingdao, China, 2022, pp. 51-54, doi: 10.1109/ICCS56273.2022.9988197.
[7] C. H. P. D. Wishwanath and S. Ahangama, "A Personalized Music Recommendation System based on User Moods," 2019 19th International Conference on Advances in ICT for Emerging Regions (ICTer), Colombo, Sri Lanka, 2019, pp. 1-1, doi: 10.1109/ICTer48817.2019.9023727.
[8] E. Sarin, S. Vashishtha, Megha and S. Kaur, "SentiSpotMusic: a music recommendation system based on sentiment analysis," 2021 4th International Conference on Recent Trends in Computer Science and Technology (ICRTCST), Jamshedpur, India, 2022, pp. 373-378, doi:
[9] A. Patel and R. Wadhvani, "A Comparative Study of Music Recommendation Systems," 2018 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 2018, pp. 1-4, doi: 10.1109/SCEECS.2018.8546852.
[10] A. Niyazov, E. Mikhailova and O. Egorova, "Content-based Music Recommendation System," 2021 29th Conference of Open Innovations Association (FRUCT), Tampere, Finland, 2021, pp. 274-279, doi: 10.23919/FRUCT52173.2021.9435533.
