
School of Computer Science

College of Engineering and Physical Sciences

BSc. Project

A Machine Learning Approach to Predict Emotions Using Music and Text on Social Media

Submitted in conformity with the requirements for the degree of BSc. Artificial Intelligence and Computer Science
School of Computer Science, University of Birmingham

Shaylan Rao, BSc. (Hons)


Student ID: 2000963
Supervisor: Dr. Phillip Smith

September 2021
Word Count: 9,937

Acknowledgements
While completing my degree and dissertation, I have received a great deal of support and guidance.

I would like to thank foremost my supervisor, Dr. Phillip Smith, who has been encouraging throughout the whole of my final year. Your expertise, knowledge and consideration towards my project and my progress at university have allowed me to attain what I have achieved. You have been approachable and reassuring during the process of this dissertation, which I most appreciate.

I would like to acknowledge my peers from Computer Science for always being supportive of my ambitions. I would also like to thank other friends for their sympathy and understanding of the challenges I have faced.

Additionally, I would like to thank my parents and family, particularly my brother, Dr. Krishan Rao, who has persistently been by my side, looking out for me even during hard times. You are always there for me; you are my best friend. Finally, I could not have completed this dissertation without the support of my housemates, who helped me through this final year and kept me smiling, and Grace Leppard, for your happy distractions outside of the course.

Abstract

Predicting emotions for an individual can be a challenging task due to the lack of the specific data required. Many people rely on tools, techniques and applications to help track their mood, helping them monitor their wellbeing. Giving a person the perspective to recognise their emotional state can be a highly useful and effective tool to support mental health.

This project aims to use machine learning and data from online applications in order to provide more accurate predictions of emotions per user, giving tailored results. This highlights that not every person is impacted in the same way and that predicting emotions cannot always be generalised across a population.


Contents
Acknowledgements
Abstract
Glossary
1 Introduction
1.1 Background
2 Design
2.1 Design Choices
2.2 Functional Requirements
2.3 Non-Functional Requirements
2.4 User Interface
3 Data
3.1 Accessible Data
3.2 Good Data
4 Architecture
4.1 System Structure
4.2 Data Gathering
4.3 Modelling
4.4 Application
5 Implementation
5.1 Experimentation
5.2 Modelling
5.3 Web Application
6 Evaluation
6.1 Requirements
6.2 Testing
6.3 Conclusion
References
Appendix

Glossary
Mood Boost Playlist: A playlist designed to influence the emotion of the listener (usually aimed at increasing levels of joy).

Spotify Tweet: A tweet containing a link/reference to a Spotify song with an included message, reflecting an opinion of the song or how the user is feeling.

API: Application Programming Interface; intermediary software between two applications, providing communication.

Natural Language Processing: A branch of AI that computes the analysis and synthesis of natural language.

Tickets: A term used to describe a task required to be completed within a sprint.

Blockers: A term used to describe a component or factor obstructing the current path of progress towards achieving a goal.

MoSCoW: A prioritisation scheme that separates functionality into must have (mandatory for the user), should have (adds considerable value), could have (small improvements) and will not have (not within the scope of this project).

MVP: Minimum viable product; a product with only enough functionality to be released/deployed.

DOM: Document Object Model; a structured representation of the HTML in a webpage.

View: A term describing the visual appearance of a webpage.

MVC: "Model, handles logic. View, displays information. Controller, controls data flow into a model object and updates view whenever data changes" (Svirca, 2020).

Ordinary Least Squares Loss: An estimation for the coefficients of the equation in a linear regression.

Confidence: Confidence interval; "the probability a population parameter will fall between a set of values for a certain proportion of times" (Adam Hayes, 2021).

Split: "The process of dividing a node into two or more sub-nodes." (Decision Tree Regressor explained in depth, 2019)

Chapter 1

Introduction
Technological development supporting wellbeing has recently boomed, creating a disrupted industry where billions of dollars have been invested to bridge the gap between mental health and accessible, everyday technology. Some of the most popular mood tracking applications, such as Moodfit and Moodily, function as a journal, dependent on user input to identify the current mood, which introduces several problems. One of these is the inaccuracy of the input, which may be caused by a user feeling pressured to record positive emotions or lacking perspective on what emotion they are currently feeling, producing false reports.

This project aims to improve the accuracy of tracking emotions by making predictions based on user-specific data from the social media platform Twitter and the music streaming service Spotify. These sources of data reflect and impact emotion and are strong, consistent indicators of a person's current emotional state, as they are often updated multiple times a day. Applying machine learning to understand a user's perspective on songs creates a tailored mood identifier, rather than a fixed, generic measurement. Data scarcity poses a key challenge, as user-specific data is limited, whereas similar projects sample a whole population.

1.1 Background
Spotify offers a range of ways to listen to music, from users creating their own playlists, to searching for a specific song, to personalised, autonomously generated playlists. Though what Spotify initially offers may introduce bias, it is still the user who chooses what they listen to.

Several components of music can impact how a person feels, such as tempo altering or regulating heart rate, or specific frequencies used to calm a listener. Spotify offers playlists based on emotions, such as 'Happy Beats', which is designed to increase levels of joy; however, each person has a unique perception of songs which impacts their emotional levels differently (Ronan et al., 2018). As a result, we cannot accurately determine if a mood boost playlist works as intended for each person. There are 5 core emotions (anger, disgust, fear, joy and sadness) where varying levels of each can represent nearly all emotions (iamheart, 2017).

A project which uses the same sources towards a similar goal is (Pichl et al., 2015), where Twitter was used to find trending songs on Spotify. An approach that applies machine learning to a similar task is (Lin, 2018), where playlists are autonomously generated using personal music data. These two papers are referenced throughout this project.

Chapter 2

Design

The requirements for this project differ from those referenced in 1.1 Background, as the objective is to create an application accessible to users, as opposed to solely producing a dataset (Pichl et al., 2015). Another goal is to process and output unique data regarding a user's music history, tweet messages and the corresponding predictions.

In order to predict an emotion based on a song, each song must be labelled with a value for an emotion. A song can be measured using continuous values from its features, for example 'loudness'. By using multiple features, each song will have a unique combination of attributes that describes how the song sounds. Having defined a song with multiple attributes, we can now label each combination of music features with a continuous value representing the emotion experienced by the user. This value is gained by analysing text written by the user in a Spotify Tweet. Labelling a song assigns the value of emotion to each music attribute. When another song is labelled, its attributes are populated and assigned the new value of emotion.

When labelling N songs, there will be N points per emotion, with a dependent variable representing emotion and a corresponding independent variable representing song attributes. With this data, I intend to use a machine learning algorithm to learn patterns in emotions using combinations of music attributes from songs listened to by the user. Similar sounding songs will have similar values for each attribute. Taking two points representing songs in an n-dimensional feature space (one dimension per attribute), the distance between them represents the difference in how they sound.
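A minimal sketch of the resulting labelled dataset is shown below; all values and column choices are hypothetical. Each row holds a song's Spotify audio features as independent variables and the emotion scores derived from the accompanying tweet as labels.

```python
import pandas as pd

# Hypothetical rows: Spotify audio features as independent variables,
# IBM Watson emotion scores from the accompanying tweet as labels.
songs = pd.DataFrame([
    {"danceability": 0.71, "energy": 0.83, "valence": 0.90, "loudness": -4.2,
     "joy": 0.78, "sadness": 0.02, "anger": 0.00, "fear": 0.05},
    {"danceability": 0.32, "energy": 0.25, "valence": 0.11, "loudness": -11.7,
     "joy": 0.04, "sadness": 0.61, "anger": 0.10, "fear": 0.12},
])

X = songs[["danceability", "energy", "valence", "loudness"]]  # song attributes
y = songs["joy"]                                              # one emotion per model
```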

2.1 Design Choices


2.1.1 Key Functionality

These are the fundamental project goals:

I. Easily accessible for users who own a Spotify and Twitter account
II. Predict user emotions based on recent listening history
III. Dynamically train a model to produce results in a reasonable period of time (2 minutes)
IV. Clearly present predicted data where users can understand their current/recent emotion

I. Accessibility

Web applications provide accessibility on both mobile and desktop devices using reactive web
development tools. For ease of accessibility, a simple login page and dashboard would be ideal, only
requiring a user to authorise using their account data on Spotify and Twitter. HTML, CSS and JavaScript
would help develop the front-end functionality and appearance. As the backend will be written in Python,
React and Flask are ideal choices to develop the front-end and back-end respectively.

II. Predicting

Emotions for a song will be present in the text of Spotify Tweets containing the corresponding track. As this requires NLP (Natural Language Processing), it is appropriate to use IBM Watson's Tone Analyser to score each core emotion.

IBM Watson uses deep learning techniques, trained on thousands of data points from a range of domains, making it a generalised, versatile sentiment analyser. It aligns with the focus of analysing core emotions, outputting continuous values (except for disgust, as of 2016). IBM's model was benchmarked against other emotion datasets such as ISEAR and SemEval and achieved higher statistical accuracy than a state-of-the-art model (Michelle Miller and IBM, 2019).

An API provides access to IBM Watson, which processes data on an enterprise scale. Otherwise this would have to be processed locally, requiring access to greater resources and leading to poorer performance in comparison.
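A minimal sketch of how the Tone Analyser service might be called through the ibm_watson Python SDK; the API key and service URL below are placeholders, and the version date is one of the documented release dates.

```python
from ibm_watson import ToneAnalyzerV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Credentials and service URL are placeholders.
authenticator = IAMAuthenticator("YOUR_IBM_CLOUD_API_KEY")
tone_analyzer = ToneAnalyzerV3(version="2017-09-21", authenticator=authenticator)
tone_analyzer.set_service_url("https://api.eu-gb.tone-analyzer.watson.cloud.ibm.com")

# Score the text of a Spotify Tweet; the emotional tones (joy, sadness, anger, fear)
# become the labels, while writing-style tones can be stored for later use.
analysis = tone_analyzer.tone({"text": "I really like this song!"},
                              content_type="application/json").get_result()
scores = {t["tone_id"]: t["score"] for t in analysis["document_tone"]["tones"]}
```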

Since the sentiment values become the labels and are a key input to downstream processing, it is important to ensure that these are as accurate as possible to form a usable dataset. This makes it crucial to develop a system that minimises error to produce a reliable model. With inaccurate labels, false or no correlations may be present, rendering the predictions unusable.

III. Model Performance

The web application requires front-end and back-end processing. An alternative to React is Angular, which is a framework rather than a library and can be used for UI development. React's component-based, modular design allows self-contained features to be easily added and maintained. Processing time is also reduced using React, as it binds data unidirectionally as opposed to Angular's bidirectional binding.

The majority of processing will be executed server-side, making Python a suitable choice. With access to powerful libraries such as Pandas for data manipulation and scikit-learn for training machine learning models, Python provides flexibility with a quicker development time compared to other languages such as C. When experimenting with several models, speed of code production is a crucial factor in ensuring a variety of models can be developed. Libraries such as matplotlib and seaborn support various graphing techniques, useful for identifying correlations during development.

The use of APIs such as IBM Watson also allows processing to be divided and executed on external servers, reducing the workload on the system server. Exploiting metadata within Spotify's web API allows a range of music features (11 in total) to be accessed for each song available on Spotify.

IV. Visualisation

For the application to be purposeful for the user, predictions must be clearly displayed in an easy to interpret manner. As the model is predicting emotions the user is monitoring, a line graph would be most appropriate. Peaks and troughs are easy to distinguish, as is the rate of change characterised by the gradient between points. Plotting one line per emotion on the same graph makes comparing emotions at a given point simpler than separate graphs would.

Using Kendo-Graph's line graph, a user will be able to pan across the graph, zooming in and out to take a closer inspection of the predictions. Using the tool-tip property, the user will be able to view the values of any given point via a mouse hover. Other built-in functionality includes toggling classes, giving the ability to view only the desired emotions. The type of graph is easily adjustable, parameterised by type, allowing future changes and developments to be made effortlessly.

2.1.2 APIs

Spotipy / Spotify Web

The strategic use of this API is to extract music features from songs. As the project focuses on Spotify users, this provides full coverage of all songs a user can listen to. The API uses REST principles, allowing data to be fetched, and permits users to interact with their account through functions implemented on the web application. The responses contain JSON objects, a generic data format which can be accessed and manipulated in Python and is widely accepted by other systems, including IBM Watson. JSON is derived from JavaScript object notation, meaning objects can also be accessed and manipulated by the front end if necessary.
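A minimal sketch of fetching audio features through Spotipy; the client credentials are placeholders and the track id is an arbitrary example.

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Client-credentials flow is enough for public track metadata; both values are placeholders.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="SPOTIFY_CLIENT_ID", client_secret="SPOTIFY_CLIENT_SECRET"))

# Audio features (valence, energy, danceability, tempo, ...) for a track id
# extracted from the URL in a Spotify Tweet.
track_id = "4uLU6hMCjMI75M1A2tKUQC"  # example id only
features = sp.audio_features([track_id])[0]
print(features["valence"], features["energy"], features["tempo"])
```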

Tweepy / Twitter

This API is required to access tweets from Twitter. The primary feature of this tool is the search function, where a query can be passed and JSON objects are returned. It is a necessary API as it is the most efficient method of gaining Twitter-related data with respect to processing time and the code required for implementation, due to its prebuilt functionality. The Twitter API is also required for user authentication, making it a compulsory API.
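A minimal sketch of such a search with Tweepy (v4), assuming the v1.1 search endpoint; all credentials are placeholders and the exact query string is illustrative.

```python
import tweepy

# OAuth 1.0a user-context credentials; all four values are placeholders.
auth = tweepy.OAuth1UserHandler("API_KEY", "API_SECRET",
                                "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Recent English tweets that link to a Spotify track, excluding retweets.
results = api.search_tweets(q='"open.spotify.com/track" -filter:retweets',
                            lang="en", count=100, tweet_mode="extended")
for tweet in results:
    print(tweet.user.screen_name, tweet.full_text)
```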

2.1.3 Development Methodologies

Agile

The Agile framework is a method used to break a project into smaller components, where micro-targets are to be achieved over a set period of time (a sprint). This requires a continuous cycle of planning, developing and evaluating from both a developer and an end-user perspective to ensure the project stays in line with the user's requirements. As the time available for this project is limited, it would be appropriate for sprints to last a week to allow frequent (re)evaluation, maintaining a focused route in project development.

I plan to use a Kanban board to visualise, monitor and update progress as well as prepare future sprints.
Columns will include a backlog (containing tickets for future sprints), sprint backlog (tickets to be
executed in the current sprint), in progress (tickets in development), completed (accomplished tickets) and
review (completed tickets that may need re-evaluation for possible improvement).

Development is organised with Extreme Project Management (XPM), allowing the requirements to change at any stage of the project. This is important because this specific approach has not been documented before; major blockers may be found during implementation as the project evolves.

2.2 Functional Requirements


The focus of this project is for the application to be generalised for any user, resulting in only one user
type.

A user of this system is one that owns a Spotify account and (preferably) frequently streams music in
addition to owning a Twitter account and posting Spotify Tweets. Typically a user would listen and post a
range of songs of varying genres, including messages that portray a variety of emotions.

The requirements can be broken down and separated into levels of importance in order to reach the minimum goal of a working system, with additional features to complement the application. I have followed the MoSCoW prioritisation methodology in order to classify the different requirements. Placing must have requirements first, the MVP (Minimum Viable Product) can be achieved in the shortest expected time (Digvijay Singh, 2020). Once the MVP is established, supplementary features can be added in a modular approach, improving the functionality and quality of the product. The flexibility of this approach means that any changes or new findings during implementation can adapt the plan for pending features. Classifying should have and could have is based on two key points: effectiveness and ease of coding. Effectiveness is measured by the value gained by the user on a scale from 1 (close to no value added) to 5 (extremely valuable). Value is measured by how personalised a feature is and how much new information can be gained. This requires a thoughtful process of deciding the values for each requirement. This is one of the disadvantages of MoSCoW: when dealing with numerous requirements, misclassifications can easily occur. It can be difficult to adjust or recalculate during the implementation stage, as there may be requirements that depend on other requirements (Hudaib et al., 2018). To mitigate this source of error, I will only include a manageable number of requirements, limiting the should have and could have categories in particular.

The system must be able to:

1. Allow a user to login to their Spotify Account

2. Allow a user to login to their Twitter Account

3. Access user data once authenticated to (Spotify and Twitter) accounts

4. End access to a user’s account when the user logs out

5. Predict different types of emotions (Anger, Fear, Joy, Sadness)

6. Measure the intensity of emotions with a continuous value

7. Make user-specific predictions

8. Predict recent emotions for a user

9. Display a history of the user's predicted emotions

10. Display each recently played song indicating the corresponding emotion

The system should be able to:

11. Allow the user to search for a specific song to predict

12. Predict the user’s emotion for any given song

13. View music history as reference for emotion predictions

The system could be able to:

14. Store and accumulate user data

15. Allow a user to amend predicted values and save changes

16. Create personalised playlists based on a user’s emotions

For the demonstration, a dataset of at least one individual user must be created. This will require trawling
Twitter to form a dataset, further explained in Chapter 3.

2.3 Non-Functional Requirements


Non-functional requirements impact decisions made in design and implementation. Some of these have
been raised previously but are explicitly stated here.

a) Dynamically train the prediction model upon login

b) Dynamically update user data (from Spotify and Twitter) upon login

c) Function for any user with a Spotify and Twitter account

d) Clearly display predicted emotions

e) Load the dashboard webpage within a reasonable time (2 minutes)

f) Users are able to understand the change in their emotion

g) The user interface should be intuitive and adaptive to the display

h) A user is able to easily access the dashboard displaying predictions

2.4 User Interface
To complement the accessibility requirement, the UI (user interface) and experience must be simple and intuitive to be effective, as the experience and background of the user is uncertain. Another advantageous characteristic is the application being adaptive, making it easy to use on multiple devices.

Single-page applications often provide a better mobile experience, mimicking the style of a mobile app. This is because updating the DOM is faster than loading another page. The transition between views is seamless, and accessing the application will be much easier via one URL (Mikowski and Powell, 2014).

As mentioned in 2.1.1 Accessibility, using two views for the application, the login page and dashboard, will keep the application clear and concise while still providing all required functionality. Following 'The Three Click Rule' as explored in (Jiménez Iglesias et al., 2018), composing the front page of only login buttons means the main dashboard requires a minimum of two clicks (logging into Spotify and Twitter) if login data is cached. This is vital for keeping the user engaged and able to access the application effortlessly.

As this application will be developed using React, the functionality can be broken down into component-based features, displayed in Figures 2.4 and 2.5. Figure 2.4 shows that the line graph will be the first element the user's eyes are drawn to, as a top-down approach is most common when viewing pages with a goal in mind (Țichindelean et al., 2021).

As the application will be dynamically training a model (non-functional requirement a), there will need to be a screen informing the user that the model is being trained and that approximately 2 minutes will be required for the application to progress. The design of the dashboard allows the user to access all interactive components in one location; the UI design shown is the height of the whole page, but the user would need to scroll down to view all elements.

With a UI design that takes all requirements into consideration, the architecture for how each system process will operate must be constructed. The following chapter focuses on data availability and format.

Chapter 3

Data

A large portion of this project is acquiring live data from online sources containing big data. Twitter, being one of the largest social networks and accumulating over 12 terabytes of data daily (Ankush Chavan, 2020), provides a vast volume of growing data. As Twitter is an accessible microblogging platform, it is a great source of information regarding users' sentiments and opinions across a range of contexts, including songs.

3.1 Accessible Data


Twitter API Search Features

The Twitter API will be used to query for specific data used to build collections of data for users. Searches can be specified by language, time, URLs, keywords and several other filters to obtain exact records. There is extensive metadata contained within each tweet, providing direct access to specific components of each tweet; this creates an efficient process when recursively extracting data.

Spotify API Music Features

Music features (Figure 1.1) are accessible through metadata for a given track via a track id. These values will be the attributes/independent variables for training prediction models. Some attributes such as key and tempo are measured values, whereas others are calculated using a combination of attributes; for example, danceability includes tempo, rhythm stability, beat strength and several other features.

IBM Watson Tone Analyser

The labels for each song will be attributes 1 to 4 (Figure 1.2), as these convey emotion; attributes 5 to 7 relate to how the text is written. All attributes will be stored, as they may be useful for further development and do not consume a large volume of memory. A level for each emotion will be labelled on every song, as a combination of emotions may be present.

3.2 Good Data


‘Good quality’ data refers to usability, range and accuracy.

Data obtained must be usable or adapted into a usable form for internal system use. An example of unusable data is a Spotify Tweet containing only emoticons; IBM Watson can only process written languages (English among others) and will not be able to interpret characters outside written languages.

The Spotify Tweets obtained should contain a diverse range of emotions with varying levels of intensity per emotion. This will generate a broader range of labels, where smaller increments of intensity can be labelled for each music feature, improving reliability. A class imbalance can cause a trained model to overpredict highly sampled classes, creating unreliable predictions (Guo et al., 2008).

Confounding variables impact the difficulty of measuring data accurately, for example when a user uses sarcasm or slang to convey a positive message which, read literally, is interpreted negatively. "This song is sick!" depicts a highly positive sentiment from a user's perspective; however, IBM Watson scored it 0.534 for sadness, nearly the polar opposite. Ideally, Spotify Tweets should contain the literal meaning of what the user aims to convey. Despite this minor issue, punctuation can be interpreted: for example, adding an exclamation mark to the end of "I really like this song" increases joy by 0.014, a minor but more accurate analysis.

Although the data to be collected is not likely to follow this standard precisely, it offers a perspective as to
what data will give optimal results for a machine learning algorithm.

Chapter 4

Architecture

The application will follow a client-server model. A focus of this project is the development of the server side, where the majority of computation will occur; the client side will primarily be focused on displaying the UI design on the webpage.

The client side will contain JavaScript to coordinate the logic of the webpage views, manage storage and communicate with the server side. It will specify the necessary HTML and CSS required to produce webpages that fit the design of the UI. The server side will provide the front end with data to use for each component by accessing and pre-processing data. This is taken as the input for the model that will produce predictions.

The server side will contain Python scripts to access and generate the data used to produce predictive models. Using Pandas DataFrames to store data is ideal as it enables different data types to be stored in a two-dimensional structure, similar to an SQL table. As a result, it is easy to convert and save this data in different formats such as CSV and SQL, accommodating the transition to a database server.
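A minimal illustration of this storage step; the column names and values below are hypothetical.

```python
import pandas as pd

# Gathered records: one row per Spotify Tweet (hypothetical columns).
records = pd.DataFrame({
    "track_id": ["4uLU6hMCjMI75M1A2tKUQC"],
    "text": ["I really like this song!"],
    "joy": [0.62],
})

records.to_csv("user_dataset.csv", index=False)   # flat-file storage
# records.to_sql("tweets", connection)            # or straight into a database server
```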

4.1 System Structure
Multiple stages of development are required.

1. Data Gathering

Identifying data required to train predictive models. Collecting extra data that may be useful for
additional functionality in the application. This would utilise Tweepy to trawl and query tweets.
Data from this section should be stored on a saved file, accessible for the following stages.

2. Feature Extraction

Gaining relevant information from raw data by cleaning and analysing (meta)data from external
systems. This would include processing Spotify Tweet messages to gain values for emotions using
IBM Watson and accessing track features via Spotipy.

3. Prediction

Creating and evaluating machine learning models that produce predictions to identify the
appropriate model from levels of validation and other factors. This would be a collection of
supervised learning models as the data would be labelled with emotions from the prior stage.

4. Application

Encapsulating the last three stages within the back-end of a web application, providing the user
with a friendly interface that clearly displays outputs from the model.

Stages 1 to 3 are server-side components (Figure 2.1), where the computation to produce the model is encapsulated. The 4th stage is the website interface and the connection between the user logging in and the front end communicating with the server.

4.2 Data Gathering
4.2.1 System Flow

Figure 2.2 illustrates the process required to obtain necessary data for several users. This method
incorporates elements from stages 1 and 2 and is to be implemented in the experimental stage (described in
5.1).

This process identifies relevant users, iterates through the Spotify Tweets made by each user, and then searches recent past tweets per Spotify Tweet. This gains a wider range of data to improve the reliability of calculating emotions at a given time.
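A rough sketch of that trawl using Tweepy, assuming the v1.1 search and timeline endpoints; the query, limits and filtering below are illustrative only.

```python
def gather_user_data(api, blacklist):
    """Sketch of the trawl in Figure 2.2; the query and limits are illustrative."""
    rows = []
    # 1. Identify candidate users from tweets linking to Spotify tracks.
    for tweet in api.search_tweets(q='"open.spotify.com/track"', lang="en", count=100):
        user = tweet.user
        if user.screen_name in blacklist:
            continue  # skip known bot / advertising accounts
        # 2. Iterate that user's own Spotify Tweets.
        for status in api.user_timeline(screen_name=user.screen_name, count=200,
                                        tweet_mode="extended"):
            if "open.spotify.com/track" in status.full_text:
                # 3. Keep the message and track reference for feature extraction.
                rows.append({"user": user.screen_name, "text": status.full_text})
    return rows
```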

4.3 Modelling
This project explores which type of machine learning model is appropriate for the given data via the paths of classification and regression. As data has not yet been collected, the type of trends present is unknown, meaning that classification may be more representative than regression. Despite classification only producing discrete values, I hypothesise that the accuracy of classification will be higher due to outputs having a smaller range of possible values. The goal of the project is to be able to predict if an emotion is present and, if so, the intensity of that emotion. The goal in classification is predicting whether a value is above or below a certain threshold, rather than minimising the MSE (mean square error), which may falsely suggest an emotion is present.

I decided to explore a range of different approaches for classification and regression (Figure 2.3). These are further divided by type, with models represented in leaf states. The models have been chosen based on popularity, such as linear regression, and on suitability for low-density datasets, as it will be difficult to gain the large number of records (thousands to millions) that machine learning algorithms typically expect in order to gain a reliable understanding of the data.

4.4 Application
4.4.1 Server

The server is built with the assistance of a web framework, Flask. The server provides data to the client side and may also receive user inputs to produce specific outputs, for example, predicting a song searched by the user. Flask is a lightweight micro-framework with a built-in development server that can be hosted on a different port, making it easy to deploy for future development. Only a single Python file is required, and it can deliver data through static routing to the web application. Flask easily allows the MVC architecture to be followed, where the model is accessible within the same directory as the server, which is composed of several methods that act as controllers to update the view.
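A minimal Flask sketch of such a server; the route name and response fields are hypothetical.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical route returning the latest emotion predictions for the logged-in user.
@app.route("/api/predictions")
def predictions():
    # In the real system this would call the trained, user-specific model.
    return jsonify([{"track": "example", "joy": 0.62, "sadness": 0.10}])

if __name__ == "__main__":
    app.run(port=5000, debug=True)
```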

4.4.2 Lyric Scraping

This task works in line with the project as it applies the same system used for analysing Twitter messages
to any given text, including lyrics.

A website, https://genius.com/, contains lyrics for many songs, with the useful characteristic that the URL contains only a path, with no query parameters or fragments. The path construction is 'artist name-song name-lyrics', which are properties accessible via the Spotify API given a track id. To bridge online data to the project, a Python package, Beautiful Soup, can parse HTML and XML documents. Lyrics can easily be extracted by accessing the 'Lyrics__Container' element containing the text to be passed to IBM Watson. This simple, flexible package is ideal as it only requires short additions of code to execute simple tasks, unlike Selenium and Scrapy, which are popular alternatives. It also converts incoming documents into Unicode, removing the need to reformat data before passing it into other systems.
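A rough sketch of this scraping step; the URL pattern and the CSS class prefix reflect how the page was structured at the time and are assumptions that may change as the site is updated.

```python
import requests
from bs4 import BeautifulSoup

def fetch_lyrics(artist: str, title: str) -> str:
    """Scrape lyrics from Genius; the URL slug and class prefix are assumptions."""
    slug = f"{artist}-{title}".replace(" ", "-")
    page = requests.get(f"https://genius.com/{slug}-lyrics", timeout=10)
    soup = BeautifulSoup(page.text, "html.parser")
    # Genius wraps lyrics in containers whose class starts with 'Lyrics__Container'.
    blocks = soup.select("div[class^='Lyrics__Container']")
    return "\n".join(block.get_text(separator="\n") for block in blocks)
```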

The architecture supports asynchronous techniques, meaning that components can be continuously updated, allowing the application to seem live to the user. This is also helpful for the delay that occurs whilst training the model: there would not be any presentable data until the model is trained, but the view can update as soon as training is complete. The purpose of the Flask server is to centralise the function of the controller and access the model (specific to the user), forming the MVC architecture. This architecture will enhance development as each section (model, view and controller) is independent of the others, enabling multiple streams of progress.

Chapter 5

Implementation

5.1 Experimentation
An experimental stage is required in order to create and evaluate data and models. As there is no existing dataset, the data for this project will need to be obtained by trawling Twitter for users and then exploring the available data. The data gained will need to be analysed to recognise any biases and evaluated to assess the accuracy of calculated values against the raw data, for example, validating the values of emotion against the text. This is vital to obtain 'Good Quality Data' (3.2 Good Data).

Code implemented in experimentation will differ from the final product, as this stage is for gathering data from multiple users rather than focusing on a given user. The motivation for this is to increase the number of users collected during the searching process. By gathering a wider range of users, the probability of finding users with 'Good Quality Data' increases.

There is a high chance of collecting users from a range of cultures. As explained in (Pichl et al., 2015), there is a difference in commonly listened-to songs based on geolocation, implying that the type of music users listen to is influenced by culture. This project focuses on addressing each user independently, only using data directly related to the user. This will reduce inaccuracy when predicting, as it excludes false, general dependencies shared across all users.

5.1.1 Querying

The method used in a previous paper, Combining Spotify and Twitter Data (Pichl et al., 2015), relied on the trend of users sharing music via the share function built into Spotify. This would include a default hashtag such as "NowPlaying" or "spotify"; for example, the paper demonstrates:

This is a common trend amongst microblogging platforms, especially Twitter. In an attempt to reproduce a
similar technique to obtain data relevant to this project, it was apparent that there were a number of
problems with this method.

The majority of data gained from these searches did not contain any text reflecting the users' feelings or opinions. Instead, most posts were left empty, containing only the Spotify track or occasionally even a playlist. Several tweets also came from bot users such as radio stations (BBCR3MusicBot) and small artists attempting to gain recognition by advertising their music. The hashtag is now also shared with other music platforms such as Apple Music, and a few Spotify Tweets did not even contain a link to the track, only text naming the track and artist.

A more specific query searching for a Spotify track URL would ensure the reference is to a track rather than a playlist, and is also capable of being analysed by Spotipy. Despite the search returning fewer results, the tweets returned all contained the unique track id in the URL, and most had text related to either the song or an event the user had experienced. This removes the need to develop a system that identifies songs through text, which would be complex and would increase processing time, deviating from non-functional requirement e.

Parameterising the search by language was also important, as IBM Watson has only been validated using an English corpus. Despite being able to accept a range of languages, these have not been assessed for accuracy and would create inconsistent validity if multiple languages were used. An insight from (Alshaabi et al., 2021) shows that of the languages tweeted on Twitter, English is the most widely used at approximately 35% as of late 2020. With only around a third of tweets written in English, the population within the applicable scope is narrowed even further.

5.1.2 Data Collection

It was necessary to collect live data from arbitrary users on Twitter, as the structure of this project is based on the hypothesis that emotions influence the choice of music and vice versa. Synthesising extra data using deep learning would introduce a strong bias in the correlation between music features and emotion in text, imposed by the model used to generate the data. An aim of this project is to prove to what degree this hypothesis is true by measuring the validity of predictions on real data.

The data requires a sizeable number of Spotify Tweets (>30). This was to ensure that there would be a considerable number of data points across all four types of emotion, increasing the strength of potential correlations. The data also had to be based on several users so that there is a range of different perspectives on how music impacts an individual. The method used to gain user data was a continuous trawling process, analysing data and making records with CSV files.

Upon implementation, only a small number of accounts were returned matching the search parameters. This is partially due to how the Twitter API searches for tweets. The initial pool of tweets used to search for specific queries is filtered by time (of the last post made) and only contains tweets made within the last 30 days (Twitter, 2022). As well as this, a single search query can only contain up to 2,048 characters, further limiting the number of tweets. This posed a challenge when collecting data, as the number of requests needed was high and only a few results were fetched per request. Due to the request limit (120) in a set period of time (15 minutes), this prevented searching for data autonomously and recursively. To overcome this problem, searching for tweets was a manual process. This challenge was faced early in the project, as live data extracted from Twitter was constantly growing, with nearly 6,000 tweets being made per second (David Sayce, 2019), making it difficult to track dynamic data.

Initially, when trawling for data, a sizeable proportion of searches returned no tweets. A pattern observed was that many Spotify Tweets made recently were among the first Spotify Tweets the user had made. Upon further observation, the majority of the accounts which had a sparse number of Spotify Tweets were relatively new accounts. This would mean that the Spotify Tweet made was one of the first posts made, leading to few records for the user. A filtering process was implemented to reduce the number of users with an insubstantial number of Spotify Tweets, which improved the processing time required during each search iteration, as shown in Figure 2.2. Another step to decrease the number of invalid users, as explained in 5.1.1 Querying, was adding a blacklist to prevent common bot and advertising accounts from being gathered.

5.1.3 Graphing

Limitations Visualising High Dimensional Data

As there are 11 music features being analysed simultaneously for each track, it can be hard to visualise the collected data accurately. Data beyond four dimensions becomes tough to interpret, making patterns hard to spot. This limits the visualisation available during the experimental stage; however, other methods such as confusion and correlation matrices can numerically help identify patterns over multiple dimensions. Even though it is difficult to clearly visualise all the data, it is still possible to view some aspects by selecting attributes to visualise. Each dimension of data has a different variance and correlation with the others, making graphing a useful tool to confirm predictions by observation.

Scatter Graphs

Graphing was used from the beginning through the middle of the project to clearly visualise the data collected, spotting patterns and trends.

The data initially gained was Spotify tracks, from which song features were extracted. Three playlists, created independently and classified by their perceived emotional trait of joy, sadness and anger (Figure 3.1.1), were used to see if there were any clear features that could separate generic emotion-based songs. This would show whether there were any visible clusters or patterns which could be learned by machine learning models. As shown in Figure 3.1.2, the three playlists are clearly distinguishable based on the features of valence, energy and speechiness. Whilst the joy and sadness playlists are more densely clustered around their means, the data points for anger are spread further, and a small intersection of the playlists is present near the centre of all three. This gave a strong grounding for the hypothesis that songs portraying different emotions differ in music features.

Once the functionality to extract sentiment from text was completed, I was able to add a fourth dimension to the pre-existing graph. Figure 3.1.3 shows a graph with the same axes as the previous graph, but each point has an associated colour (from a heatmap) representing the intensity of sadness. There is a visible trend of the intensity decreasing as valence and energy increase. Two clusters can be viewed: one where valence is between 0 and 0.2, and another where valence is between 0.4 and 0.8. The first cluster is denser than the rest, representing a high intensity of sadness in a specific area. As mentioned, it can be difficult to view any clear correlation in a four-dimensional space, which led to creating a grid map of interpolated data.

Interpolated Grid Map

Creating a grid of points by interpolating the data produced a clearer view of possible trends, but assumes a cubic trend. This technique was primarily used to visualise a small number of data points to estimate if and where there is a peak in intensity for each emotion. The method was not an entirely accurate representation of the data; however, it showed many interesting trends. Viewing Figure 3.2.1, there is a clear association between joy and anger. These two emotions were calculated independently of each other, and yet the trough in the centre of joy nearly matches the location and shape of the peaks of high intensity for anger. The same pattern can be spotted between joy and sadness, which are considered opposites (where valence is 0.8 and energy is 0.6). Another user shares this pattern for the same emotions, as shown in Figures 3.2.2 and 3.2.3.
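A rough sketch of how such a grid map can be produced with SciPy's griddata and matplotlib; the arrays below stand in for the real per-user values.

```python
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt

# Valence/energy per song and the measured sadness intensity (stand-in arrays).
valence = np.random.rand(40)
energy = np.random.rand(40)
sadness = np.random.rand(40)

# Interpolate the scattered points onto a regular grid, assuming a cubic trend.
grid_x, grid_y = np.mgrid[0:1:100j, 0:1:100j]
grid_z = griddata((valence, energy), sadness, (grid_x, grid_y), method="cubic")

plt.imshow(grid_z.T, extent=(0, 1, 0, 1), origin="lower", cmap="viridis")
plt.xlabel("valence"); plt.ylabel("energy")
plt.colorbar(label="sadness intensity")
plt.show()
```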

Another observation is that emotions overlap. Certain types of music show a moderate to high intensity of more than one emotion, which motivated an element included in the UI (2.4 User Interface): text explaining the predicted emotions. Upon reflection, there are too many possibilities as to what the user may be feeling, as the values are continuous. Viewing emotions as discrete values, combinations can be described using a single word, as shown in Figure 3.2.4, though this may be inaccurate, being based on predictions. As a result, I decided to remove this element to avoid false information.

Overall, by graphing two features, energy and valence, which commonly show a correlation, it is clear that with more features/dimensions, clearer clusters would likely be identified. This is key information in understanding which model will be most appropriate given the patterns it is required to recognise.

5.1.4 Extracting Sentiment

Sentiment in Lyrics

When implementing the lyric scraping technique (4.4.2 Lyric Scraping), there were issues involving interpreting and obtaining the lyrics. For the songs which could be analysed, there was sufficient text to produce a well-averaged label for each emotion; however, this became unusable when songs that cannot be analysed exist within the same set of data.

The most common issue was songs not having lyrics accessible on the website. Not all songs available on Spotify are popular, leading to many songs not being published on the website, as Genius relies heavily on community additions to publish data (including lyrics) for songs (Genius, 2017). As well as this, not all song lyrics are written in English; some contain a mix of languages or are written entirely in another language, as explained in 5.1.1 Querying.

With not all songs being analysed, correlating the sentiment of lyrics with the messages of Spotify Tweets becomes inaccurate, as there is a bias towards songs that are labelled and recorded on Genius. A small sample of analysed songs does not form a reliable estimation that can reflect all songs the user has listened to and will listen to. Upon reflection, it would not be appropriate to implement this feature due to the lack of clarity and accurate information given to the user.

5.2 Modelling
A large component of this project is identifying which model is most appropriate in order to fulfil the
requirements and provide a high level of accuracy as described in 4.3 Modelling.

5.2.1 Pre-Processing Data

The first step in refining the data is removing all records that measure 0 intensity for all four emotions. By removing the records that represent an insignificant change in emotion, the density of 'Good Quality Data' will increase, allowing the model to learn only relevant information. It is highly likely that there will be 0 values for some emotions where one in particular has a significant value, allowing each model to still learn to label with 0. Removing these anomalies will also reduce the bias of data being flooded with 0 values, which can lead to poor learning if the majority of training samples are imbalanced in this way.

A necessary step in manipulating the data is standardising the values, ensuring all attributes have a mean of 0 and a standard deviation of 1. This step needs to occur before performing PCA (Principal Component Analysis), as attributes with higher variances are weighted more, subsequently dominating other features in the objective function. This is especially true for models that calculate distances between points, such as clustering algorithms, where the points of a cluster may be spread unevenly due to the scale of the attribute.

$$z = \frac{x - u}{s}$$

where $z$ is the standardised value, $x$ the feature value, $u$ the feature mean and $s$ the standard deviation of the training samples.

PCA is applied after standardising the data in order to reduce the number of dimensions, which may be causing noise in the data (Chahboun and Maaroufi, 2021). This feature reduction technique merges attributes to represent multiple features in a single dimension using linear combinations. Even though there may be a slight loss in information, reducing the features allows the learning process to be much easier, only handling attributes that explain the variation in the data. As PCA and standardisation are dependent on the size and values of the data, which are unique to each user, these processes will need to be computed before training a model upon each login. They are relatively inexpensive methods which greatly improve the ability of models to learn.

For this project, PCA is set to preserve at least 95% of the information retained in the data, as followed in (Kilitcioglu, 2018). The components calculated will differ from user to user; setting the number of components to a constant would explain an inconsistent proportion of the variation. Defining this heuristic obtains the minimum number of components representing the relevant data, allowing the model to learn to a high, consistent degree of accuracy. Ideally, the fewer components required, the faster the model can be trained, meeting the non-functional requirements.

Figure 3.3 shows there are strong correlations amongst several attributes. PCA will be effective at emphasising these trends using far fewer dimensions, often only requiring five to six principal components from the 11 attributes.
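A minimal scikit-learn sketch of this pre-processing pipeline, assuming the per-user DataFrame built earlier; the column names are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

emotions = ["joy", "sadness", "anger", "fear"]

def preprocess(df: pd.DataFrame, feature_cols):
    """Sketch of the pre-processing step; df is the per-user dataset built earlier."""
    # Drop records where every emotion was scored 0 (no usable label).
    df = df[(df[emotions] != 0).any(axis=1)]

    # Standardise each feature to mean 0, standard deviation 1: z = (x - u) / s.
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(df[feature_cols])

    # Keep enough principal components to retain at least 95% of the variance.
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X_scaled)
    return X_reduced, df[emotions], scaler, pca
```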

5.2.2 Evaluating models

Data being obtained from arbitrary users on Twitter makes accuracy more difficult to measure, as the range of data is limited by what is available on Twitter and usable by the system. Each model will be evaluated by splitting the data into a training and testing set, from which a cross-validation score can then be calculated. For classification algorithms, additional metrics are calculated, such as recall, precision, f1-score and support. These can be calculated using the classification_report function.

• Recall: the proportion of actual occurrences of a class that are correctly identified
• Precision: the proportion of predictions for a class that are correct
• f1-score: a metric combining Precision and Recall
• Support: the number of occurrences of a class in the dataset

$$F_1 = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}$$
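A minimal sketch of this evaluation step with scikit-learn, assuming a feature matrix X and a discretised label vector y_binary from the pre-processing stage.

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression

# X and y_binary are assumed to come from the pre-processing step (labels split at 0.5).
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.25)

model = LogisticRegression(solver="lbfgs")
scores = cross_val_score(model, X_train, y_train, cv=5)  # cross-validation score

model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))  # precision, recall, f1, support
```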

5.2.3 Implementing Models

Each model must return the scaler transformation applied to the data and the trained model itself. This allows new data to be transformed so it can be interpreted correctly by the model. When PCA is applied, the linear transformations also need to be returned so that the attributes of new data can be transformed. The scikit-learn library has a range of predefined models (including many described in Design and Architecture) and only requires a function call to train.

Linear Regression

The first models implemented were basic regression models:

1. Linear regression
2. Lasso regression
3. Ridge regression

And a classification model:

4. Logistic regression

The algorithms above are designed to learn linear relationships in the data; however, they differ in how the objective function is defined. Linear regression only computes the ordinary least squares loss, with no penalty on the choice of weights. This allows the model to place a larger weight on a particular attribute relative to others if it has a strong correlation, indicative of its significance. The other models build on this foundation but include a regularisation component, ensuring the weights are more balanced and forming a generalised model. The objective function for Lasso regression penalises the sum of the absolute values of the weights, encouraging some to tend to zero. Ridge regression extends this by penalising the sum of the squared values of the weights, applying a greater penalty to larger values (Xu, 2021).

Despite the addition of regularisation, the RMSE (root mean squared error) of the three models showed an unnoticeable difference (an average of 3×10⁻⁴ across all emotions, Figure 3.4.1). This was an interesting finding, as it shows that generalising the model did not impact the accuracy; however, the predictions were very inaccurate in themselves. The RMSE for joy, the emotion with the most variation, was measured at approximately 0.46, nearly half the range of the data. The R² scores reflect the performance of the models by measuring how much of the variation in the data the model explains. In many cases, the models achieved a negative R² score, implying that the model performed worse than simply predicting the mean of the data.

Lasso Objective Function:

$$\frac{1}{2n}\sum_{i=1}^{n}\left(y_i - X_i w\right)^2 + \alpha\sum_{j=1}^{p}\lvert w_j\rvert$$

Ridge Objective Function:

$$\sum_{i=1}^{n}\left(y_i - X_i w\right)^2 + \alpha\sum_{j=1}^{p} w_j^2$$

When tuning the parameters of the Logistic regression model, it was evident that lbfgs was the most consistently accurate solver, using L2 regularisation, an appropriate choice given the data's multicollinearity. The labelled data had to be discretised by splitting the labels into two classes based on a threshold value. The threshold was set to 0.5, the midpoint of the range the data can be labelled within. The mean accuracy of the model was poor, achieving scores within the range of 30% to 60% across all emotions. As this is a binary classifier, the results suggest the model has not learned any useful information, attaining similar results to randomly predicting each class. Adjusting the threshold did not impact the accuracy, due to the limited range of values and the inaccuracy of classifying the data with a linear model. This was also evident from the linear regression models achieving similar accuracy and extremely low R² scores.
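A brief scikit-learn sketch comparing the three regression models on one emotion, assuming the reduced feature matrix X_reduced and labels y from the pre-processing step; the alpha values are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# X_reduced and y (a single emotion column) assumed from the pre-processing step.
X_train, X_test, y_train, y_test = train_test_split(X_reduced, y, test_size=0.25)

for name, model in [("linear", LinearRegression()),
                    ("lasso", Lasso(alpha=0.01)),
                    ("ridge", Ridge(alpha=1.0))]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print(name, round(rmse, 3), round(r2_score(y_test, pred), 3))
```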

A classification approach that focuses on geometric rather than statistical properties is the Kernel SVM (Support Vector Machine). From the classification report, however, it was clear that classification models would struggle with the given data due to a large class imbalance. For most emotions, the two binary classes were close to a 1:3 split, and on occasion only one class was present.

Due to this class imbalance, models tended to only predict the most common class, often achieving f1-scores between 0.8 and 1.0 for that class and 0.0 for the opposing class. The imbalance was also reflected in the confusion matrices, as the number of true positives was close to the number of samples in the popular class. The model aims to achieve the highest accuracy, and with a weak correlation in the discretely labelled data, the model will always be biased towards the popular class to ensure the greatest accuracy.

Evaluating the family of linear regression models, it was clear that the data was not representative of a linear trend. Classification models proved not to learn 'enough' from discretely labelled data, which led to exploring regression models further. Reflecting on the visualisation of data during the graphing stage, it would be appropriate to explore clustering-based models next.

KNN (K-Nearest Neighbours) Regressor

Whilst researching KNN (K-nearest neighbours), I found a variation which exploits the properties of how KNN works and adds an element of local interpolation to obtain a continuous value, consequently making it a regression function. This is a lazy learner approach that computes the model when required, rather than learning a pre-defined generalised function (Zhang and Zhou, 2007). This is a beneficial characteristic, as the data may change in which function it most accurately represents, and it has already been demonstrated that the data does not follow a (simple/linear) general function. This is useful for small datasets, as the model only interprets the training data, making predictions more accurate assuming the noise is low (Cheamanunkul and Freund, 2014).

Viewing the graphs shown in 5.1.3 Graphing, it is clear that there are several different locations where the
intensity of an emotion peaks; the neighbourhoods help represent these areas of peaks and troughs according to
the distribution of the data. As a result, the key hyperparameters are the value of k, how the weights are
calculated and the leaf size.

This process of hyperparameter tuning can be assisted using scikit-learn's GridSearchCV function, which
exhaustively evaluates and statistically compares every combination of the supplied parameters and returns the
best one. Another stage in tuning the model was applying an ensemble learning
technique, bagging (bootstrap aggregation), to improve the accuracy of the model. This process selects
random samples of data (with replacement) from the training data to train base models independently,
producing an averaged output rather than relying on a single model. Using multiple models can greatly
reduce overfitting; however, bagging can also underfit on small datasets, as the samples taken may be too
small to be representative. Figure 3.4.2 clearly shows the improvement from using
bagging: it is only a slight improvement, but it is consistent and provides a higher degree of
accuracy for a small cost in time (0.327 seconds on average per model).
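A minimal sketch of this tuning step is given below, assuming scikit-learn 1.2 or later (where BaggingRegressor takes estimator rather than base_estimator); X, y and the grid values are placeholders rather than the project's actual settings.

from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_neighbors": [3, 5, 7, 9],         # the value of k
    "weights": ["uniform", "distance"],  # how neighbour weights are calculated
    "leaf_size": [10, 30, 50],
}

# Exhaustively evaluate every parameter combination with cross-validation.
search = GridSearchCV(KNeighborsRegressor(metric="minkowski", p=2),
                      param_grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X, y)

# Bagging trains several copies of the tuned KNN on bootstrap samples
# (with replacement) and averages their predictions.
bagged = BaggingRegressor(estimator=search.best_estimator_, n_estimators=10,
                          random_state=0)
bagged.fit(X, y)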

The model used the Minkowski metric with p = 2 (Euclidean distance) rather than Manhattan distance, as the data
is not of high enough dimension to be affected by the 'curse of dimensionality', where squaring the
differences causes the Euclidean calculation to over-emphasise larger distances.

Decision Tree Regression

Decision trees, unlike the previous models, use a set of binary decision rules to narrow down the target value,
increasing the confidence of an output. The model is trained to learn the order and thresholds of the decisions
required to arrive at the highest average confidence across all outputs from the training set.

This is an effective solution after performing PCA, as the transformed data emphasises the attributes with the highest
variance (commonly those with the highest information gain), which the model learns quickly
(Kilitcioglu, 2018). Combining it with bagging is particularly effective for small datasets: the range of
outputs from each individual tree can be large, and averaging over many of them is an effective way to reduce the
bias introduced by training on a small subsample of data. This was evident in practice (Figure 3.4.3), where low
RMSE values were achieved relative to prior models. The set of possible outputs, however, was
restricted, as the tree could only contain a limited number of leaf nodes, constraining the available labels as shown in
Figure 3.4.4. This differed for larger datasets, where the tree split further, allowing more specific labels to be
produced (Figure 3.4.5).

The training time for decision trees grows rapidly as parameters such as the
maximum depth and minimum leaf size increase. This incurred a larger training time, both because of the number of
parameters requiring tuning and because the largest producible trees require more processing.
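A minimal sketch of this combination is given below, assuming scikit-learn 1.2 or later for the nested estimator parameter name; X, y, the number of components and the grid values are placeholders rather than the project's actual settings.

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV

# Scale, project onto the highest-variance components, then bag decision trees.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("bag", BaggingRegressor(DecisionTreeRegressor(), n_estimators=20,
                             random_state=0)),
])

# Search over the tree parameters that most affect size and training time.
param_grid = {
    "bag__estimator__max_depth": [3, 5, 8],
    "bag__estimator__min_samples_leaf": [1, 3, 5],
}

search = GridSearchCV(pipeline, param_grid, cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)
print("best RMSE:", -search.best_score_)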

5.2.4 Evaluation

Justifying the appropriate model depends on evaluating two key factors: training time and accuracy.
The two strongest models were the k-nearest neighbours regressor and the decision tree regressor, for different
reasons. During testing, it was clear that KNN slightly outperforms the decision tree and also trains considerably
faster. Its inaccuracy on small datasets, however, meant that KNN could not be used universally,
and in those cases the decision tree model was the better solution.

A practical solution was to identify when the performance of KNN surpassed the decision tree regressor and
add a conditional statement that applies KNN once the dataset exceeds a certain size. Evaluating the
RMSE over a range of dataset sizes (Figure 3.5), KNN started achieving better results after roughly 50 samples,
which informed the conditional statement:

# Fall back to the decision tree model when fewer than 50 samples are available.
if len(data_to_graph.index) < 50:
    self.model, self.scalar, self.pca = DecisionTree(data_to_graph, emotion).drive()
else:
    self.model, self.scalar, self.pca = KNeighborRegressor(data_to_graph, emotion).drive()

senti_prediction.py, lines 33-36

Despite applying the same model, predictions can vary depending on the training set. As the
training set changes, the linear combinations calculated in the PCA step also change, resulting in different values
being used to train the models. This effect diminishes as the dataset grows; however, the rate at which the volatility
decreases is very low, so a much larger dataset would be required to eliminate it.

With both models in place, the mean RMSE for joy, the emotion with the highest error, is 0.26, which
is low considering that the external factors affecting the uncertainty are not accounted for.

5.3 Web Application
Initial development of the web application required the client side and server side to be established. These
were made as two sibling directories so that they can operate on independent
addresses: the client side on port 8080 and the server side on port 5000.

The login page was quick to implement; however, as the Twitter data is acquired from arbitrary users, it
was not possible to implement a working Twitter login function. Credentials for a user with numerous
Spotify Tweets would be required, so for the demonstration the data was read from a CSV file gathered
by trawling Twitter. The Twitter login button therefore remained a dummy, but the Spotify login
was implemented, granting read and write permissions to a user's account. Once the token is acquired by
authenticating the user, the dashboard is displayed, where the model can then be trained given
access to the necessary data. The access token is stored in window storage, where it can be
cached, allowing the user to stay logged in on the same device if desired.

Once a user has logged in, they are directed straight to the dashboard, where their recent music is displayed
instantly (a simple API call) alongside a message indicating that the model is training. The loading
message then changes into a page of infographics, displaying a graph of recent emotions and the top song
for each of joy, sadness and anger. Fear was not included, as prior examples consistently showed its intensity
to be relatively low, and it is harder for the user to interpret.

5.3.1 Graphs

Data used for graphing during the evaluation stage had to be converted from a Pandas dataframe into JSON so
that it could be routed from the server to the client side. The output labels were converted from floats to
percentages so that the graph axis values display larger, making them clearer for the user. The charting API also
includes a tooltip feature, allowing the user to view specific values on mouse-hover. The
user can also zoom and pan the graph, adding a level of interaction to the application.
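A minimal sketch of such a route is shown below, assuming a Flask server on port 5000; the route name, column names and the get_prediction_frame helper are hypothetical placeholders, not the project's actual code.

from flask import Flask, jsonify
import pandas as pd

app = Flask(__name__)

def get_prediction_frame():
    # Hypothetical stand-in for the project's prediction pipeline output.
    return pd.DataFrame([{"track": "example", "joy": 0.62, "sadness": 0.21,
                          "anger": 0.05, "fear": 0.03}])

@app.route("/predictions")
def predictions():
    df = get_prediction_frame()
    emotions = ["joy", "sadness", "anger", "fear"]
    df[emotions] = (df[emotions] * 100).round(1)    # convert floats to percentages
    return jsonify(df.to_dict(orient="records"))    # one {column: value} dict per row

if __name__ == "__main__":
    app.run(port=5000)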

A bar graph is used for displaying the emotions of a single searched track, as it only produces a single value
per emotion, making this type of graph the most appropriate. The colour coding for the bar chart and the top-track
card is consistent with the line graph, which includes a key for clarity.

When searching for a track to be predicted, asynchronous functionality allows the processing to run and
return successfully before any changes or animations are triggered.

Chapter 6

Evaluation

The objectives for this project were established through requirements that elaborate on the existing work
discussed in 1.1 Background. The requirements were set with the goal of developing an application that uses
machine learning to predict emotions based on social media sentiment. This chapter reflects on how these
requirements were met, analyses the data gained and concludes the project as a whole.

6.1 Requirements
The MoSCoW method helped make the delivery of the MVP time-efficient and, due to the architecture of
the application, additional features were easily added. This was demonstrated after user feedback, where a popular
suggestion was to add functionality linking predictions back to Spotify by making a playlist of songs based
on emotion. This component was easily implemented and integrated into the application, highlighting the
versatility of the MVP. Developing the MVP also allowed me to explore the data gained and test
basic functionality to estimate outputs early on, which shaped the course of the project.

Functional Requirements

As the majority of the functional requirements were met, I will be focusing on the ones which were
adapted or unattained.

Requirement 2, as explained earlier, had no useful application for the demonstration, as the credentials of a
Twitter account with a large quantity of Spotify Tweets are unknown, meaning that logging in would not
provide sufficient data. Despite this, the implementation would be straightforward, as the Twitter developer account is
already instantiated and only a basic Axios function call is required.

Training a model using data only from the given user reduces the number of records available compared with a
generic predictor that combines multiple users, but it tailors the model to each user's unique patterns.
Requirement 7 was achieved in this respect; however, the accuracy of the predictions relies heavily on
'Good Quality Data'.

The 'could have' requirements were significantly more difficult to implement due to the nature of the
system, which is a unidirectional emotion predictor. Not using a database avoids the need for
account creation and preserves ease of use, but it meant that requirements 14 and 15 could not be
accomplished, as this data needs to be stored. Requirement 15 also contradicts functional requirement h, as the
user would have to confirm all the predicted values, demanding a large quantity of user input for the system to reach
the goal state.

Non-functional Requirements

Most of the non-functional requirements were also met, with requirement f not executed as intended: it was
delivered through visual representation instead of text. Narrative explanations of changes in emotion have a
low level of reliability, as little information is available about the variables causing a change. Even
with this minor setback, the application performs as intended: a fast, insightful infographic
dashboard.

Overall, all 'must have' and 'should have' requirements were met, whilst adaptations made during development of the
project restricted the ability to achieve the 'could have' requirements. Most of the non-functional requirements were met,
providing a dynamic, fast, visually clear and intuitive application, accessible on mobile and PC.
Both sets of requirements were achieved through the use of two models, KNN and decision trees,
selected according to the size of the dataset. Model performance is greatly affected by the range,
quantity and quality of data; biased data will inevitably produce less accurate predictions.

6.2 Testing
6.2.1 Unit and Integration testing

During unit testing, most tests passed, with common issues confined to a few subsystems. All of the functionality
passed normal and boundary tests; however, some erroneous data caused errors. The majority of these
were runtime errors caused by a lack of data received from APIs. An example of this was a
KeyError, where the expected data did not exist because incomplete data had been passed to the Spotify API
(Figure 4.1).
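A minimal sketch of such a unit test is shown below; extract_features is a hypothetical stand-in for the project's feature-extraction code, not the actual implementation.

import unittest

def extract_features(track_features):
    # Hypothetical stand-in: pulls the expected Spotify attributes,
    # raising KeyError when the API response is incomplete.
    expected = ["danceability", "energy", "valence", "tempo"]
    return [track_features[name] for name in expected]

class TestFeatureExtraction(unittest.TestCase):
    def test_complete_response_passes(self):
        features = {"danceability": 0.7, "energy": 0.8, "valence": 0.5, "tempo": 120.0}
        self.assertEqual(len(extract_features(features)), 4)

    def test_incomplete_response_raises_keyerror(self):
        incomplete = {"danceability": 0.7, "energy": 0.8}   # 'valence' and 'tempo' missing
        with self.assertRaises(KeyError):
            extract_features(incomplete)

if __name__ == "__main__":
    unittest.main()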

Integration testing ensures that the end-user experience is fluid, with all modules tested through user-driven
use of the application. As object-oriented programming is used in a large portion of the program,
integration and regression testing proved successful, reflecting the application's potential for further
development.

6.2.2 User testing

A stage of user testing was required to validate the effectiveness of the application and to
understand its usability from a user's perspective. A small sample of five users tested the application and were asked to
provide feedback. There was a range of responses on how the application was helpful,
such as "The application loads very fast", "The colour scheme is clear and instinctive" and "The login process
is very simple". This reinforced the achievement of the non-functional requirements; however, the most common
suggested improvement was to integrate the application more closely with the user's Spotify account. I
was able to add a playlist creator, which adds the top ten songs for a given emotion to
that emotion's playlist; for example, 'Moodify Joy' consists of the top songs for joy and can be
created or appended to from within the website on the user's Spotify account (Figure 4.2).
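The sketch below illustrates one possible implementation of this playlist feature using the spotipy library (the project's actual Spotify client is not shown here); the scopes, playlist name and top_track_uris are placeholders.

import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Authenticate with write access to the user's playlists.
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    scope="playlist-modify-public playlist-modify-private"))

top_track_uris = []   # placeholder: the ten track URIs predicted highest for joy

user_id = sp.current_user()["id"]
playlist = sp.user_playlist_create(user_id, "Moodify Joy",
                                   description="Top songs predicted as joyful")
sp.playlist_add_items(playlist["id"], top_track_uris)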

6.3 Conclusion
The objective of this project was to produce an application that helps predict users' emotions based on the
music they listen to and share on Twitter, tailoring predictions to the individual. Whilst the
application may not be the most accurate at identifying the correct levels of emotion, it certainly
provides useful insight into how emotions vary and which songs deviate from the mean for each emotion.
It will in no way replace existing technology, but it may create an opening for how machine learning
can assist self-directed processes aimed at wellbeing. Using AI to recognise otherwise
unnoticed correlations in real data is a step forward in passive learning, helping to measure a human factor
that is otherwise difficult to quantify. In conclusion, Moodify is an accessible application for a
large range of users that can give an effective, personalised insight into a user's emotional wellbeing,
achieving the goal of this project.

References
Adam Hayes (2021) What Is a Confidence Interval? Available at:
https://www.investopedia.com/terms/c/confidenceinterval.asp (Accessed: 18 September 2021).

Alshaabi, T., Dewhurst, D.R., Minot, J.R., et al. (2021) The growing amplification of social media:
measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009–2020. EPJ
Data Science, 10 (1): 15. doi:10.1140/epjds/s13688-021-00271-0.

Ankush Chavan (2020) Twitter data storage and processing. Available at: https://ankush-
chavan.medium.com/twitter-data-storage-and-processing-dd13fd0fdb30 (Accessed: 14 January 2022).

Chahboun, S. and Maaroufi, M. (2021) Principal Component Analysis and Machine Learning Approaches
for Photovoltaic Power Prediction: A Comparative Study. Applied Sciences, 11 (17): 7943.
doi:10.3390/app11177943.

Cheamanunkul, S. and Freund, Y. (2014) Improved kNN Rule for Small Training Sets. In 5 December
2014. doi:10.1109/ICMLA.2014.37.

David Sayce (2019) The Number of tweets per day in 2020. Available at: https://www.dsayce.com/social-
media/tweets-day/ (Accessed: 17 April 2022).

Decision Tree Regressor explained in depth (2019). Available at: https://gdcoder.com/decision-tree-regressor-explained-in-depth/ (Accessed: 18 April 2022).

Digvijay Singh (2020) MVP: MoSCoW Prioritization & its advantages. Available at:
https://www.linkedin.com/pulse/mvp-moscow-prioritization-its-advantages-digvijay-singh/ (Accessed: 14
April 2022).

Genius (2017) Genius – How Genius Works. Available at: https://genius.com/Genius-how-genius-works-annotated (Accessed: 12 April 2022).

Guo, X., Yin, Y., Dong, C., et al. (2008) “On the Class Imbalance Problem.” In 2008 Fourth International
Conference on Natural Computation. Jinan, Shandong, China, 2008. IEEE. pp. 192–201.
doi:10.1109/ICNC.2008.871.

Hudaib, A., Masadeh, R., Qasem, M.H., et al. (2018) Requirements Prioritization Techniques Comparison.
Modern Applied Science, 12 (2): 62. doi:10.5539/mas.v12n2p62.

iamheart (2017) Our 5 Core Emotions And How We Make Them So Complex. Available at:
https://www.iamheart.ca/single-post/2017/07/07/the-5-basic-emotions (Accessed: 18 April 2022).

Jiménez Iglesias, L., Aguilar Paredes, C., Sánchez Gómez, L., et al. (2018) User experience and media.
The three click rule in newspapers’ webs for smartphones. 73rd ed. Revista Latina de Comunicación
Social. doi:10.4185/RLCS-2018-1271en.

Kilitcioglu, D. (2018) Why you should use PCA before Decision Trees. Available at:
https://dorukkilitcioglu.github.io/2018/08/11/pca-decision-tree.html (Accessed: 4 April 2022).

Lin, D. (2018) An Application for Automated Playlist Generation from Personal Music Libraries Using
Clustering Algorithms and Music Analysis., p. 36.

Michelle Miller and IBM (2019) The science behind the service | IBM Cloud Docs. Available at:
https://cloud.ibm.com/docs/tone-analyzer?topic=tone-analyzer-ssbts (Accessed: 16 April 2022).

Mikowski, M.S. and Powell, J.C. (2014) Single Page Web Applications., p. 26.

Pichl, M., Zangerle, E. and Specht, G. (2015) Combining Spotify and Twitter Data for Generating a
Recent and Public Dataset for Music Recommendation., p. 6.

Ronan, D., Reiss, J.D. and Gunes, H. (2018) An empirical approach to the relationship between emotion
and music production quality. arXiv:1803.11154 [cs, eess]. Available at: http://arxiv.org/abs/1803.11154
(Accessed: 16 April 2022).

Svirca, Z. (2020) Everything you need to know about MVC architecture. Available at:
https://towardsdatascience.com/everything-you-need-to-know-about-mvc-architecture-3c827930b4c1
(Accessed: 19 April 2022).

Țichindelean, M., Țichindelean, M.T., Cetină, I., et al. (2021) A Comparative Eye Tracking Study of
Usability—Towards Sustainable Web Design. Sustainability, 13 (18): 10415. doi:10.3390/su131810415.

Twitter (2022) Search API: Enterprise. Available at: https://developer.twitter.com/en/docs/twitter-api/enterprise/search-api/overview (Accessed: 17 April 2022).

Xu, W. (2021) What’s the difference between Linear Regression, Lasso, Ridge, and ElasticNet? Available
at: https://towardsdatascience.com/whats-the-difference-between-linear-regression-lasso-ridge-and-
elasticnet-8f997c60cf29 (Accessed: 18 April 2022).

Zhang, M.-L. and Zhou, Z.-H. (2007) ML-KNN: A lazy learning approach to multi-label learning. Pattern
Recognition, 40 (7): 2038–2048. doi:10.1016/j.patcog.2006.12.019.

Appendix
Figure 1.1 - Spotify API Music Features

    Name               Type: Scale
1   danceability       number<float>: 0 - 1
2   energy             number<float>: 0 - 1
3   key                integer: -1 - 11
4   loudness           number<float>: -60 - 0
5   mode               integer: 1 / 0
6   speechiness        number<float>: 0 - 1
7   acousticness       number<float>: 0 - 1
8   instrumentalness   number<float>: 0 - 1
9   liveness           number<float>: 0 - 1
10  valence            number<float>: 0 - 1
11  tempo              number<float>: 0 - 500

Figure 1.2 - IBM Watson Tone Analyser

    Name         Type: Scale
1   Anger        number<float>: 0 - 1
2   Fear         number<float>: 0 - 1
3   Joy          number<float>: 0 - 1
4   Sadness      number<float>: 0 - 1
5   Analytical   number<float>: 0 - 1
6   Confident    number<float>: 0 - 1
7   Tentative    number<float>: 0 - 1

Figure 2.1 - System Structure

A high level view of the system structure, showing relationships between entities

Figure 2.2 - System Flow

The blue shapes represent where the Tweepy API will be used, green for Spotify and grey for IBM Watson.

Figure 2.3 - Modelling

Figure 2.4 – UI Login

• A login button for Spotify and another button for Twitter that will redirect the user to the selected
login page, to authenticate user access.

Figure 2.5 – UI Dashboard

• The navigation bar will allow space for future additions, such as a link to another view or a button
  that executes a function. One feature placed in the navigation bar will be the logout button, which will
  remain visible to the user.

• The vertical list of recently played songs will act as a viewing history of tracks the user has
  recently listened to; these are also the songs predicted and displayed on the line graph. Each song
  will have its name and artist displayed, as well as the album cover for recognisability. The user will
  be able to click a song and be redirected to the Spotify Web Player, where they can view and play
  the selected track. This list will have the heading 'Recently Played' to make clear to the user
  that these are the tracks that they have recently listened to.

• The graph of recently played songs will be displayed as a line graph. This will be placed at the top
of the page as it will be the element displaying the users most recent record of emotions. It will
span the page to include as many songs as possible whilst remaining clear.

• A small bar graph will display the sum of each emotion, allowing the user to easily compare the
total emotion predicted over their recent listening history. A bar graph is appropriate for
displaying the change in emotion over the total period, whereas the line graph shows more
incremental changes within the total recent listening period.

• A box beneath the graph will explain the overall change in emotion, providing a text-based
explanation for the pattern shown by the graph.

• For joy and sadness, the two predominant and opposing emotions, it can be useful for users to identify
  the tracks with the strongest intensity of each. Using yellow and blue
  to represent joy and sadness respectively makes it clear to the user which tracks represent
  which emotion. I also plan to use a consistent colour for each emotion, ensuring that there
  is consistency throughout all visual components.

• An input box will allow the user to enter any song name and be shown the predicted values
instantly. This component gives the user a wider scope, allowing the model to apply what it has
learnt on unseen data.

• Another component is evaluating the emotion labelled to a song against the emotion shown in the
song’s lyrics. This could indicate how much lyrics of songs affect the emotion portrayed in the
Spotify Tweet text.

Scatter Graphs
Figure 3.1.1

Anger: 'Angry Playlist' - https://open.spotify.com/playlist/3aBeWOxyVcFupF8sKMm2k7?si=6c719d90df1442c0

Joy: 'Mood Boosters: the Happy Playlist' - https://open.spotify.com/playlist/0IAG5sPikOCo5nvyKJjCYo?si=6bcf10c0e9f3462c

Sadness: 'Sad Playlist' - https://open.spotify.com/playlist/4rFp8l9vekheKOpeJLVkar?si=3499e0a21a0d4650

Figure 3.1.2

A 3-dimensional graph showing a joy playlist in yellow, anger playlist in red and sadness in blue

Figure 3.1.3

A 4-dimensional graph with a heatmap representing the intensity of the emotion sadness

Interpolation Grid Map
Figure 3.2.1

Four graphs showing the intensities for each emotion based on interpolated data using a user’s data

(The higher the intensity, the brighter the colour)

Figure 3.2.2

Two interpolation graphs showing the emotions of joy and sadness using a user’s data

Figure 3.2.3

Four interpolation graphs using a user's data

Figure 3.2.4

James, Emily St. “Chart: How Inside Out’s 5 Emotions Work Together to Make More Feelings.” Vox, June 29, 2015.
https://www.vox.com/2015/6/29/8860247/inside-out-emotions-graphic.

A matrix representing the combination of emotions verbally

Pre-Processing Data
Figure 3.3

A correlation matrix heatmap for all music attributes analysed by models

The three green-circled values show the strongest correlations (a negative value represents a strong
decreasing correlation)

Implementing Models
Figure 3.4.1

RMSE      Linear Regression   Ridge      Lasso
Anger     0.257774722         0.25762    0.257716
Fear      0.182000283         0.182041   0.182027
Sadness   0.455781711         0.45612    0.45662
Joy       0.234566852         0.235235   0.23564

R²        Linear Regression   Ridge      Lasso
Anger     -0.354921148        -0.35329   -0.35308
Fear      -0.106438898        -0.10694   -0.10672
Sadness   -2.154803627        -2.15949   -2.15636
Joy       -0.220407289        -0.22737   -0.22951

Tables showing the average RMSE and R² values for linear, ridge and lasso regression on a dataset of 129 records

Figure 3.4.2

The RMSE of the KNN regressor and the KNN regressor with bagging on data consisting of 37 samples

Figure 3.4.3

A report from a tuned decision tree regressor on a small dataset

Figure 3.4.4

A line graph showing the intensities of emotion for a ‘happy playlist’ from a decision tree regressor
model

Figure 3.4.5

A line graph showing the intensities of emotion for recently listened to music from a decision tree
regressor model

Evaluation
Figure 3.5

A line graph showing the change in RMSE for KNN and decision tree regressors over a varying number of
samples

Testing
Figure 4.1

A failed test due to a KeyError

Final UI Display
Figure 4.2

Four images showing the login page, the loading page, the top half of the dashboard and bottom half of
the dashboard from the web application

