As we move ahead into the 2020s, an ever-increasing share of music consumption and
discovery is mediated by AI-driven recommendation systems. Back in 2020, as many as 62%
of consumers rated streaming platforms like Spotify and YouTube among their top sources of
music discovery, and a healthy chunk of that discovery is driven by recommender systems.
On Spotify, for instance, over one third of all new-artist discoveries happen through "Made
for You" recommendation sessions, according to the recently released Made to be Found
report.
Yet, as algorithmic recommendations take center stage in the music discovery landscape,
much of the professional community still perceives these recommender algorithms as black
boxes. Music professionals rely on recommender systems across platforms like Spotify and
YouTube to amplify ad budgets, connect with new audiences, and execute successful release
campaigns, while often having no clear vision of how these systems operate or how to
leverage them to amplify artist discovery.
Spotify's recommender system builds its understanding of a track through two
complementary approaches:
1. Content-based filtering : describing the track by examining the content itself
2. Collaborative filtering : describing the track through its connections with other
tracks on the platform, by studying user-generated assets
The recommendation engine needs data generated by both methods to get a holistic view of
the content on the platform and solve the cold start problems when dealing with newly
uploaded tracks. First, let's take a look at the content-based filtering algorithms:
Analyzing artist-sourced metadata
As soon as Spotify ingests a new track, an algorithm analyzes all the general song metadata
provided by the distributor, plus metadata specific to Spotify (sourced through the Spotify
for Artists pitch form). In the ideal scenario, where all the metadata is filled in correctly
and makes its way into the Spotify database, this list should include:
Track title
Release title
Artist name
Featured artists
Songwriter credits
Producer credits
Label
Release Date
Genre & sub-genre tags
Music culture tags
Mood tags
Style tags
Primary language
Instruments used throughout the recording
Track typology (Is it a cover? Is it a remix? Is it an instrumental?)
Artist hometown/local market
The artist-sourced metadata is then passed downstream as input into other content-based
models and into the recommender system itself.
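As a rough sketch, the checklist above could be captured in a simple record type, with a helper that flags the gaps that would weaken downstream recommendations (the field names here are illustrative, not Spotify's actual ingestion schema):

```python
from dataclasses import dataclass, field

# Illustrative only: fields mirror the checklist above, not Spotify's real schema.
@dataclass
class TrackMetadata:
    track_title: str
    release_title: str
    artist_name: str
    featured_artists: list = field(default_factory=list)
    songwriters: list = field(default_factory=list)
    producers: list = field(default_factory=list)
    label: str = ""
    release_date: str = ""
    genres: list = field(default_factory=list)
    moods: list = field(default_factory=list)
    primary_language: str = ""
    instruments: list = field(default_factory=list)
    artist_home_market: str = ""

    def missing_fields(self):
        """List the string/list fields left empty, i.e. metadata gaps."""
        return [name for name, value in vars(self).items()
                if value == "" or value == []]

track = TrackMetadata(track_title="Night Drive", release_title="Night Drive",
                      artist_name="Some Artist", genres=["synthwave"])
print(track.missing_fields())
```

In the "ideal scenario" the article describes, `missing_fields()` would return an empty list; every gap it reports is information the recommender simply never sees.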
Next, Spotify runs natural language processing (NLP) models against several text sources:
1. Lyrics analysis : The primary goal here is to establish the prominent themes and the
general meaning of the song's lyrics, while also scanning for potential "clues" that
might prove useful down the road, such as locations, brands, or people mentioned
throughout the text.
2. Web-crawled data (focusing primarily on music blogs and online media outlets):
Running NLP models against web-crawled data allows Spotify to uncover how people
(and gatekeepers) describe music online, by analyzing the terms and adjectives that
most often co-occur with the song's title or the artist's name.
3. User-generated playlists : NLP algorithms run against the user-generated
playlists featuring the track on Spotify to uncover additional insights into the song's
mood, style, and genre. If the song appears on a lot of playlists with "sad" in the title,
it is probably a sad song.
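The playlist-title signal in point 3 can be illustrated with a toy co-occurrence count (the mood vocabulary and logic here are assumptions for illustration; Spotify's actual NLP models are far more sophisticated):

```python
from collections import Counter
import re

# Assumed toy mood vocabulary, not Spotify's.
MOOD_WORDS = {"sad", "happy", "chill", "workout", "focus"}

def mood_profile(playlist_titles):
    """Count mood-word occurrences across the titles of playlists
    that contain a given track."""
    counts = Counter()
    for title in playlist_titles:
        for token in re.findall(r"[a-z]+", title.lower()):
            if token in MOOD_WORDS:
                counts[token] += 1
    return counts

# Titles of user playlists that all feature the same (hypothetical) track:
titles = ["sad songs to cry to", "Sad girl hours", "late night chill", "sad bops"]
profile = mood_profile(titles)
print(profile.most_common(1))  # → [('sad', 3)]: the dominant mood association
```

Aggregated over millions of playlists, even a signal this crude converges on a reliable picture of how listeners perceive a song.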
The NLP models allow Spotify to tap into the track's cultural context and expand on the
sonic analysis of how the song sounds with a social dimension of how the song is perceived.
The three components outlined above — artist-sourced metadata, audio analysis, and
NLP models — make up the content-based approach of the track representation within
Spotify's recommender system. Yet, there's one more key ingredient in Spotify's recipe for
track representation.
Collaborative Filtering
In many ways, collaborative filtering has become synonymous with Spotify's recommender
system. The DSP giant has pioneered the application of this so-called "Netflix approach" in
the context of music recommendation, and it has widely publicized collaborative filtering as
the driving force behind its recommendation engine. So the chances are you've heard the
process laid out time and again, at least in the following version:
"We can understand songs to recommend to a user by looking at what other users with
similar tastes are listening to." The algorithm simply compares users' listening histories: if
user A has enjoyed songs X, Y, and Z, and user B has enjoyed songs X and Y (but hasn't
heard Z yet), we should recommend song Z to them. By maintaining a massive user-item
interaction matrix covering all users and tracks on the platform, Spotify can tell whether two
songs are similar (if similar users listen to them) and whether two users are similar (if they
listen to the same songs).
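The user A / user B example above can be sketched with a tiny user-item matrix and cosine similarity (a toy illustration of the logic, not Spotify's implementation):

```python
import math

# Rows = users, columns = songs; 1 means the user enjoyed the song.
matrix = {
    "user_a": {"X": 1, "Y": 1, "Z": 1},
    "user_b": {"X": 1, "Y": 1, "Z": 0},  # hasn't heard Z yet
    "user_c": {"X": 0, "Y": 0, "Z": 1},
}

def cosine(u, v):
    """Cosine similarity between two rows of the user-item matrix."""
    dot = sum(u[k] * v[k] for k in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def user_similarity(a, b):
    return cosine(matrix[a], matrix[b])

# user_b is far closer to user_a than to user_c, so user_a's extra
# song Z becomes a candidate recommendation for user_b.
print(user_similarity("user_a", "user_b"), user_similarity("user_c", "user_b"))
```

At Spotify's scale this matrix has hundreds of millions of rows and columns, which is exactly where the accuracy, scalability, and speed issues discussed next come from.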
However, this item-user matrix approach comes with a host of issues that have to do with
accuracy, scalability, speed, and cold start problems. So, in recent years, Spotify has moved
away from consumption-based filtering — or at least drastically downplayed its role in track
representation. Instead, the current version of collaborative filtering focuses on the track's
organizational similarity: i.e., "two songs are similar if a user puts them on the same playlist".
By studying playlist and listening-session co-occurrence, collaborative filtering algorithms
access a deeper level of detail and capture well-defined user signals. Simply put, streaming
users often have pretty broad and diverse listening profiles (in fact, building listening
diversity is one of Spotify's priorities), so two songs showing up in the same listening
history is a weak similarity signal on its own. If, on the other hand, a lot of users put song A
and song B on the same playlist, that is a much more conclusive sign that these two songs
have something in common. On top of that,
the playlist-centric approach also offers insight into the context in which these two songs are
similar — and with playlist creation being one of the most widespread practices on the
platform, Spotify has no shortage of collaborative filtering data to work through.
Today, the Spotify collaborative filtering model is trained on a sample of ~700 million user-
generated playlists selected out of the much broader set of all user-generated playlists on the
platform.
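The playlist co-occurrence signal can be sketched by counting how often each pair of songs lands on the same playlist (toy data; as noted above, the real model is trained on hundreds of millions of playlists):

```python
from collections import Counter
from itertools import combinations

# A handful of (hypothetical) user-generated playlists:
playlists = [
    ["song_a", "song_b", "song_c"],
    ["song_a", "song_b"],
    ["song_b", "song_d"],
    ["song_a", "song_b", "song_d"],
]

# Count every unordered pair of songs that share a playlist.
co_occurrence = Counter()
for tracks in playlists:
    for pair in combinations(sorted(set(tracks)), 2):
        co_occurrence[pair] += 1

# The most frequently co-playlisted pair is the strongest similarity signal.
print(co_occurrence.most_common(1))  # → [(('song_a', 'song_b'), 3)]
```

Note that the counts also carry context: inspecting *which* playlists a pair shares (workout lists, sad lists, etc.) reveals the dimension along which the two songs are similar.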
Now we finally arrive at the point where the combination of collaborative and content-
based approaches allows Spotify's recommender system to develop a holistic representation
of the track. At this point, the track profile is further enriched by combining the outputs of
several independent algorithms to generate higher-level vectors (think of these as mood,
genre, and style tags).
In addition, to deal with the cold start problem when processing freshly uploaded releases
that don't have enough NLP/playlist data behind them, some of these properties are also
extrapolated to develop overarching artist algorithmic profiles.
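The blending of the two vector families, plus the artist-profile fallback for fresh uploads, could be sketched along these lines (everything here is an assumption for illustration: the function names, the tiny 2-D vectors, and the fallback logic are invented, not Spotify's actual pipeline):

```python
import math

def normalize(vec):
    """Scale a vector to unit length so neither source dominates."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def track_representation(content_vec, collab_vec, artist_profile_vec):
    # Cold start: a fresh upload has no playlist/NLP data yet, so fall
    # back to the artist's aggregate profile for the collaborative half.
    if collab_vec is None:
        collab_vec = artist_profile_vec
    return normalize(content_vec) + normalize(collab_vec)

# A freshly uploaded track: content features exist, collaborative data doesn't.
fresh_track = track_representation([0.2, 0.9], None, [0.5, 0.5])
print(len(fresh_track))  # → 4 (concatenated 2-D content + 2-D collaborative)
```

The design choice to concatenate rather than average keeps the two signal families separable, so downstream models can still weigh "how it sounds" against "how it is used".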
On the user side of the equation, the recommender system tracks two types of feedback:
Explicit, or active, feedback : library saves, playlist adds, shares, skips, click-throughs
to the artist/album page, artist follows, "downstream" plays
Implicit, or passive, feedback : listening-session length, track playthrough, and
repeat listens
In the case of the Spotify recommender system, explicit feedback weighs more heavily when
developing user profiles: music is often enjoyed as off-screen content, so uninterrupted
consumption doesn't always signal enjoyment. The user feedback data is then processed to
develop the user profile, defined in terms of:
User-entity affinity: "How much does user X like artist A or track B? What are the
favorite artists/tracks of user Y?"
Item similarity: "How similar are artist A & artist B? What are the 10 tracks most
similar to track C?"
Item clustering: "How would we split these 50 tracks/artists into separate groups?"
The feature-specific algorithms can then tap into these unified models to generate
recommendations optimized for a given consumption space/context. For instance, the
algorithm behind Your Time Capsule playlists would primarily engage with user-entity
affinity data to try and find the tracks that users love but haven't listened to in a while. On the
other hand, Discover Weekly algorithms would employ a mix of affinity and similarity data
to find tracks similar to the user's preferences, which they haven't heard yet. Finally,
generating Your Daily Mix playlists would involve all three methods — first, clustering the
user's preferences into several groups and then expanding these lists with similar tracks.
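Under assumed toy embeddings, the three query types might be sketched like this (all names and vectors are invented, and the "clustering" is a naive stand-in; real systems use learned high-dimensional embeddings and proper algorithms such as k-means):

```python
import math

# Hypothetical 2-D embeddings for one user and three tracks.
users = {"user_x": [0.9, 0.1]}
tracks = {"track_a": [1.0, 0.0], "track_b": [0.95, 0.05], "track_c": [0.0, 1.0]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# 1. User-entity affinity: how much does user_x like each track?
affinity = {t: cosine(users["user_x"], v) for t, v in tracks.items()}

# 2. Item similarity: which track is closest to track_a?
similar = max((t for t in tracks if t != "track_a"),
              key=lambda t: cosine(tracks["track_a"], tracks[t]))

# 3. Item clustering: naive split by dominant axis (k-means stand-in).
clusters = {t: int(v[1] > v[0]) for t, v in tracks.items()}

print(similar, clusters)
```

A Discover Weekly-style pipeline would chain queries 1 and 2 (high-affinity seeds, then similar unheard tracks), while a Daily Mix-style pipeline would start from query 3.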
What is BaRT?
Spotify's recommendations are governed by an algorithm called BaRT ("Bandits for
Recommendations as Treatments"), which balances two modes: exploitation, serving
recommendations based on the data a user has already generated, and exploration, taking
chances on content the system knows little about. The exploitation mode flops when there is
no data for the algorithm to act on, meaning the user has not interacted enough with the
Spotify app. The exploitation mode is always at the mercy of the user.
When the exploitation mode fails, the exploration mode comes into action. The designers of
the BaRT algorithm anticipated such situations. Exploration mode does not require data from
the user to function. In fact, it blossoms in uncertainty.
Simply put, the algorithm gives a new song a chance to shine even though it has not yet
garnered enough user data. When the BaRT algorithm gives a new song exposure, it
monitors and records how users interact with the record.
When users listen to a song for less than 30 seconds, it hurts their chances of getting more
exposure. On the other hand, when users listen for more than 30 seconds and interact with
the song further via playlisting, liking or saving, the song gets more exposure.
The BaRT algorithm considers the negative and positive recommendations before choosing
what songs to recommend to more users.
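The feedback rule above can be sketched as a simple reward function (the 30-second threshold comes from the text; the exact weights here are invented for illustration):

```python
def stream_reward(seconds_played, saved=False, playlisted=False):
    """Toy reward: a sub-30-second listen counts against the track;
    a longer listen counts for it, and explicit actions add more.
    Weights are assumptions, not Spotify's actual values."""
    reward = 1.0 if seconds_played >= 30 else -1.0
    if saved:
        reward += 0.5
    if playlisted:
        reward += 0.5
    return reward

print(stream_reward(12))               # → -1.0, quick skip: negative signal
print(stream_reward(180, saved=True))  # → 1.5, full listen + save: strong positive
```

Summed over every user who was shown the track, signals like these are what BaRT weighs before deciding whether to widen or cut the song's exposure.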
The exploration mode is useful when there is low certainty on the relevance of a record while
the reverse is true for the exploitation mode. Yes, the BaRT model can make mistakes but
the beauty of the algorithm is that it learns from its mistakes and predicts better next time. It
gauges consumer satisfaction using click-through rates and positive feedback.
Through reinforcement learning, BaRT is able to log, learn, and adjust using its experience
with millions of Spotify users. To provide optimal satisfaction and correctly recommend
songs to users, the BaRT model utilizes a multi-armed bandit, which is programmed to
initiate a specific action 'A' in anticipation of the best reward 'R'; each future action 'A' thus
depends on former actions and rewards.
A multi-armed bandit (MAB) is tasked with picking actions that optimize the overall sum of
rewards. The first type of MAB had no eye for the context of the task, such as device,
playlist, time of day, or user features. To better serve listeners, the Spotify team came up
with a better version, known as the contextual multi-armed bandit, which collects and
considers contextual data before picking the appropriate action.
Four determinants are responsible for the success or failure of a contextual MAB,
namely the context, the training procedure, the reward model, and the exploitation-
exploration policy. The report quoted earlier models the reward in terms of three
parameters: an item j (the record), an explanation e (why the song was chosen), and the
context x. The reward model is, roughly, a logistic regression over those three parameters:

r̂(j, e, x) = σ(θᵀ(1_j ⊕ 1_e ⊕ x))

where σ is the sigmoid function, θ refers to the coefficients of the logistic regression, ⊕
denotes vector concatenation, and 1_i denotes a one-hot vector of zeros with a single 1 at
index i.
The reward function then drives the action: in a specific context x, the algorithm selects for
user u the item-explanation pair with the maximal predicted reward, roughly

(j∗, e∗) = argmax over valid pairs (j, e), with j in f(e, x), of r̂(j, e, x)
The authors of the report quoted earlier used epsilon-greedy as the exploration approach.
According to them, the policy "gives equal probability mass to all non-optimal items in the
validity set f(e,x) and (1−ε) additional mass to the optimal action (j∗, e∗)." [2] The policy is
set to "either exploit or explore the item and explanation simultaneously".
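A simplified epsilon-greedy selection over item-explanation pairs might look like this (a sketch only: the quoted policy spreads probability mass slightly differently, and the candidate names and scores here are invented):

```python
import random

def epsilon_greedy(predicted_rewards, epsilon=0.1, rng=random):
    """With probability epsilon, explore a random non-optimal candidate;
    otherwise exploit the action with the highest predicted reward.
    predicted_rewards maps (item, explanation) pairs to r-hat values."""
    best = max(predicted_rewards, key=predicted_rewards.get)
    if rng.random() < epsilon:
        others = [a for a in predicted_rewards if a != best]
        if others:
            return rng.choice(others)
    return best

# Hypothetical (item, explanation) candidates with predicted rewards:
candidates = {("track_1", "because you like indie"): 0.8,
              ("track_2", "popular near you"): 0.3}
print(epsilon_greedy(candidates, epsilon=0.0))  # ε = 0 always exploits
```

Setting ε = 0 reduces the policy to pure exploitation, while ε = 1 reduces it to pure exploration; the production value sits somewhere in between to keep new tracks in circulation.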
The procedure by which the algorithm updates itself is not out of the ordinary: BaRT
retrains periodically in batch mode. The problem that arises when the algorithm encounters
a new artist with no prior songs to learn from is called the cold start problem, and
collaborative filtering is not the best method for new and relatively unknown songs.
The goals and rewards of Spotify recommendation algorithms
Now, as we mentioned in the beginning of this breakdown, the overarching goal of the
Spotify recommender system has to do primarily with retention, time spent on the platform,
and general user satisfaction. However, these top-level goals are way too broad to devise a
balanced reward system for ML algorithms serving content recommendations across a
variety of features and contexts — and so the definition of success for the algorithms will
largely depend on where and why the user engages with the system.
For instance, the success of the autoplay queue features is defined mainly in terms of user
engagement — explicit/implicit feedback of listen-through and skip rates, library and playlist
saves, click-through to the artist profile and/or album, shares, and so on. In the case of
Release Radar playlists, however, the set of rewards would be widely different, as users
would often skim through the playlist rather than listen to it from cover to cover. So, instead
of studying engagement with content, the algorithms would optimize for long-term feature
retention and feature-specific behavior. "Users are satisfied with the feature if they keep
coming back to it every week; users are satisfied with Release Radar if they save tracks to
their playlists or libraries."
Finally, in some cases, Spotify would employ yet another set of algorithms just to devise the
reward functions for a specific feature. For example, Spotify has trained a separate ML
model to predict user satisfaction with Discover Weekly (with the training set sourced by
user surveys). This model looks at the entire wealth of user interaction data, the user's past
Discover Weekly behavior, and user goal clusters (i.e., whether the user engaged with
Discover Weekly as background listening, to search for new music, to save music for later,
etc.) and then produces a unified satisfaction metric based on all that activity.
References
https://www.loudlab.org/blog/understanding-how-spotify-algorithm-works/
https://www.music-tomorrow.com/blog/how-spotify-recommendation-system-works-a-complete-guide-2022