Guidelines for Search - Music Keyboard
(Side by Side) Pilot
Table of Contents
Guidelines for Search - Music Keyboard (Side by Side) Pilot
Music Keyboard Side by Side Guidelines
Introduction
What is Apple Music?
What is a side-preference rating task?
4. Duplicates
Duplicates Examples
Search Factors to Consider
Mandatory Comment
Examples
Neutral
Left Much Better or Right Much Better
Updated:
- April 4th, 2023: adjusted Steps to complete the side-preference task and added Decision Tree v2.
Added more info on unavailable results and sets of results returning different number of results
(under 2. Relevance)
Music Keyboard Side by Side Guidelines
Introduction
In this document, we explain the side-preference rating guidelines for Apple Music Search. You will
use the BaseLine tool to make these ratings.
Apple Music is a music and video streaming service developed by Apple. Users select music to
stream to their device on-demand, or they can listen to existing, curated playlists. The service also
includes the Internet radio station Apple Music 1 (previously known as Beats 1), which broadcasts
live to over 100 countries 24 hours a day. Some highlights of what Apple Music offers to the users:
• Stream 100 million songs ad-free.
• Download your favorite tracks. Play them offline.
• Get exclusive and original content.
• Listen across all of your devices.
• Sing along, tap ahead, or just listen with lyrics view.
• Listen live to local radio stations from around the world.
• Discover songs you’ll love from music picked just for you.
• Tap into new music with curated playlists from our editors.
positioned below the search query side by side for easy comparison.
Task Example
The importance of your work as a Rater
The data we receive from you in the form of high quality side-preference judgments will be used to
build and improve artificial intelligence systems such as search algorithms and machine learned
rankers that power the user experience for Apple Music users.
Our ultimate goal is to surprise and delight our customers by improving search quality and
enhancing customer satisfaction, and you play an important role in this.
Ask yourself “Which set of results best satisfies the intent of the query? Which side would users
prefer?”
Your attention to detail, research and language skills, as well as your cultural knowledge of the
market are all critical to the success of our projects.
Mandatory Comment
Each rating must be explained in the comment box. Even if “optional” is indicated, rating comments
are always mandatory.
Use your comment to add insights regarding your rating, factor chosen and the two sets of results.
This is extremely valuable especially when your side preference decision is 'Neutral': is the user
experience mutually bad for both sets of results? Mutually good?
The comment should be concise and must only explain why the rating was chosen. Comments will
be used as qualitative data in this evaluation.
4. It doesn't contain duplicate results.
5. The set of results is diverse.
Examples
[beyoncé]
[pop]
[good for you]
1. Query Intent
Primary and Secondary Intent
To help you understand the likely intents of the inputs, use your local market knowledge in addition
to online sources such as Google, YouTube, and social media. Please consider how you would
search for content as a user of a music streaming platform, whether to navigate to a specific piece of content or as a
means to browse a larger catalogue of music content.
The primary intent of a query is the most likely intent based on relevance, recency, quality and
popularity in the market you’re evaluating, i.e. the intent of most users who enter that query on
Apple Music.
A secondary intent, on the other hand, is less likely, or would be a less popular intent compared
to a primary one. A secondary intent could be:
• content relevant to a smaller group of users
• lower-quality or lesser-known content, such as unpopular covers or remixes from unknown artists.
Types of Queries
Navigational
A query that points to a specific piece of content. The user is using the search to find a particular
piece of music content. For this type of query, the job of the search is to return the intended content in
the top position.
Note: when the intent of the query is for an artist/band/composer, the Artist Page of the intended
artist is the content that best satisfies the user intent.
Even if the user is looking for a specific piece of content, navigational search queries might
still have some ambiguity and potentially point to multiple possible intents. For these cases, the
search results returned should include all popular possible intents ordered by popularity and recency
(the intent most likely to satisfy the majority of users should be ranked higher than possible
secondary intents).
Examples
[Navigational Intent]
Functional
Queries that are broad and do not point to a specific piece of content. The user is not looking for a
particular piece of music content but for music in a particular genre or that fits a particular mood or
activity. Functional queries might also be used to discover new music or new content on Apple
Music.
For this type of query, the search should return a set of results that is relevant to the user
intent. Relevant multi-song container types like category pages, playlists, stations and high
quality albums from various artists are the preferred music results for functional queries. Songs or
results that are too specific, like albums or playlists containing songs from a single artist, are not
ideal results even if they fit the intended genre/mood/activity.
Category Pages are Apple Curated pages that contain multiple playlists that have a common theme
(you can find a list of category pages here). This container type is the best result to return for
functional queries because it allows the user to explore and easily find playlists in the
genre/mood/activity that they search for.
[Category Pages Examples]
Examples
[Functional Intent]
Ambiguous
Queries for which the intent is unclear or that don’t point to any content. The user intent is too
ambiguous to identify a primary or secondary intent.
For this type of query, it is expected that the search results will
be relevant to the search query, often determined through text matching.
Examples
[Ambiguous Intent]
2. Relevance
Relevance captures the relationship between the search query and the results returned for that
search. Therefore, a result is relevant when it has a connection with the search query and/or
the user intent.
This connection could be easy to spot thanks to text matching between the search query and
results. However, when assessing the relevance of the results, high emphasis should be given not
only to the literal text of the query but also to the user intent. There could be cases where this
connection is less evident but still important. For example, for the queries “gym” or “fitness”, the
playlists “Pure Workout” or “Hip-Hop Workout” are highly relevant since they are very likely to satisfy
the intent of the user.
Content Related to the Primary Intent (or popular Secondary Intents)
Especially for navigational queries, it’s important to take into consideration that the search will not
only return content that satisfies the primary intent (or popular secondary intents) of the query. Some of
the content returned can be content that is related to the primary intent (or popular secondary
intents), which makes it still relevant to the query. For related content, the stronger the connection
between the related content and the intended content, the stronger the relevancy to the search
query. For example:
• For the query [you proof], the user is looking for the song ‘You Proof’ by Morgan Wallen. The
Morgan Wallen artist page or the album ‘One Thing At A Time’ that contains the intended
song are results that are highly related to the primary intent of the user.
• For the query [nba], the user is looking for the artist 'YoungBoy Never Broke Again'. Popular
songs or albums and the intended artist's essential playlist are highly related results for this
search query.
Playlists
For Navigational queries, especially queries looking for a song or an artist, playlists are returned to
enhance the search experience of the user by providing a multi-song container with content that is
related to the user intent. When it comes to relevancy, it’s important to distinguish a highly
related playlist from a less relevant playlist.
• High relevance playlists - High-quality playlists that feature the intended song AND
contain only other songs by the intended song’s artist OR High-quality playlists that feature
only songs by the artist implied by the query.
o For example, for the search query [you proof], the Morgan Wallen Essential
playlist is a highly relevant playlist since it is a high-quality playlist that features
the intended song and contains only other songs by Morgan Wallen.
• Mid relevance playlists - High-quality playlists that feature the intended song/artist AND
contain other songs in the same genre.
o For example, for the search query [migos], the playlist ‘Rap Life 2021’ is a playlist
with mid relevancy since it is a high-quality playlist that features the intended artist
and contains other songs in the same genre.
• Low relevance playlists - Playlists that don’t feature the intended song/artist BUT fit in the
genre of the intended song OR playlists that do feature the intended song/artist BUT whose
other songs are different in genre.
o For example, for the search query [pon de replay], the playlist ‘30th Birthday Party’
is a low relevancy playlist because it does feature the intended song but the other
songs are different in genre.
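The three playlist tiers above can be read as a small decision rule. The sketch below is one illustrative reading only, not a tool used in this task. The Track and Playlist types, the high_quality flag, the per-track genre labels and the placeholder track titles are all hypothetical. Treating a playlist that is not high quality but still features the intended content as low relevance is an assumption that the guidelines do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str
    artist: str
    genre: str


@dataclass(frozen=True)
class Playlist:
    name: str
    tracks: tuple        # tuple of Track
    high_quality: bool   # hypothetical flag; quality is a rater judgement


def playlist_relevance(playlist, artist, genre, song=None):
    """Return 'high', 'mid', 'low' or 'unrated' for a navigational query.

    artist/genre describe the intended content; song is None for artist queries.
    """
    # Does the playlist feature the intended song (or, for artist queries, the artist)?
    features_intent = any(
        t.artist == artist and (song is None or t.title == song)
        for t in playlist.tracks
    )
    only_intended_artist = all(t.artist == artist for t in playlist.tracks)
    same_genre = all(t.genre == genre for t in playlist.tracks)

    if playlist.high_quality and features_intent and only_intended_artist:
        return "high"
    if playlist.high_quality and features_intent and same_genre:
        return "mid"
    if features_intent or same_genre:
        return "low"
    return "unrated"  # assumption: the guidelines define no tier for this case


essentials = Playlist(
    "Morgan Wallen Essential",
    (
        Track("You Proof", "Morgan Wallen", "Country"),
        Track("Placeholder Track", "Morgan Wallen", "Country"),
    ),
    high_quality=True,
)
print(playlist_relevance(essentials, "Morgan Wallen", "Country", song="You Proof"))  # high
```

In practice the tier is a rater judgement; the sketch only makes the AND/OR structure of the three definitions explicit.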
Therefore, if one side has more results but some of them are irrelevant, returning fewer but
relevant results is better. If one side has more results and all of them are relevant (especially related to
the primary or secondary intent), more results is better.
Content uploaded by artist vs. content from third-party compilations
Returning content uploaded by the artist is preferred since it will contain the official thumbnail image,
it might have higher quality and it can easily lead the user to the official artist/album page. Therefore,
in terms of relevancy, songs that are uploaded by the artist (from official albums/singles) are
more relevant than songs from third-party compilations and Various Artists albums.
Example
For the search query [ed sheeran], the song below on the left, ‘Perfect’ from the album ÷ (Deluxe) by
Ed Sheeran (the official artist), is more relevant than the song below on the right, ‘Perfect’ from
Unavailable Content
If a result is unavailable on Apple Music, the availability issue should not be penalized if relevance
and popularity can be determined from other sources such as Google, Spotify, Deezer, etc.
3. Ranking
Ranking is the process during which all search results recalled for the search query are sorted based
on query relevancy, popularity and recency.
The job of the search is to return the result(s) most likely to satisfy the primary intent of the
user in the first position(s). The higher the result(s) that satisfy the user intent are ranked, the easier it is
for the user to find them, which translates to a better user experience.
For the content associated with the intended result, it is important that results that have higher
relevancy are ranked higher than other less relevant results.
For example:
• If the user is looking for a specific song, the artist page of the intended song’s artist and the
album containing the intended song should be ranked higher than other content by the artist
and playlists with mid and low relevancy.
• If the user is looking for a specific artist, popular/recent songs, albums and high relevancy
playlists (like the artist’s essential playlist) should be ranked higher than less popular/less
recent songs and albums by the intended artist and playlists with mid and low relevancy.
4. Duplicates
It is possible that the set of results contains duplicate songs. Two (or more) songs are duplicates of
each other when they are audio-equivalent (the audio of the tracks is identical).
Duplicate songs might have a different thumbnail image and still have the same audio; therefore,
still consider these as duplicates.
Sometimes you will encounter singles in an album container. These albums should be considered
equivalent to a song container since they only contain one song. If a song contained in the Single
Album has the same audio as another song in the search result set, they are considered
duplicates.
Another case of common duplicates is for the explicit version vs. the clean version of the same song.
The search should provide only the explicit version of the song (which is the original version of the
song), unless the user specifies in the query that they are looking for the clean version (usually by
adding the word 'clean' at the end of the query).
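The duplicate rules above can be sketched as a grouping step. This is a minimal illustration, not part of the BaseLine tool. The Result type, the container labels and the audio_id field are hypothetical, and audio_id is assumed to be equal whenever the guidelines would treat two tracks as audio-equivalent, which here includes the explicit and clean versions of the same song.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    title: str
    container: str        # "song" or "single_album" (hypothetical labels)
    audio_id: str         # hypothetical key: equal when two tracks are audio-equivalent
    explicit: bool = False
    thumbnail: str = ""   # never compared: artwork does not affect duplication


def duplicate_groups(results):
    """Group song-like results (songs and single albums) that share an audio_id."""
    groups = {}
    for r in results:
        # A single album holds one song, so it is treated like a song container.
        if r.container in ("song", "single_album"):
            groups.setdefault(r.audio_id, []).append(r)
    return [g for g in groups.values() if len(g) > 1]


def preferred_version(explicit_result, clean_result, query):
    """Prefer the explicit version unless the query ends with the word 'clean'."""
    words = query.lower().split()
    wants_clean = bool(words) and words[-1] == "clean"
    return clean_result if wants_clean else explicit_result
```

Two remixes of the same song would carry different audio_id values under this assumption, so they would not be grouped together.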
Duplicates Examples
1. These two songs have the same title, they are by the same artist and they are audio
equivalent (Audio Links - Song 1, Song 2). Therefore, these two songs are considered
duplicates.
[Duplicates Results Example 2]
3. These two songs have the same title, they are by the same artists and they are audio equivalent
(Audio Links - Song 1, Song 2). The only difference is that one is the explicit version (visible by the 'E'
next to the title) and one is the clean version. Therefore, these two songs are considered duplicates.
[Duplicates Results Example 3]
2. These two songs are by the same primary artist but they are two different remixes of the same
song and they are not audio equivalent (Audio Links - Song 1, Song 2). Therefore, these two results
are not considered duplicates.
[Non Duplicates Example 2]
5. Diversity in search results
An important feature of the Apple Music search is its ability to understand what the user is searching
for and to return a comprehensive set of results to explore and discover other content types related
to the primary intent of the search query.
This feature is important to enrich the search experience of the users and to showcase the Apple
Music catalog, especially Apple Music curated content.
A set of results is diverse when it contains a balanced representation of content types that
enables the exploration and discovery of content which is related to the user intent. The preferred
content types are based on the intent of the query and the query type (navigational or functional).
The right balance of content types depends on the search query and the intent of the query. Use
your judgement to determine if the diversity present in the set of results is adequate and encourages
the discovery and exploration of content related to the primary intent of the query.
Notes:
• Content types = Songs, Albums, Artist Pages, Playlists, Stations, Music Videos, Radio
Episodes, etc.
• Content types are easily visible next to each result in the BaseLine tool.
• Single albums should be considered equivalent to a song container since they only contain
one song.
Displaying the primary intent (and popular secondary intents) takes precedence over
diversity. From a ranking perspective, primary intent and popular secondary intents should be
displayed in higher positions than diversity content types.
Diversity or lack of diversity should not be penalized if the query has any of the following
characteristics:
• The intent is ambiguous or unclear
o ex. [let me], [fine], [we m]
• The primary intent is not for a specific song, artist, album/soundtrack
o ex. [a-list pop] - primary intent for Apple Music Curated A-List Pop, [rock 105.3] -
primary intent for broadcast station ROCK 105.3.
• A content type is specified in the search query
o ex. [punk radio], [90’s country essentials playlist]
The preferred diversity content types are based on the query intent:
[Diversity Container Types Choices - Navigational]
Examples
[as it was]
[the weekn]
Diversity or lack of diversity should not be penalized if the query has any of the following
characteristics:
• A content type is specified in the search query
o ex. [rock playlists], [workout station]
Examples
[jazz]
[sleep]
Side-Preference Task
The main objective of the side-preference ratings is to determine which set of music results
provides a better search experience to the users for a given search query.
Your task is to indicate which set of results the users would prefer, and why, by considering all the
music items returned in the two sets of search results (up to 10 results each). Communicate your side
preference decision using the rating scale below.
Neutral
It is also possible that the search experience provided by the two sets of results is equally good or
bad, or too similar to determine which set is better. In those cases, a neutral rating should be
given.
Left Slightly Better or Right Slightly Better should be chosen when there is no major difference
between the two sets of results but only a minor difference that still makes one side better than
the other, including:
• Both sides return the primary intent of the user in the top position but one side returns an
irrelevant result in the lower positions.
• Both sides return the primary intent of the user in the top position but there are minor
ranking changes in the middle positions.
• One side returns the intended result in the top position and, on the other side, the intended
result is returned in the 3rd position.
Left Much Better or Right Much Better should be chosen when there is a significant difference in at
least one of the search factors that makes one side much better or much worse than the other,
including:
• One side provides a set of results that is preferred for all of the search factors.
• One side returns a result that satisfies the primary intent of the user in the top position(s) and
the other side returns neither the primary intent result nor a secondary intent result.
• Both sides return the primary intent of the user but one side returns the intended result at
position 7 and the other side returns it at position 1.
• Both sides return the primary intent of the user but one set of results is overall more relevant
and is better ranked.
• One side returns a result in the 2nd position which is completely irrelevant.
When considering which set of results provides a better user experience, please consider the
following factors:
1. Primary or popular secondary intents - Does the set of results contain the primary and
popular secondary intent of the query?
o The most important aspect of the search is to return result(s) that satisfy the
intent of the user. Not returning the intended content in the set of results can result
it is important for the set of results to have the most popular and closely related
content in the top positions, especially results that satisfy the intent of the user. A
set of results that is well ranked can help the user to find what they are looking for
quickly and without a lot of effort like scrolling.
4. Duplicates - Does the set of results contain duplicate results?
o The presence of duplicates is not ideal in a set of results because it can make it
more difficult for the users to find the content they are looking for or to browse and
discover content. A set of results without duplicates delivers a cleaner, more
useful and more enjoyable user experience.
5. Diversity - Is the set of results diverse? Does it contain a balanced representation of
container types that follows the diversity expectations? Do the diversity content types in
the set of results encourage discovery and exploration of content related to the primary
intent of the query?
o Diversity is important to enrich the search experience and to encourage the
exploration and discovery of content which is related to the user intent. A
diverse set of results helps the user by showcasing the broad Apple Music
catalog, especially Apple Music curated content.
Note: The search factors above are listed in order of importance. E.g. an issue in relevance should
weigh more on the side-preference decision than an issue with diversity.
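One possible way to picture the order of importance is a priority walk over the five factors. The sketch below is an illustration under loud assumptions: the short factor names are hypothetical labels taken from the numbered sections (1 to 5), and applying the order strictly is a simplification. The guidelines only say that earlier factors weigh more, so a large issue in a later factor can still matter in a real rating.

```python
# Factor order follows the numbered sections (1-5); the short names are hypothetical.
FACTORS = ("intent", "relevance", "ranking", "duplicates", "diversity")


def leading_factor(preferences):
    """Return (factor, side) for the most important factor on which the sides differ.

    preferences maps a factor name to "left", "right" or "same".
    Missing factors are treated as "same".
    Returns ("none", "neutral") when no factor separates the two sides.
    """
    for factor in FACTORS:
        side = preferences.get(factor, "same")
        if side != "same":
            return factor, side
    return "none", "neutral"


# Relevance favors the left side, diversity favors the right side:
# relevance comes first in the order, so it leads the decision.
print(leading_factor({"relevance": "left", "diversity": "right"}))  # ('relevance', 'left')
```

The sketch does not decide between "Slightly Better" and "Much Better"; that depends on the size of the difference, which remains a rater judgement.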
1. Analyze the search query to understand the primary intent of the user.
- Is the search query navigational or functional?
- What is the user looking for?
2. Determine if the intent of the query is clear or if determining the intent is not possible.
of results encourage discovery and exploration of content related to the primary intent of the query?
5. Indicate your side preference using the rating scale.
6. Indicate which factor impacted your side preference decision the most.
- If you have selected 'Neutral' as your side preference, use the option 'none'.
7. Leave a comment explaining your choice.
content with title or lyrics fully matches the search query.
5. Indicate your side preference using the rating scale.
6. Indicate which factor impacted your side preference decision the most.
- If you have selected 'Neutral' as your side preference, use the option 'none'.
7. Leave a comment explaining your choice.
When indicating your side-preference, you can use the decision tree below:
Decision Tree
Examples
Neutral
[Example 1 - di]
[Example 2 - classical]
[Example 3 - baby hang over]
Left Slightly Better or Right Slightly Better
[Example 9 - you want me i want you baby]
[Example 12 - m83]