
Please ensure that you are using the latest version of the guidelines from BaseLine, which are
found in the upper right corner of every task.

Guidelines for Search - Music Keyboard (Side by Side) Pilot

Table of Contents
Guidelines for Search - Music Keyboard (Side by Side) Pilot
Music Keyboard Side by Side Guidelines
Introduction
    What is Apple Music?
    What is a side-preference rating task?
    The importance of your work as a Rater
    Mandatory Comment
Characteristics of a Great Music Search Experience
    Examples
    1. Query Intent
        Primary and Secondary Intent
        Types of Queries
    2. Relevance
        Content Related to the Primary Intent (or popular Secondary Intents)
        Sets of results returning different number of results
        Content uploaded by artist vs. content from third-party compilations
        Unavailable Content
    3. Ranking
    4. Duplicates
        Duplicates Examples
        Non Duplicates Songs
    5. Diversity in search results
        Satisfying Intent of the User and Diversity
        Diversity for Navigational queries
        Diversity for Functional queries
Side-Preference Task
    Side Preference Rating Scale
    Search Factors to Consider
    Mandatory Comment
    Steps to Complete the Side-Preference Task
Examples
    Neutral
    Left Slightly Better or Right Slightly Better
    Left Much Better or Right Much Better

Updated:
- April 4th, 2023: adjusted Steps to complete the side-preference task and added Decision Tree v2.
Added more info on unavailable results and sets of results returning different number of results
(under 2. Relevance)

Music Keyboard Side by Side Guidelines

Introduction
In this document, we explain the side-preference rating guidelines for Apple Music Search. You will
use the BaseLine tool to make these ratings.

What is Apple Music?

Apple Music is a music and video streaming service developed by Apple. Users select music to
stream to their device on-demand, or they can listen to existing, curated playlists. The service also
includes the Internet radio station Apple Music 1 (previously known as Beats 1), which broadcasts
live to over 100 countries 24 hours a day. Some highlights of what Apple Music offers to the users:
• Stream 100 million songs ad-free.
• Download your favorite tracks. Play them offline.
• Get exclusive and original content.
• Listen across all of your devices.
• Sing along, tap ahead, or just listen with lyrics view.
• Listen live to local radio stations from around the world.
• Discover songs you’ll love from music picked just for you.
• Tap into new music with curated playlists from our editors.

What is a side-preference rating task?


A side-preference rating task consists of evaluating two separate sets of search results (up to 10
results each) and indicating which set of results better satisfies the user intent of a given query. The
preferred side should be the set of results that gives the user the better search experience.
In these rating tasks, you will see a search query at the top of the window and the two sets of results
positioned below the search query side by side for easy comparison.

[Task Example]

The importance of your work as a Rater
The data we receive from you in the form of high-quality side-preference judgments will be used to
build and improve artificial intelligence systems such as search algorithms and machine-learned
rankers that power the user experience for Apple Music users.

Our ultimate goal is to surprise and delight our customers by improving search quality and
enhancing customer satisfaction, and you play an important role in this.
Ask yourself “Which set of results best satisfies the intent of the query? Which side would users
prefer?”
Your attention to detail, research and language skills, as well as your cultural knowledge of the
market are all critical to the success of our projects.

Mandatory Comment
Each rating must be explained in the comment box. Even if “optional” is indicated, rating comments
are always mandatory.
Use your comment to add insights regarding your rating, factor chosen and the two sets of results.
This is extremely valuable especially when your side preference decision is 'Neutral': is the user
experience mutually bad for both sets of results? Mutually good?
The comment should be concise and must only explain why the rating was chosen. Comments will
be used as qualitative data in this evaluation.

Characteristics of a Great Music Search Experience
A set of search results that provides a great search experience has these characteristics:
1. It includes a result (or results) that satisfies the primary intent and potential secondary
intent of the query.
2. All results are relevant to the search query and/or user intent.
3. The results are ranked based on query relevancy, popularity and recency.
4. It doesn't contain duplicate results.
5. The set of results is diverse.

Examples

[beyoncé]

[pop]

[good for you]

1. Query Intent
Primary and Secondary Intent

To help you understand the likely intents of the inputs, use your local market knowledge in addition
to online sources such as Google, YouTube, and social media. Please consider how you would
search for content as a user of a music streaming platform, either to navigate to a specific piece of
content or to browse a larger catalogue of music content.

The primary intent of a query is the most likely intent based on relevance, recency, quality and
popularity in the market you’re evaluating, i.e. the intent of most users who enter that query on
Apple Music.

A secondary intent, on the other hand, is less likely, or would be a less popular intent compared
to a primary one. A secondary intent could be:
• content relevant to a smaller group of users
• lower-quality or lesser-known content, such as unpopular covers or remixes from unknown artists.

Types of Queries
Navigational
A query that points to a specific piece of content. The user is using the search to find a particular
piece of music content. For this type of query, the job of the search is to return the intended content
in the top position.

Note: when the intent of the query is for an artist/band/composer, the Artist Page of the intended
artist is the content that best satisfies the user intent.

Even if the user is looking for a specific piece of content, navigational search queries might
still have some ambiguity and potentially point to multiple possible intents. For these cases, the
search results returned should include all popular possible intents, ordered by popularity and recency
(the intent most likely to satisfy the majority of users is ranked higher than possible
secondary intents).

Examples

[Navigational Intent]

Functional
Queries that are broad and do not point to a specific piece of content. The user is not looking for a
particular piece of music content but for music in a particular genre, or music that fits a particular
mood or activity. Functional queries might also be used to discover new music or new content on
Apple Music.

For this type of query, the search should return a set of results that is relevant to the user
intent. Relevant multi-song container types like category pages, playlists, stations and high-quality
albums from various artists are the preferred music results for functional queries. Songs, or
results that are too specific, like albums or playlists containing songs from a single artist, are not
ideal results even if they fit the intended genre/mood/activity.

Category Pages are Apple Curated pages that contain multiple playlists that have a common theme
(you can find a list of category pages here). This container type is the best result to return for
functional queries because it allows the user to explore and easily find playlists in the
genre/mood/activity that they searched for.

[Category Pages Examples]

Examples

[Functional Intent]

Ambiguous
Queries for which the intent is unclear or that don’t point to any content. The user intent is too
ambiguous to identify a primary or secondary intent.

For this type of query, the search results are expected to be relevant to the search query, often
determined through text matching.
Examples

[Ambiguous Intent]

2. Relevance
Relevance captures the relationship between the search query and the results returned for that
search. Therefore, a result is relevant when it has a connection with the search query and/or
the user intent.

This connection could be easy to spot thanks to text matching between the search query and
results. However, when assessing the relevance of the results, high emphasis should be given not
only to the literal text of the query but also to the user intent. There could be cases where this
connection is less evident but still important. For example, for the queries “gym” or “fitness”, the
playlists “Pure Workout” or “Hip-Hop Workout” are highly relevant since they are very likely to satisfy
the intent of the user.

Content Related to the Primary Intent (or popular Secondary Intents)
Especially for navigational queries, it’s important to keep in mind that the search will not only
return content that satisfies the primary intent (or popular secondary intents) of the query. Some of
the content returned can be content that is related to the primary intent (or popular secondary
intents), which makes it still relevant to the query. For related content, the stronger the connection
between the related content and the intended content, the stronger the relevancy to the search
query. For example,
• For the query [you proof], the user is looking for the song ‘You Proof’ by Morgan Wallen. The
Morgan Wallen artist page or the album ‘One Thing At A Time’ that contains the intended
song are results that are highly related to the primary intent of the user.
• For the query [nba], the user is looking for the artist 'YoungBoy Never Broke Again'. Popular
songs or albums and the intended artist's essential playlist are highly related results for this
search query.

Playlists
For Navigational queries, especially queries looking for a song or an artist, playlists are returned to
enhance the search experience of the user by providing a multi-song container with content that is
related to the user intent. When it comes to relevancy, it’s important to distinguish a highly
relevant playlist from a less relevant playlist.

• High relevance playlists - High-quality playlists that feature the intended song AND contain
only other songs by the intended song’s artist OR high-quality playlists that feature
only songs by the artist implied by the query.
o For example, for the search query [you proof], the Morgan Wallen Essential
playlist is a highly relevant playlist since it is a high-quality playlist that features
the intended song and contains only other songs by Morgan Wallen.
• Mid relevance playlists - High-quality playlists that feature the intended song/artist AND
contain other songs in the same genre.
o For example, for the search query [migos], the playlist ‘Rap Life 2021’ is a playlist
with mid relevancy since it is a high-quality playlist that features the intended artist
and contains other songs in the same genre.
• Low relevance playlists - Playlists that don’t feature the intended song/artist BUT fit in the
genre of the intended song OR playlists that do feature the intended song/artist BUT whose
other songs are different in genre.
o For example, for the search query [pon de replay], the playlist ‘30th Birthday Party’
is a low relevancy playlist because it does feature the intended song but the other
songs are different in genre.
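
The three playlist tiers above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not a description of how Apple Music or BaseLine works. The names `Track`, `Tier` and `classify_playlist` are hypothetical, the genre labels are placeholders, and the `UNRELATED` outcome and the handling of playlists that are not high quality are assumptions, since the guidelines do not cover those cases.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HIGH = "high"
    MID = "mid"
    LOW = "low"
    UNRELATED = "unrelated"  # assumption: not a tier named in the guidelines


@dataclass(frozen=True)
class Track:
    title: str
    artist: str
    genre: str  # placeholder genre label chosen by the rater


def classify_playlist(tracks, artist, genre, song=None, high_quality=True):
    """Return the relevance tier of a playlist for a song or artist query.

    Pass `song` for a song query; leave it as None for an artist query.
    """
    if not tracks:
        return Tier.UNRELATED

    def is_intended(track):
        return track.artist == artist and (song is None or track.title == song)

    features_intent = any(is_intended(t) for t in tracks)
    only_intended_artist = all(t.artist == artist for t in tracks)
    only_intended_genre = all(t.genre == genre for t in tracks)

    # High: high quality, features the intent, only songs by the intended artist.
    if high_quality and features_intent and only_intended_artist:
        return Tier.HIGH
    # Mid: high quality, features the intent, other songs in the same genre.
    if high_quality and features_intent and only_intended_genre:
        return Tier.MID
    # Low: fits the genre without the intent, or features the intent among
    # songs of other genres (assumption: also any playlist that is not high quality).
    if features_intent or only_intended_genre:
        return Tier.LOW
    return Tier.UNRELATED


party = [
    Track("Pon de Replay", "Rihanna", "Pop"),
    Track("Placeholder Song", "Another Artist", "Rock"),
]
print(classify_playlist(party, "Rihanna", "Pop", song="Pon de Replay"))
```

In practice the rater's judgement of quality and genre fit is what matters; the sketch only makes the order of the checks explicit.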

Sets of results returning different number of results


It is possible that one side returns fewer results than the other side. For these cases, the main thing to
keep in mind is that “It’s better to return nothing than to return results that do not satisfy
or are not related to the primary or secondary intent”.

Therefore, if one side has more results but some of them are irrelevant, returning fewer but
relevant results is better. If one side has more results and all of them are relevant (especially related to
the primary or secondary intent), more results are better.
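
As a rough illustration of the rule above, the sketch below compares two sides using only a per-result "is this related to the primary or secondary intent?" flag supplied by the rater. The function name `compare_result_counts` and the `"undetermined"` outcome are assumptions made for this example; the passage does not say what to do when both sides contain irrelevant results, so the sketch leaves that case to the other search factors.

```python
def compare_result_counts(left_relevant, right_relevant):
    """Compare two result sets on count and relevance only.

    Each argument is a list of booleans, one per result, where True means
    the rater judged the result related to the primary or secondary intent.
    Returns "left", "right", "neutral" or "undetermined".
    """
    left_clean = all(left_relevant)    # True for an empty list as well
    right_clean = all(right_relevant)

    if left_clean and right_clean:
        # Everything returned is relevant, so more results are better.
        if len(left_relevant) == len(right_relevant):
            return "neutral"
        return "left" if len(left_relevant) > len(right_relevant) else "right"

    if left_clean != right_clean:
        # Fewer (or even no) results beat a set padded with irrelevant ones.
        return "left" if left_clean else "right"

    # Both sides contain irrelevant results: this rule alone does not decide.
    return "undetermined"


print(compare_result_counts([True, True, True], [True, True, True, True, False]))
```

An empty side counts as "clean" here, which mirrors the statement that returning nothing is better than returning unrelated results.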
Content uploaded by artist vs. content from third-party compilations
Returning content uploaded by the artist is preferred since it will contain the official thumbnail image,
it might have higher quality and it can easily lead the user to the official artist/album page. Therefore,
in terms of relevancy, songs that are uploaded by the artist (from official albums/singles) are
more relevant than songs from third-party compilations and Various Artists albums.

Example
For the search query [ed sheeran], the song below on the left, ‘Perfect’ from the album ÷ (Deluxe) by
Ed Sheeran (the official artist), is more relevant than the song below on the right, ‘Perfect’ from the
album Internet Hits 2022 by Various Artists.


[Content uploaded by artist vs. content from third-party compilations]

Unavailable Content
If a result is unavailable on Apple Music, the availability issue should not be penalized if relevance
and popularity can be determined from other sources such as Google, Spotify, Deezer, etc.
3. Ranking
Ranking is the process during which all search results recalled for the search query are sorted based
on query relevancy, popularity and recency.

The job of the search is to return the result(s) most likely to satisfy the primary intent of the
user in the first position(s). The higher the result(s) that satisfy the user intent are ranked, the easier
it is for the user to find them, which translates to a better user experience.

For the content associated with the intended result, it is important that results that have higher
relevancy are ranked higher than other less relevant results.
For example:
• If the user is looking for a specific song, the artist page of the intended song’s artist and the
album containing the intended song should be ranked higher than other content by the artist
and playlists with mid and low relevancy.
• If the user is looking for a specific artist, popular/recent songs, albums and high relevancy
playlists (like the artist’s essential playlist) should be ranked higher than less popular/less
recent songs and albums by the intended artist and playlists with mid and low relevancy.
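
One way to picture the expected ordering is a sort on relevancy first, then popularity, then recency. The guidelines name these three criteria but do not state how they are combined, so the strict lexicographic order below is an assumption, and `Candidate`, its fields and the numeric scales are hypothetical values a rater might assign when reasoning about a result set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    relevance: int      # hypothetical scale: 3 = satisfies intent, 2 = high, 1 = mid, 0 = low
    popularity: float   # hypothetical popularity estimate, higher is more popular
    release_year: int   # used as a simple stand-in for recency


def expected_order(candidates):
    """Sort by relevance, then popularity, then recency (all descending)."""
    return sorted(
        candidates,
        key=lambda c: (-c.relevance, -c.popularity, -c.release_year),
    )


results = [
    Candidate("low relevancy playlist", 0, 0.9, 2023),
    Candidate("intended artist page", 3, 0.5, 2020),
    Candidate("older popular song", 2, 0.8, 2015),
    Candidate("recent popular song", 2, 0.8, 2022),
]
for candidate in expected_order(results):
    print(candidate.name)
```

Comparing a side's actual order against such an expected order is one way to spot results that are ranked too low or too high.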

4. Duplicates
It is possible that the set of results contains duplicate songs. Two (or more) songs are duplicates of
each other when they are audio-equivalent (the audio of the tracks is identical).
Duplicate songs might have a different thumbnail image and still have the same audio; these
should still be considered duplicates.

Sometimes you will encounter singles in an album container. These albums should be considered
equivalent to a song container since they only contain one song. If a song contained in the Single
Album has the same audio as another song in the search result set, they are considered
duplicates.

Another case of common duplicates is for the explicit version vs. the clean version of the same song.
The search should provide only the explicit version of the song (which is the original version of the
song), unless the user specifies in the query that they are looking for the clean version (usually by
adding the word 'clean' at the end of the query).
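
The duplicate rules above can be sketched as a check on audio equivalence. This is a minimal illustration under stated assumptions: `Result`, `audio_ids` and `find_duplicates` are hypothetical names, the audio identifiers stand in for the rater's own listening check, and the explicit and clean versions of a song are assumed to share one identifier because the guidelines treat them as audio equivalent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    title: str
    container: str     # "song" or "album"
    audio_ids: tuple   # one hypothetical audio identifier per track


def is_song_like(result):
    """Songs and single albums (exactly one track) are treated alike."""
    return len(result.audio_ids) == 1


def find_duplicates(results):
    """Return (first_position, duplicate_position) pairs of audio-equivalent results."""
    first_seen = {}
    duplicates = []
    for position, result in enumerate(results):
        if not is_song_like(result):
            continue  # multi-track albums are not compared here
        audio_id = result.audio_ids[0]
        if audio_id in first_seen:
            duplicates.append((first_seen[audio_id], position))
        else:
            first_seen[audio_id] = position
    return duplicates


results = [
    Result("Song", "song", ("a1",)),
    Result("Song - Single", "album", ("a1",)),   # single album, same audio
    Result("Song (Live)", "song", ("a2",)),      # different version, different audio
    Result("Full Album", "album", ("a1", "a3")),
]
print(find_duplicates(results))
```

Thumbnails and titles are deliberately ignored, since duplicates may differ in artwork while sharing the same audio.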

Duplicates Examples
1. These two songs have the same title, they are by the same artist and they are audio
equivalent (Audio Links - Song 1, Song 2). Therefore, these two songs are considered
duplicates.

[Duplicates Results Example 1]


2. The song and the song contained in the single album have the same title, they are by the same
artist and they are audio equivalent (Audio Links - Song, Single Album). Therefore, these two results
are considered duplicates.

[Duplicates Results Example 2]

3. These two songs have the same title, they are by the same artists and they are audio equivalent
(Audio Links - Song 1, Song 2). The only difference is that one is the explicit version (visible by the 'E'
next to the title) and one is the clean version. Therefore, these two songs are considered duplicates.

[Duplicates Results Example 3]

Non Duplicates Songs


Different versions of a song like remixes, covers, live versions, instrumental versions, karaoke
versions or versions with featuring artists should not be considered duplicates of the original song
since the audio of the original and the different version would not be identical to each other.

Non Duplicates Examples


1. Even though these two songs have the same title and are by the same artist, they are
different versions of the same song and they are not audio equivalent (Audio Links - Song
1 (original version), Song 2 (live version)). Therefore, these two results are not considered
duplicates.

[Non Duplicates Example 1]

2. These two songs are by the same primary artist but they are two different remixes of the same
song and they are not audio equivalent (Audio Links - Song 1, Song 2). Therefore, these two results
are not considered duplicates.
[Non Duplicates Example 2]

5. Diversity in search results
An important feature of the Apple Music search is its ability to understand what the user is searching
for and to return a comprehensive set of results to explore and discover other content types related
to the primary intent of the search query.
This feature is important to enrich the search experience of the users and to showcase the Apple
Music catalog, especially Apple Music curated content.

A set of results is diverse when it contains a balanced representation of content types that
enables the exploration and discovery of content which is related to the user intent. The preferred
content types are based on the intent of the query and the query type (navigational or functional).
The right balance of content types depends on the search query and the intent of the query. Use
your judgement to determine if the diversity present in the set of results is adequate and encourages
the discovery and exploration of content related to the primary intent of the query.
Notes:
• Content types = Songs, Albums, Artist Pages, Playlists, Stations, Music Videos, Radio
Episodes, etc.
• Content types are easily visible next to each result in the BaseLine tool.
• Single albums should be considered equivalent to a song container since they only contain
one song.

Satisfying Intent of the User and Diversity


Diversity alone doesn’t translate to a great search experience. It is important that the set of results
contains a result (or results) that satisfies the user intent.

Displaying the primary intent (and popular secondary intents) takes precedence over
diversity. From a ranking perspective, primary intent and popular secondary intents should be
displayed in higher positions than diversity content types.

Diversity for Navigational queries


For navigational queries, diversity should be present especially for queries with an identifiable
intent where the user is looking for a specific song, artist or album/soundtrack.
For each of these intents, there is a separate list of diversity content types choices but all diversity
content types should be related to the primary and secondary intent of the query.

Diversity or lack of diversity should not be penalized if the query has any of the following
characteristics:
• The intent is ambiguous or unclear
o ex. [let me], [fine],[we m]
• The primary intent is not for a specific song, artist, album/soundtrack
o ex. [a-list pop] - primary intent for Apple Music Curated A-List Pop, [rock 105.3] -
primary intent for broadcast station ROCK 105.3.
• A content type is specified in the search query
o ex. [punk radio], [90’s country essentials playlist]

The preferred diversity content types are based on the query intent:

[Diversity Container Types Choices - Navigational]

Examples

[as it was]

[the weekn]

[led zeppelin ii]

Diversity for Functional queries


For functional queries, it is very important to provide the users with different multi-song container
types to allow them to choose their preferred container type.
The preferred container type for functional queries is a category page that satisfies the primary intent
of the query. In addition, the set of results should contain other related multi-song container types
like playlists, stations and high-quality albums by various artists. All diversity content types
should be related to the primary (and possible secondary) intent of the query.

Diversity or lack of diversity should not be penalized if the query has any of the following
characteristics:
• A content type is specified in the search query
o ex. [rock playlists], [workout station]

[Diversity Content Types Choices - Functional]

Examples

[jazz]

[sleep]

Side-Preference Task

The main objective of the side-preference ratings is to determine which set of music results
provides a better search experience to the users for a given search query.

Your task is to indicate which set of results the users would prefer, and why, by considering all the
music items returned in the two sets of search results (up to 10 results each).

Side Preference Rating Scale


Communicate your side-preference decision using the below rating scale:

• Left Much Better
• Left Slightly Better
• Neutral
• Right Slightly Better
• Right Much Better

Neutral

It is also possible that the search experiences provided by the two sets of results are equally good or
bad, or too similar to determine which set is better. In those cases, a neutral rating should be
given.

Left Slightly Better or Right Slightly Better

Left Slightly Better or Right Slightly Better should be chosen when there is no major difference
between the two sets of results but only a minor difference that still makes one side better than
the other, including:
• Both sides return the primary intent of the user in the top position but one side returns an
irrelevant result in the lower positions.
• Both sides return the primary intent of the user in the top position but there are minor
ranking changes in the middle positions.
• One side returns the intended result in the top position and, on the other side, the intended
result is returned in the 3rd position.

Left Much Better or Right Much Better

Left Much Better or Right Much Better should be chosen when there is a significant difference in at
least one of the search factors that makes one side much better or much worse than the other,
including:
• One side provides a set of results that is preferred for all of the search factors.
• One side returns a result that satisfies the primary intent of the user in the top position(s) and
the other side returns neither the primary intent result nor the secondary intent.
• Both sides return the primary intent of the user but one side returns the intended result at
position 7 and the other side returns it at position 1.
• Both sides return the primary intent of the user but one set of results is overall more relevant
and is better ranked.
• One side returns a result in the 2nd position which is completely irrelevant.

Search Factors to Consider



When considering which set of results provides a better user experience, please consider the
following factors:

1. Primary or popular secondary intents - Does the set of results contain the primary and
popular secondary intent of the query?
o The most important aspect of the search is to return a result(s) that satisfies the
intent of the user. Not returning the intended content in the set of results can result
in a frustrating user experience. Therefore, it is extremely important for the search
to understand the intent and return one or, in some cases, more results that
satisfy the intent of the query.
2. Relevance - How relevant are all the results returned to the search query? Is one (or more)
result irrelevant to the user intent? Does the set contain results that are just matching part
of the search query and are unrelated to the intent of the query?
o The second most important factor, after returning a result(s) that satisfies the
intent of the user, is for the rest of the results to still be connected to the user
intent/the search query. This allows the set of results to be cohesive and
comprehensive, which can help the user discover content related to their
original intent. If the set of results contains even just one irrelevant result, this
can negatively surprise the user and cause them to lose trust in the search
engine's ability to deliver useful content.
3. Ranking - Does the set of results have the most popular/recent content ranked higher than
other less recent and less popular content? Does the set of results return the primary intent
in the first position?
o Ranking of search results can significantly impact the user experience, as users
are more likely to click on and engage with the top-ranked results. Therefore,
it is important for the set of results to have the most popular and closely related
content in the top positions, especially results that satisfy the intent of the user. A
set of results that is well ranked can help the user find what they are looking for
quickly and without a lot of effort like scrolling.
4. Duplicates - Does the set of results contain duplicate results?
o The presence of duplicates is not ideal in a set of results because it can make it
more difficult for the users to find the content they are looking for or to browse and
discover other content. A set of results without duplicates delivers a cleaner, more
useful and more enjoyable user experience.
5. Diversity - Is the set of results diverse? Does it contain a balanced representation of
container types that follows the diversity expectations? Do the diversity content types in
the set of results encourage discovery and exploration of content related to the primary
intent of the query?
o Diversity is important to enrich the search experience and to encourage the
exploration and discovery of content which is related to the user intent. A
diverse set of results helps the user by showcasing the broad Apple Music
catalog, especially Apple Music curated content.

Note: The search factors above are listed in order of importance. E.g. an issue in relevance should
weigh more on the side-preference decision than an issue with diversity.
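
The order of importance in the note above can be illustrated with a short sketch that walks the factors from most to least important and reports the first one on which the two sides differ. Reading the order as strictly lexicographic is an assumption made for this example; in practice the rater weighs the factors with judgement. The factor keys and the function name `most_important_difference` are hypothetical, and the per-factor scores are the rater's own assessment (higher is better).

```python
# Factors in the order of importance given by the guidelines.
FACTORS = ["intent", "relevance", "ranking", "duplicates", "diversity"]


def most_important_difference(left_scores, right_scores):
    """Return (factor, preferred_side) for the highest-priority difference.

    Each argument maps every factor name to the rater's score for that side.
    Returns ("none", "neutral") when the sides score the same on all factors,
    matching the 'none' factor option used with a Neutral rating.
    """
    for factor in FACTORS:
        if left_scores[factor] != right_scores[factor]:
            side = "left" if left_scores[factor] > right_scores[factor] else "right"
            return factor, side
    return "none", "neutral"


left = {"intent": 1, "relevance": 0, "ranking": 1, "duplicates": 1, "diversity": 1}
right = {"intent": 1, "relevance": 1, "ranking": 1, "duplicates": 1, "diversity": 0}
print(most_important_difference(left, right))
```

Here the left side has a relevance issue and the right side has a diversity issue, so the relevance issue decides the preference.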

Mandatory Comment
Each rating must be explained in the comment box. Even if “optional” is indicated, rating comments
are always mandatory.

Use your comment to add insights regarding your rating, factor chosen and the two sets of results.
This is extremely valuable especially when your side preference decision is 'Neutral': is the user
experience mutually bad for both sets of results? Mutually good?
The comment should be concise and must only explain why the rating was chosen. Comments will
be used as qualitative data in this evaluation.

Steps to Complete the Side-Preference Task


The major steps to complete the side-preference task are:

1. Analyze the search query to understand the primary intent of the user.
- Is the search query navigational or functional?
- What is the user looking for?
2. Determine if the intent of the query is clear or if determining the intent is not possible.

If the intent is clear:

3. Analyze if both sets of results contain a result(s) that satisfies the intent of the user.
4. Compare the two sets of results based on the other search factors.
- Relevance: How relevant are all the results returned? Is one (or more) result irrelevant to the
user intent? Do the two sets of results contain results that are just matching part of the search query
and are unrelated to the intent of the query?
- Ranking/Popularity: Do the two sets of results have the most popular/recent content ranked
higher than other less recent and less popular content?
- Duplicates: Do the two sets of results contain duplicate results?
- Diversity: Are the sets of results diverse? Do they contain a balanced representation of
container types that follows the diversity expectations? Do the diversity content types in the two sets
of results encourage discovery and exploration of content related to the primary intent of the query?
5. Indicate your side preference using the rating scale.
6. Indicate which factor impacted your side preference decision the most.
- If you have selected 'Neutral' as your side preference, use the option 'none'.
7. Leave a comment explaining your choice.

If the intent is unclear:

3. Analyze if both sets of results are identical, disregarding ranking differences.
4. If different, compare the two sets of results by analyzing relevancy and the presence of duplicates.
- Relevancy for a query with an unclear primary intent is determined by token matching between the
search query and the title or lyrics of the content. Content with a partial token match is less relevant
than content whose title or lyrics fully match the search query.
5. Indicate your side preference using the rating scale.
6. Indicate which factor impacted your side preference decision the most.
- If you have selected 'Neutral' as your side preference, use the option 'none'.
7. Leave a comment explaining your choice.
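
For queries with an unclear intent, the token matching described in step 4 can be pictured with the sketch below. It is a simplified illustration only: lowercasing and whitespace splitting are assumed as the tokenization, the function names are hypothetical, and the titles are example strings rather than results from any real task.

```python
def tokenize(text):
    """Assumed tokenization: lowercase, split on whitespace."""
    return text.lower().split()


def token_match(query, text):
    """Classify how well a title or lyric line matches the query tokens.

    Returns "full" when every query token appears in the text, "partial"
    when only some do, and "none" when no token matches.
    """
    query_tokens = tokenize(query)
    text_tokens = set(tokenize(text))
    if not query_tokens:
        return "none"
    matched = sum(1 for token in query_tokens if token in text_tokens)
    if matched == len(query_tokens):
        return "full"
    return "partial" if matched > 0 else "none"


# A full match is more relevant than a partial one, which beats no match.
MATCH_RANK = {"full": 2, "partial": 1, "none": 0}

print(token_match("let me", "Let Me Love You"))
print(token_match("let me", "Let It Be"))
```

A rater would apply the same idea to both sides and prefer the side whose results match the query more fully, before checking for duplicates.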

When indicating your side-preference, you can use the below decision tree:

[Decision Tree]

Examples
Neutral

[Example 1 - di]

[Example 2 - classical]

[Example 3 - baby hang over]

Left Slightly Better or Right Slightly Better

[Example 4 - beach music]

[Example 5 - hometown girl]

[Example 6 - miley cyrus essentials]

[Example 7 - rock]

[Example 8 - we where big sean]

Left Much Better or Right Much Better

[Example 9 - you want me i want you baby]

[Example 10 - raster polo g]

[Example 11 - be where big sean]

[Example 12 - m83]