
Music Recommendation System

• Spotify, Saavn, YouTube Music, Gaana and Amazon Prime Music are some examples of
music streaming services. Such a service recommends songs based on the songs a user has previously listened to.
• Q1) What is the business metric for this system?
o Time spent on the platform (influenced by the UI, the song catalogue and ads)
o Number of listens / number of recommendations.

• Precision at top k (precision@k), measured over a given time window t.


• The value of k is to be fixed by discussing with the business stakeholders.
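A minimal sketch of precision@k for one user, assuming `recommended` is the ranked list of songs shown during the time window t and `listened` holds the songs the user actually played in that window; the function name and the sample data are hypothetical.

```python
def precision_at_k(recommended, listened, k):
    """Fraction of the top-k recommended songs that the user actually listened to."""
    if k <= 0:
        raise ValueError("k must be positive")
    listened_set = set(listened)
    hits = sum(1 for song in recommended[:k] if song in listened_set)
    return hits / k


# Hypothetical window t: five songs recommended, three songs played.
recommended = ["s1", "s2", "s3", "s4", "s5"]
listened = ["s2", "s3", "s9"]
p_at_3 = precision_at_k(recommended, listened, k=3)  # 2 hits in the top 3
p_at_5 = precision_at_k(recommended, listened, k=5)  # 2 hits in the top 5
```

Dividing by k (rather than by the number of songs actually shown) penalises lists shorter than k; which convention to use can be settled together with the choice of k.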
Q2) What data is required for the project?

• User data [age, language, gender, location]


• Item (song) data [era, singer, key instrument, genre, sub-genre]
• User-item ratings [users mostly listen passively, so explicit ratings are very sparse: they rate only about 1 percent of songs].
• User-item affinity [0 if the user skips the song, 1 if the user listens to it again and again].
• User-item timestamps.
• Search history.
• Song raw audio [used to extract more complex features].
• Currently trending songs.
• Musician/artist data.
A Yahoo Music dataset is available and can be used if needed.
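The user-item affinity above can be derived from implicit play/skip events instead of sparse explicit ratings. A minimal sketch, assuming events arrive as hypothetical `(user, song, action)` tuples; the rule that affinity grows linearly with full plays and saturates at 1 after `repeat_threshold` plays is an assumption, since the notes only fix the two endpoints (skip = 0, repeated listening = 1).

```python
from collections import defaultdict


def build_affinity(events, repeat_threshold=3):
    """Map (user, song) -> affinity in [0, 1] from implicit play/skip events.

    Assumption (not from the notes): affinity = plays / repeat_threshold,
    capped at 1.0; a song that was only ever skipped gets 0.0.
    """
    plays = defaultdict(int)
    seen = set()
    for user, song, action in events:
        seen.add((user, song))
        if action == "play":
            plays[(user, song)] += 1
    # Only observed (user, song) pairs are stored, so the matrix stays sparse.
    return {pair: min(plays[pair] / repeat_threshold, 1.0) for pair in seen}


# Hypothetical event log.
events = [
    ("u1", "s1", "skip"),
    ("u1", "s2", "play"),
    ("u1", "s2", "play"),
    ("u1", "s2", "play"),
    ("u2", "s1", "play"),
]
affinity = build_affinity(events)
```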
Q3) How will you do the train/test split?

• Time-based splitting: train on interactions before a cutoff time and test on interactions after it.
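A minimal sketch of the time-based split, assuming interactions are hypothetical `(user, song, timestamp)` tuples and a single global cutoff chosen by the analyst. Splitting by time (rather than at random) keeps the model from being trained on the future it is evaluated on.

```python
def time_based_split(interactions, cutoff):
    """Train on interactions before `cutoff`, test on those at or after it."""
    train = [row for row in interactions if row[2] < cutoff]
    test = [row for row in interactions if row[2] >= cutoff]
    return train, test


# Hypothetical interactions with integer timestamps (e.g. Unix seconds).
interactions = [
    ("u1", "s1", 100),
    ("u1", "s2", 250),
    ("u2", "s3", 180),
    ("u2", "s1", 400),
]
train, test = time_based_split(interactions, cutoff=200)
```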


Q4) How do you model this as an ML problem? (classification / regression / recommendation
system)

• Weighted regression combined with a recommendation system.


• We have the user and item information needed to pose it as a classification problem (will listen / will skip).
• The business metric is precision@k, which is built on precision, a classification metric.
• For each item, we compute content-based filtering features from genre, artist, time and the
audio (wave) file itself (FFT, spectrogram).
• Take the top k liked songs from the user's most recent past.
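Content-based filtering needs each song as a numeric vector so that songs can be compared, e.g. against the user's recent top-k liked songs. A minimal sketch, assuming one-hot genre and artist plus the release year scaled to [0, 1] as the "time" feature (the 1950 to 2020 range, the field names and the sample songs are hypothetical); audio features such as spectrogram embeddings could be appended to the same vector.

```python
import math


def encode_song(song, genres, artists, min_year=1950, max_year=2020):
    """Content vector: one-hot genre + one-hot artist + release year scaled to [0, 1]."""
    vec = [1.0 if song["genre"] == g else 0.0 for g in genres]
    vec += [1.0 if song["artist"] == a else 0.0 for a in artists]
    vec.append((song["year"] - min_year) / (max_year - min_year))
    return vec


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


genres = ["rock", "jazz"]
artists = ["Artist A", "Artist B"]
songs = {
    "s1": {"genre": "rock", "artist": "Artist A", "year": 1990},
    "s2": {"genre": "rock", "artist": "Artist A", "year": 2000},
    "s3": {"genre": "jazz", "artist": "Artist B", "year": 1960},
}
vectors = {sid: encode_song(meta, genres, artists) for sid, meta in songs.items()}
sim_12 = cosine(vectors["s1"], vectors["s2"])  # same genre and artist: high
sim_13 = cosine(vectors["s1"], vectors["s3"])  # different genre and artist: low
```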

http://benanne.github.io/2014/08/05/spotify-cnns.html

• Treat the spectrogram as an image and build a CNN on top of it.


• This yields 40 features for each song.
• The song features were visualized with a t-SNE plot.
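Treating the spectrogram as an image means turning raw audio into a 2-D array of (time frame × frequency bin) magnitudes. A minimal sketch, assuming a Hann window, a 32-sample frame and a 16-sample hop; these are illustrative values, not taken from the notes or the linked post. A naive DFT keeps it dependency-free, whereas real pipelines would use an FFT.

```python
import cmath
import math


def spectrogram(signal, frame_size=32, hop=16):
    """Magnitude spectrogram: Hann-windowed frames, each transformed with a DFT.

    Returns one list of frame_size // 2 + 1 magnitudes (bins 0..Nyquist) per frame.
    The DFT here is the naive O(N^2) sum; an FFT computes the same values faster.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        frames.append([
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ])
    return frames


# Hypothetical test tone: a sine completing exactly 4 cycles per 32-sample frame.
signal = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
spec = spectrogram(signal)  # 7 frames x 17 frequency bins; energy peaks in bin 4
```

Each row of `spec` is one time slice, so stacking the rows gives the image the CNN consumes.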
Q) What types of models will work well?

• Logistic regression (LR) / SVM
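A minimal logistic-regression sketch for the listen/skip classification, trained with plain batch gradient descent on the log loss. The two binary features, the learning rate, the epoch count and the toy data are all hypothetical; a real system would use a library implementation with regularization.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss.

    X: feature vectors for (user, song) pairs; y: 1 = listened, 0 = skipped.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that the user listens to the song."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [genre matches user's taste, artist previously played].
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]
w, b = train_logistic_regression(X, y)
```

Ranking candidate songs by `predict_proba` and keeping the top k ties the classifier back to precision@k.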

Q) If using a kernel SVM, can we skip storing the support vectors (SVs) and just store the hyperplane separating the points?

• Suppose we have two points $x_i, x_j \in \mathbb{R}^d$.


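• One way to measure their similarity is the dot product $x_i^\top x_j$.
• A kernel $K(x_i, x_j)$ also returns a real value, but it equals a dot product $\phi(x_i)^\top \phi(x_j)$ in a transformed space of a different dimension $d'$.
• The separating hyperplane therefore lives in that $d'$-dimensional space (infinite-dimensional for the RBF kernel), so its weight vector generally cannot be stored explicitly.
• The decision function $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$ is computed from the support vectors, so if we don't keep these points we cannot compute the plane. Hence the SVs cannot be skipped, except with a linear kernel, where $w = \sum_i \alpha_i y_i x_i$ can be precomputed.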
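A small sketch of why the support vectors must be kept. The support vectors, the $\alpha_i y_i$ products and the bias below are made-up numbers, not a trained model, and the RBF kernel with `gamma=0.5` is an assumed example kernel. With the linear kernel the SVs collapse into a single weight vector $w$ that gives the same score; with the RBF kernel no finite $w$ exists, so scoring must loop over the SVs.

```python
import math


def linear_kernel(a, b):
    """K(a, b) = a^T b, the dot product in the original d-dimensional space."""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2); its feature space is infinite-dimensional."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def decision_function(x, support_vectors, alpha_y, bias, kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b: every support vector x_i is needed."""
    return sum(ay * kernel(sv, x) for sv, ay in zip(support_vectors, alpha_y)) + bias


# Hypothetical model: support vectors, products alpha_i * y_i (summing to 0), and bias b.
support_vectors = [[1.0, 2.0], [2.0, 0.5], [0.0, 0.0]]
alpha_y = [0.7, -0.4, -0.3]
bias = 0.1
x_new = [1.5, 1.0]

# Linear kernel: the SVs collapse into one weight vector w = sum_i alpha_i y_i x_i.
w = [sum(ay * sv[j] for sv, ay in zip(support_vectors, alpha_y)) for j in range(len(x_new))]
f_collapsed = linear_kernel(w, x_new) + bias
f_kernel = decision_function(x_new, support_vectors, alpha_y, bias, linear_kernel)

# RBF kernel: no finite w exists, so scoring still loops over the support vectors.
f_rbf = decision_function(x_new, support_vectors, alpha_y, bias, rbf_kernel)
```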

Q) How do we productionize the recommendation system?

• Use recent data: the user's last five songs.


• Handle cold start (new users).
• Scalability: we can use Spark.
• Store user and item vectors in distributed hash tables such as Redis.

• At serving time, use a very simple model: take the user's last five songs and find the most similar songs.
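A minimal serving sketch of the "last five songs, find the most similar" model. A plain dict stands in for the Redis/distributed hash table lookup of item vectors, averaging the recent vectors into one query is an assumed design choice, and falling back to the currently trending songs for cold-start users is likewise an assumption; all names and data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_next(item_vectors, history, n=10, trending=()):
    """Recommend n unplayed songs most similar to the user's last five plays.

    item_vectors: song id -> vector (a plain dict standing in for Redis here).
    history: song ids in play order; trending: cold-start fallback list.
    """
    recent = [s for s in history[-5:] if s in item_vectors]
    if not recent:  # cold start: no usable history yet
        return list(trending)[:n]
    dim = len(item_vectors[recent[0]])
    # Average the recent song vectors into one query vector.
    query = [sum(item_vectors[s][i] for s in recent) / len(recent) for i in range(dim)]
    played = set(history)
    candidates = [s for s in item_vectors if s not in played]
    candidates.sort(key=lambda s: cosine(query, item_vectors[s]), reverse=True)
    return candidates[:n]


# Hypothetical 2-d item vectors, one returning user and one new user.
item_vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.2]}
warm = recommend_next(item_vectors, ["a"], n=2)
cold = recommend_next(item_vectors, [], n=2, trending=["c", "a", "b"])
```

The linear scan over all items is only for illustration; at catalogue scale the similarity search would typically be precomputed or served from an approximate nearest-neighbour index.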
