WWW 2012 – AdMIRe'12 Workshop April 16–20, 2012, Lyon, France
User activity has been recorded in two different ways. One was Google Analytics (GA) and the other was the site's internal database (DB). In the case of GA, we associated events to the user actions of adding to a playlist and adding to a blacklist. In the case of DB, we have the playlist and blacklist tables in the database. To be able to identify whether each track added to a playlist had been recommended, we added a source field indicating which recommender had done the job. In the end, we observed only non-significant differences between the values obtained from GA and DB, which reassured us about the quality of the data to be analyzed.

To measure the variation of the recommenders' effects, we divided the 22 weeks into 11 periods of 2 weeks. For each period we measured the number of sessions (S), the number of additions to playlists (P) and the number of additions to blacklists (B) for each recommender. From these three basic measures we defined the following derived measures:

Activity rate = (P + B)/S, (6)
Absolute acceptance rate = P/S, (7)
Relative acceptance rate = P/(P + B). (8)

Google Analytics also provides information about the number and frequency of users who return to the site. For a given period, L(x) is the number of users who return x times to the site. Loyalty can then be measured in many different ways. We have tried to capture loyalty by counting the users who return three or more times, using as reference the number of users who return fewer than three times. We call this measure the Loyalty3 rate:

Loyalty3 rate = Σ_{x≥3} L(x) / (L(1) + L(2)). (9)

For each measure, and each recommender, we collected samples with values from the 11 periods. We then compare averages and standard deviations of the measures and perform two-tailed t-tests (α = 0.05) to determine the significance of the differences.

4.2 Results
In this section we discuss the results obtained in our case study. During the evaluation period there were about 57000 sessions involving recommendations, in which 1327 users made 3267 additions to playlists and 3123 additions to blacklists.

As reported in Table 1, overall, Mix shows a slightly lower relative acceptance rate (RAR) than Content and Usage. However, the differences are not significant (owing to the high variability of all three recommenders across the 11 periods of 2 weeks, as shown by the relatively high standard deviations), and all three recommenders have a relative acceptance rate around 0.5. This can be understood as follows: in response to a given recommendation, a user is as likely to react with an addition to a playlist (i.e., a positive reaction) as with an addition to a blacklist (i.e., a negative reaction). This appears to be true for all three recommenders.

This does not, however, mean that the three recommenders have a similar performance. Indeed, given a recommendation, a user can not only react with an addition to a playlist or to a blacklist, but also not react at all, which in our opinion is another negative reaction. As can be seen from the activity rate (AR) measure in Table 1, our data shows that for the same number of recommendations, the Mix recommender results in more user activity than the other two. In other words, users appear more likely to react to recommendations when confronted with those of Mix than with those of the other two. This means that users will generate more additions to playlists, and more additions to blacklists, with Mix than with Content and Usage. The former can be observed in Table 1, which shows a significantly higher absolute acceptance rate (AAR) for Mix. Compared to Content, Mix presents a gain of 119%; with respect to Usage, it shows a gain of 50%. This means that users receiving the Mix suggestions had a significant tendency to react more positively to recommendations.

Finally, regarding the Loyalty3 rate (L3R), we see in Table 1 that the Mix recommender is similar to Content but significantly better than Usage. There, Mix presents a gain of 16% when compared to Usage.

Table 1: Results. Values with (*) mark recommendation methods whose differences with Mix are statistically significant (p-value < 0.05).

Measure  System    Mean       Std. Dev.  p-value
RAR      Mix       0.499      0.157      -
         Content   0.512      0.164      0.848
         Usage     0.600      0.125      0.162
AR       Mix       0.165      0.061      -
         Content   0.074 (*)  0.025      0.001
         Usage     0.088 (*)  0.021      0.002
AAR      Mix       0.081      0.038      -
         Content   0.037 (*)  0.018      0.013
         Usage     0.054 (*)  0.023      0.049
L3R      Mix       1.880      0.376      -
         Content   1.870      0.171      0.867
         Usage     1.620 (*)  0.196      0.044

5. CONCLUSIONS
In this paper we proposed a music recommender system that combines usage and content data. Our proposal was evaluated online, with real users, on a commercial web site of music from the very long tail. Our results show that Mix generates more activity and at least the same amount of positive responses as Content and Usage (or more, depending on the evaluation measure). Finally, it has good results in terms of promoting user loyalty. Hence our conclusion that Mix tends to be a better option for music recommendation than the other two. Mix is currently the core recommendation engine on http://www.palcoprincipal.com.

We are currently developing a monitoring tool for continuously collecting and analyzing the activity of the recommenders of the site. This will allow the owners of the site to keep an eye on the impact of the recommenders. On the other hand, it will give us more reliable data and will enable us to look into other facets of the recommendations, such as variety and sensitivity to order. With that information we will be able to better understand what makes users more active, as well as to design recommenders that may use different mixes, depending on the profile of the user.
928
WWW 2012 – AdMIRe'12 Workshop April 16–20, 2012, Lyon, France
929
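The significance-testing step (two-tailed t-tests over the 11-period samples, α = 0.05) can likewise be sketched. The paper does not say which t-test variant was used; the version below assumes Welch's unequal-variance statistic, and the per-period values are invented for illustration. In practice the two-tailed p-value would come from the t distribution (e.g. scipy.stats.ttest_ind), omitted here to keep the sketch dependency-free:

```python
# Illustrative sketch (not the paper's implementation) of comparing one
# measure for two recommenders across the 11 two-week periods.
# Assumption: Welch's two-sample t statistic; the sample values are made up.
import math
import statistics

def welch_t(sample_a, sample_b):
    """Two-sample t statistic with unequal variances (Welch)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / math.sqrt(va / na + vb / nb)

# Hypothetical per-period AAR values for two recommenders (11 periods each).
mix = [0.08, 0.07, 0.12, 0.05, 0.09, 0.11, 0.06, 0.10, 0.04, 0.08, 0.09]
content = [0.04, 0.03, 0.06, 0.02, 0.05, 0.04, 0.03, 0.05, 0.02, 0.04, 0.03]
print(welch_t(mix, content))
```

A large absolute t value on 11-period samples corresponds to a small two-tailed p-value, which is how entries such as p = 0.013 in Table 1 would be obtained.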