
@article{bobadilla2012collaborative,

title={A collaborative filtering similarity measure based on singularities},
author={Bobadilla, Jes{\'u}s and Ortega, Fernando and Hernando, Antonio},
journal={Information Processing \& Management},
volume={48},
number={2},
pages={204--217},
year={2012},
publisher={Elsevier}
}

Main idea: This research presents a novel similarity measure for collaborative filtering. Taking into
consideration contextual information about users, a singularity measure is computed for each item by
analyzing the votes given by all users, and it is then applied to the votes of the pair of users being
compared. The more singular the votes of the two users, the greater their impact on the similarity. The
whole methodology is applied to three datasets: MovieLens, Netflix and FilmAffinity.

Context: Recommender systems greatly reduce the negative effect of information overload on websites
where users can vote on items according to their preferences (e.g. websites about movies). The most
widely used recommendation technique is collaborative filtering, in which users rate certain items and
recommendations for a user are then derived from the users most similar to them. The most popular
similarity metrics are Pearson correlation, cosine, Spearman rank correlation and constrained Pearson
correlation. The focus of this research is to analyze the hidden attributes of users' ratings in order to
define a new similarity metric that improves the recommendation process.

Methodology: Usually, votes given by a user take values from {1,...,5}. Bobadilla et al. convert values
of 4 and 5 into a positive class (positive feedback) and values of 1, 2 and 3 into a non-positive class.
For the similarity computation, votes that belong to the same class are therefore considered “similar”.
The methodology then introduces the concept of singularity. The analysis focuses on the votes given by
all users for a specific item, not only on the votes of the two users whose similarity is being determined.
For example, if 95% of users rated an item positively, the similarity, for this item, between two users
who voted negatively (they are in the 5% category) should be greater than the similarity between two
users who belong to the 95% (“not very singular”). If all users voted the same way on an item, that item
cannot be used for computing a similarity. On the other hand, if only two users voted differently from
the rest on an item, this is a case of great singularity, which means a high similarity between them for
that item.
Considering the following table, the similarity between users 1 and 6 for item 1 is very high because
they are the only ones who voted positively on item 1, in contrast to the rest of the users. For item 2,
the similarity between users 1 and 6 is very low, because their votes do not differ from the votes of the
other users (they voted the same way as the rest of the group). For item 3, user 1 voted positively and
user 6 voted negatively. So the similarity between the two users will be very low, because the similarity
between the user with a high level of singularity (user 6) and any other user will be the same. A possible
problem appears when new votes are given for items: the singularity values of the recommender system
then have to be updated.

Formalization: Let U be the set of users and I the set of items.

ru,i = rating of user u for item i; if there is no rating, ru,i = •

The ratings given by a user for items take values from the set V = {m,…,M} ∪ {•}, where
m is the lowest value (usually 1) and M is the highest value (usually 5).

R: set of relevant votes (positive votes) in the recommender system

RC: set of non-relevant votes (non-positive votes)

If m=1 and M=5, then R={4,5} and RC={1,2,3}

Pi = {u∈U | ru,i∈R}: set of users who rated item i with a relevant vote
Ni = {u∈U | ru,i∈RC}: set of users who rated item i with a non-relevant vote

SPi is the singularity of the relevant vote for item i: SPi = 1 - |Pi| / |U|

For the previous figure/table, SP1 = 1 - 2/8 = 0.75


SNi is the singularity of the non-relevant vote for item i: SNi = 1 - |Ni| / |U|

For the previous figure/table, SN1 = 1 - 6/8 = 0.25

SPi and SNi ∈ [0,1], where a value near 1 means the vote is uncommon (very singular) and a value near 0
means it is very common (not very singular).
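The definitions of Pi, Ni, SPi and SNi above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the nested-dictionary layout, the function names and the sample votes are assumptions made for the example, and R = {4,5} is taken from the Methodology paragraph.

```python
from typing import Dict, Optional, Set, Tuple

# Assumed layout: ratings[user][item] is a vote in {1..5}, or None for "no rating".
Ratings = Dict[str, Dict[str, Optional[int]]]

RELEVANT = {4, 5}          # R, as described in the Methodology paragraph
NON_RELEVANT = {1, 2, 3}   # RC


def relevance_sets(ratings: Ratings, item: str) -> Tuple[Set[str], Set[str]]:
    """Return (Pi, Ni): users with a relevant / non-relevant vote on the item."""
    p_i = {u for u, votes in ratings.items() if votes.get(item) in RELEVANT}
    n_i = {u for u, votes in ratings.items() if votes.get(item) in NON_RELEVANT}
    return p_i, n_i


def singularities(ratings: Ratings, item: str) -> Tuple[float, float]:
    """Return (SPi, SNi) = (1 - |Pi|/|U|, 1 - |Ni|/|U|)."""
    n_users = len(ratings)
    p_i, n_i = relevance_sets(ratings, item)
    return 1 - len(p_i) / n_users, 1 - len(n_i) / n_users


# Hypothetical data: 8 users, only u1 and u6 give item "i1" a relevant vote.
ratings: Ratings = {f"u{k}": {"i1": 2} for k in range(1, 9)}
ratings["u1"]["i1"] = 5
ratings["u6"]["i1"] = 4

sp, sn = singularities(ratings, "i1")
print(sp, sn)  # 0.75 0.25
```

With two relevant votes out of eight users, the sketch gives the same values as the worked example: 1 - 2/8 = 0.75 and 1 - 6/8 = 0.25.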

Then, three sets are defined for two users x and y:

A = items rated as relevant by both users

B = items rated as non-relevant by both users

C = items rated as relevant by one user and as non-relevant by the other
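As a rough illustration of the three sets, the sketch below partitions the items rated by both of two users into A, B and C. The function name, the dictionary layout and the sample votes are hypothetical, and votes of 4 and 5 are assumed to be the relevant ones, as in the Methodology paragraph.

```python
from typing import Dict, Optional, Set, Tuple

RELEVANT = {4, 5}          # R, assumed from the Methodology paragraph
NON_RELEVANT = {1, 2, 3}   # RC


def partition_items(
    votes_x: Dict[str, Optional[int]],
    votes_y: Dict[str, Optional[int]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split the items rated by both users into the sets A, B and C."""
    a: Set[str] = set()
    b: Set[str] = set()
    c: Set[str] = set()
    for item in votes_x.keys() & votes_y.keys():
        rx, ry = votes_x[item], votes_y[item]
        if rx is None or ry is None:
            continue  # at least one user gave no rating for this item
        if rx in RELEVANT and ry in RELEVANT:
            a.add(item)   # relevant for both users
        elif rx in NON_RELEVANT and ry in NON_RELEVANT:
            b.add(item)   # non-relevant for both users
        else:
            c.add(item)   # relevant for one user, non-relevant for the other
    return a, b, c


# Hypothetical votes for two users.
votes_u1 = {"i1": 5, "i2": 2, "i3": 4, "i4": None}
votes_u6 = {"i1": 4, "i2": 1, "i3": 2, "i4": 3}

a, b, c = partition_items(votes_u1, votes_u6)
print(a, b, c)  # {'i1'} {'i2'} {'i3'}
```

Item "i4" falls in none of the sets because one of the two users did not rate it.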

Finally, the similarity between two users x and y is computed as follows:

Experiments: Three datasets are used: MovieLens, FilmAffinity and Netflix, with a range from 2 to 20
recommendations (top-n recommendations) and relevance thresholds of 5 for MovieLens and Netflix
and 9 for FilmAffinity. For MovieLens and FilmAffinity, 20% of users are taken randomly as test
users; for Netflix, 5% of users are used as test users because of the huge number of users. 20% of
items are used as test items. MovieLens and Netflix have ratings from 1 to 5, while FilmAffinity has
votes from 1 to 10. The evaluation measures computed are mean absolute error, precision, recall,
coverage and perfect prediction. The singularity approach is compared with the traditional ones
(Pearson similarity, Spearman, cosine). The proposed approach significantly improves the
recommendation process, on average, by 20% for recall and by 60% for precision.

In conclusion, the singularity measure brings valuable information to recommender systems through
the analysis of contextual information, something that current RS ignore.
