• Embed Doc
  • Readcast
  • Collections
  • CommentGo Back
Download
 
Evaluating Similarity Measures:A Large-Scale Study in the Orkut Social Network
Ellen Spertus
Mills College5000 MacArthur Blvd.Oakland, CA 94613
spertus@mills.eduMehran Sahami, Orkut Buyukkokten
Google1600 Amphitheatre ParkwayMountain View, CA 94043
{
sahami, orkut
}
@google.com
ABSTRACT
Online information services have grown too large for usersto navigate without the help of automated tools such as col-laborative filtering, which makes recommendations to usersbased on their collective past behavior. While many similar-ity measures have been proposed and individually evaluated,they have not been evaluated relative to each other in a largereal-world environment. We present an extensive empiricalcomparison of six distinct measures of similarity for recom-mending online communities to members of the Orkut socialnetwork. We determine the usefulness of the different rec-ommendations by actually measuring users’ propensity tovisit and join recommended communities. We also exam-ine how the ordering of recommendations influenced userselection, as well as interesting social issues that arise inrecommending communities within a real social network.
Categories and Subject Descriptors
H.2.8 [
Database Management
]: Database Applications—
data mining 
; H.3.5 [
Information Storage and Retrieval
]:Online Information Services; I.5 [
Computing Methodolo-gies
]: Pattern Recognition
General Terms
Algorithms, measurement, human factors
Keywords
Data mining, collaborative filtering, recommender system,similarity measure, online communities, social networks
1. INTRODUCTION
The amount of information available online grows far fasterthan an individual’s ability to assimilate it. For example,consider “communities” (user-created discussion groups) with-in Orkut, a social-networking website (http://www.orkut.com)
Copyright 2005ACM. This is the author’s version of the work. It is postedhere by permission of ACM for your personal use. Not for redistribution.The definitive version was published in KDD ’05, August 21-24, 2005http://doi.acm.org/10.1145/1081870.1081956
affiliated with Google. The original mechanisms for usersto find communities were labor-intensive, including search-ing for keywords in community titles and descriptions orbrowsing other users’ memberships. Four months after itsJanuary 2004 debut, Orkut had over 50,000 communities,providing the necessity and opportunity for data-mining forautomated recommendations. There are now (May 2005)over 1,500,000 communities.While there are many forms of recommender systems [3],we chose a collaborative filtering approach [13] based onoverlapping membership of pairs of communities. We didnot make use of semantic information, such as the descrip-tion of or messages in a community (although this may be anarea of future work). Our recommendations were on a per-community, rather than a per-user basis; that is, all mem-bers of a given community would see the same recommen-dations when visiting that community’s page. We chose thisapproach out of the belief, which was confirmed, that com-munity memberships were rich enough to make very usefulrecommendations without having to perform more compu-tationally intensive operations, such as clustering of users orcommunities or computing nearest neighbor relations amongusers. Indeed, Sarwar et al. have found such item-based al-gorithms to be both more efficient and successful than user-based algorithms [13]. By measuring user acceptance of rec-ommendations, we were able to evaluate the absolute andrelative utility of six different similarity measures on a largevolume of data.
2. MEASURES OF SIMILARITY
The input data came from the membership relation
M
=
{
(
u,c
)
|
u
∈ U 
,c
∈ C}
,
where
C
is the set of communitieswith at least 20 members and
the set of users belong-ing to at least one such community. When we began ourexperiment in May 2004,
|C|
= 19
,
792,
|U|
= 181
,
160, and
|M|
= 2
,
144
,
435. Table 1 summarizes the distribution.All of our measures of community similarity involve the
overlap
between two communities, i.e., the number of com-
Table 1: Distribution of community memberships
min max median
σ
Users per community
20 9077 50 230.5
Communities per user
1 4173 6 28.0
 
mon users. If a base community
b
and a (potentially) relatedcommunity
r
are considered as sets of users, the overlap is
|
B
R
|
, where we use capital letters to represent the set con-taining a community’s members. Note that overlap cannotbe the sole factor in relatedness, as the size of communitiesvaries greatly. If we only considered overlap, practically ev-ery community would be considered related to the “Linux”community, which was the most popular, with 9,077 mem-bers. The similarity measures in the next section normalizethe overlap in different ways.
2.1 Similarity Measure Functions
Each similarity measure we consider is presented as a (pos-sibly asymmetric) function of 
b
and
r
indicating how appro-priate the related community
r
is as a recommendation forthe base community
b
. We do not use the result of thefunction as an absolute measure of similarity, only to rankrecommendations for a given base community.
2.1.1 L1-Norm
If we consider the base and related communities to bevectors
b
and
r
, where the
i
th
element of a vector is 1 if user
is a member and 0 if not, we can measure the overlapas the product of their L1-norms:
L
1(
b ,
r
) =
b
·
r
b
1
·
r
1
This quantity can also be expressed in set notation, wherewe use a capital letter to represent the set containing a com-munity’s members:
L
1(
B,R
) =
|
B
R
||
B
|·|
R
|
Note that this evaluates to the overlap between the twogroups divided by the product of their sizes. When thebase community is held constant (as when we determine thebase community’s recommendations), this evaluates to theoverlap divided by the size of the related community, fa-voring small communities. Kitts et al. [9] reported this tobe a successful measure of similarity in their recommendersystem.
2.1.2 L2-Norm
Similarly, we can measure the overlap with the product of the L2-norms (“cosine distance” [3, 6, 12]) of 
b
and
r
:
L
2(
b ,
r
) =
b
·
r
b
2
·
r
2
In set notation:
L
2(
B,R
) =
|
B
R
|
 
|
B
|·|
R
|
Note that the square-root in the denominator causes L2 topenalize large communities less severely than L1.Observe that the L2-norm presented here is equivalent tothe widely used cosine coefficient applied to binary data.Moreover, while Pearson correlation has been used previ-ously in recommender systems where ranking data is avail-able, we did not use this measure here since it is generallyconsidered inappropriate for binary data.
2.1.3 Pointwise Mutual-Information: positive corre-lations (MI1)
Information theory motivates other measures of correla-tion, such as “mutual information” [2]. We chose pointwisemutual information where we only count “positive” corre-lations (membership in both B and R). Such a formulationessentially focuses on how membership in one group is pre-dictive of membership in another (without considering howbase non-membership in a group effects membership in an-other group), yielding:
M
1(
b,r
) = P(
r,b
)
·
lgP(
r,b
)P(
r
)
·
P(
b
)
2.1.4 Pointwise Mutual-Information: positive and negative correlations (MI2)
Similarly, we can compute the pointwise mutual infor-mation with both positive and negative correlations (e.g.,membership in both B and R, or non-membership in bothgroups). Again, we don’t compute the full expected mutualinformation, since we believe cross-correlations (e.g., howmembership in B affects non-membership in R) tend to bedistortive with the recommendation task since such cross-correlations are plentiful but not very informative. Thisyields:
M
2(
b,r
) = P(
r,b
)
·
lgP(
r,b
)P(
r
)
·
P(
b
)+ P(¯
r,
¯
b
)
·
lgP(¯
r,
¯
b
)P(¯
r
)
·
P(¯
b
)
2.1.5 Salton (IDF)
Salton proposed a measure of similarity based on inversedocument frequency scaling (tf-idf) [12]:
ID
(
b,r
) =
(
r
|
b
)
·
(
lgP(
r
))
ID
(
B,R
) =
|
B
R
||
B
|·
(
lg
|
R
||U|
)
2.1.6 Log-Odds
We first considered the standard log-odds function, whichmeasures the relative likelihood that presence or absence ina base community predicts membership in a related commu-nity:
LogOdds
0(
b,r
) = lgP(
r
|
b
)P(
r
|
¯
b
)Empirically, we found this generated the exact same rank-ings as using the L1-Norm, which makes sense because:1. Logarithm is monotonic and, while affecting scores,does not affect rankings.2. Constant factors, such as
|
B
|
, do not affect rankings.3. For
|
B
|  |
|
, P(
r
|
¯
b
)
P(
r
)We formulated a different log-odds metric, which measureswhether membership in the base community is likelier topredict membership or absence in the related community:
LogOdds
(
b,r
) = lgP(
r
|
b
)P(¯
r
|
b
)
 
Table 2: Average size of top-ranked community foreach measure
measure Average sizerank 1 rank 2 rank 3L1
332 482 571
L2
460 618 694
MI1
903 931 998
MI2
966 1003 1077
IDF
923 985 1074
LogOdds
357 513 598
Table 3: Agreement in top-ranked results betweenmeasures. For example, MI1 and IDF rank the samerelated community first for 98% of base communi-ties. Correlations greater than 85% are in bold.
L1
.70
L2
.41 .60
MI1
.39 .57
.96 MI2
.41 .59
.98 .97 IDF.88
.79 .46 .44 .46
LogOdds
2.2 Discussion
For a given measure, we refer to the related communityyielding the highest value to be the
top-ranked 
related com-munity relative to a base community. The average sizeof top-ranked communities for each measure, which variesgreatly, is shown in Table 2. Table 3 shows how often twofunctions yield the same top-ranking result. Table 4 showsthe top recommendations for the “I love wine” community.Note that MI1, MI2, and IDF favor very large communities,while L1 and LogOdds favor small communities.Note that in addition to the obvious correlations betweenthe two mutual-information functions (96%), there is a verystrong correlation between IDF and the mutual-informationfunctions (97-98%). Manipulation of the formulas for MI1and IDF shows:
M
1(
b,r
) = P(
r,b
)
·
lgP(
r,b
)P(
r
)
·
P(
b
)= P(
r
|
b
)
·
P(
b
)
·
lgP(
r
|
b
)
P(
r
|
b
)
·
P(
b
)
·
lgP(
r
)= P(
r
|
b
)
·
P(
b
)
·
lgP(
r
|
b
)
P(
r
|
b
)
·
[1
P(¯
b
)]
·
lgP(
r
)= P(
r
|
b
)
·
[P(
b
)
·
lgP(
r
|
b
) + P(
r
|
b
)
·
P(¯
b
)
·
lgP(
r
)]
P(
r
|
b
)
·
lgP(
r
)Substituting
ID
(
b,r
) =
(
r
|
b
)
·
lgP(
r
), we get:
M
1(
b,r
) = P(
r
|
b
)
·
P(
b
)
·
lgP(
r
|
b
) + P(¯
b
)
·
lgP(
r
)
+
ID
(
b,r
)Since for virtually all communities
b
,
(
b
)
(¯
b
), we canapproximate:
M
1(
b,r
)
ID
(
b,r
) + P(
r
|
b
)
·
P(¯
b
)
·
lgP(
r
)Thus, MI1 yields a ranking that can be thought of as start-ing with the ranking of IDF and perturbing the score of eachelement in the ranking by P(
r
|
b
)
·
P(¯
b
)
·
lgP(
r
), which gener-ally is not great enough to change the relative ranking of thetop scores, leading to MI1 and IDF often giving the sameranking to top-scoring communities. (Note that this pertur-bation quantity is given only to explain the high correlationbetween MI1 and IDF. Statistically, it is meaningless, since
b
and¯
b
cannot simultaneously hold.)
3. EXPERIMENT DESIGN
We designed an experiment to determine the relative valueof the recommendations produced by each similarity mea-sure. This involved interleaving different pairs of recommen-dations and tracking user clicks. Specifically, we measuredthe efficacy of different similarity measures using pair-wisebinomial sign tests on click-through data rather than us-ing traditional supervised learning measures such as preci-sion/recall or accuracy since there is no “true” labeled datafor this task (i.e., we do not know what are the correct com-munities that should be recommended to a user). Rather,we focused on the task of determining which of the similar-ity measures performs best on a relative performance scalewith regard to acceptance by users.
3.1 Combination
When a user viewed a community page, we hashed thecombined user and community identifiers to one of 30 val-ues, specifying an ordered pair of similarity measures tocompare. Let S and T be the ordered lists of recommenda-tions for the two measures, where
= (
s
1
,s
2
,...,s
|
S
|
) and
= (
t
1
,t
2
,...,t
|
|
) and
|
|
=
|
|
. The recommendations of each measure are combined by Joachims’ “Combined Rank-ing” algorithm [7], restated in Figure 1. The resulting list isguaranteed to contain the top
k
S
and
k
recommendationsfor each measure, where
k
k
S
k
+ 1 [7, Theorem 1].
3.2 Measurements
Whenever a user visited a community, two measures werechosen and their recommendations interleaved, as discussedabove. This was done in a deterministic manner so thata given user always saw the same recommendations for agiven community. To minimize feedback effects, we did notregenerate recommendations after the experiment began.A user who views a base community (e.g., “I love wine”) iseither a member (denoted by “M”) or non-member (denotedby “n”). (We capitalize “M” but not “n” to make them eas-ier to visually distinguish.) In either case, recommendationsare shown. When a user clicks on a recommendation, thereare three possibilities: (1) the user is already a member of the recommended community (“M”), (2) the user joins therecommended community (“j”), or (3) the user visits butdoes not join the recommended community (“n”). The com-bination of base and related community memberships can becombined in six different ways. For example “M
 j” denotesa click where a member of the base community clicks on arecommendation to another community to which she doesnot belong and joins that community. Traditionally, anal-yses of recommender systems focus on “M
 j”, also knowninformally as “if you like this, you’ll like that” or formallyas “similarity” or “conversion”. “M
n” recommendationsare considered distracters, having negative utility, since theywaste a user’s time with an item not of interest. Before run-ning the experiment, we decided that the measures shouldbe judged on their “M
 j” performance.Other interpretations are possible: “M
n” links could beconsidered to have positive utility for any of the following
of 00

Leave a Comment

You must be to leave a comment.
Submit
Characters: ...
You must be to leave a comment.
Submit
Characters: ...