exotic taste will have a lower one. More formally, for a given set of items $I$, where $I_i$ is the subset of items related to a particular user $i$, the Snowflake Number $s_i$ for that user $i$ is $n$ if there are no users with whom he shares $n$ items, but there is at least one other user $j$ with whom user $i$ shares $n-1$ items:
$$\exists j : \mathrm{card}(I_i \cap I_j) = n - 1 \;\wedge\; \nexists k : \mathrm{card}(I_i \cap I_k) = n$$
Of course, there may be users $i$ whose set $I_i$ is subsumed in that of another user $k$. For those users, we define the Snowflake Number as infinite ($\infty$): this is somewhat arbitrary, but ensures that "users with a lower snowflake number are more unique". Indeed, if two users have exactly the same taste, they are not unique at all.
As a simple example, the snowflake number of the first author of this paper on facebook.com is 2 if we consider friends as items: Erik is a friend of Leo and Wayne, and there is nobody else in the Facebook universe who is a friend of both Leo and Wayne.
The snowflake number also applies to groups, when we want to determine the specific characteristics of a group. Moreover, we find many examples of the snowflake effect outside the realm of web-based systems in a strict sense. For instance, consider the votes for the Eurovision Song Contest Finals in 2008 [1]: Switzerland and Bulgaria were the only countries that voted for Germany. Switzerland also voted for Albania, whereas Bulgaria didn't. Hence, the Snowflake Number for Switzerland is two. On the other hand, the Snowflake Number for Russia is four, as it voted for Armenia, Croatia, Georgia and Serbia, and no other country voted for those four countries; moreover, for every other subset of three countries that Russia voted for, there is another country that also voted for those three countries.
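The definition can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function names and the toy friendship data are invented here and do not come from the paper. Read literally, the formula amounts to "largest overlap with any other user, plus one", while the worked examples appear to count the size of the smallest combination of items that no other user holds in full. The sketch implements both readings so that they can be compared; the subset reading is a brute-force search and only suitable for small item sets.

```python
import math
from itertools import combinations


def snowflake_by_overlap(items, user):
    """Literal reading of the formula: largest overlap with another user, plus one."""
    own = items[user]
    others = [s for u, s in items.items() if u != user]
    # A user whose items are all held by some other user gets an infinite number.
    if any(own <= s for s in others):
        return math.inf
    return max((len(own & s) for s in others), default=0) + 1


def snowflake_by_subset(items, user):
    """Subset reading: size of the smallest combination of the user's
    items that no other user holds in full (brute-force search)."""
    own = items[user]
    others = [s for u, s in items.items() if u != user]
    for n in range(1, len(own) + 1):
        for combo in combinations(sorted(own), n):
            if not any(set(combo) <= s for s in others):
                return n
    # Every combination is shared, so the item set is subsumed by another user's.
    return math.inf


# Hypothetical toy data: each user maps to the set of friends (the "items").
toy = {
    "erik": {"leo", "wayne", "ann"},
    "bob": {"leo", "ann"},
    "carol": {"wayne", "ann"},
    "dave": {"leo"},
}

overlap_erik = snowflake_by_overlap(toy, "erik")  # 3: shares at most two friends with anyone
subset_erik = snowflake_by_subset(toy, "erik")    # 2: nobody else has both leo and wayne
subset_dave = snowflake_by_subset(toy, "dave")    # inf: dave's friends are a subset of erik's
```

On this toy data the two readings disagree for "erik" (3 versus 2), so which one is intended matters when the Snowflake Number is computed in practice.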
3. TOWARDS SNOWFLAKE NUMBER RESEARCH
In this paper, we just touch upon a rather large theme. There are many variations on the snowflake number idea.
• One can either consider the ordered or unordered list of items, for instance in time: the ordered Snowflake Number of web page visits would consider how many people visited the same $n$ web pages as their most recent pages. That Snowflake Number will probably be much lower than the unordered version that considers all the web pages one has ever visited.
• It will be interesting to investigate how the snowflake number is distributed over a community of users: will there be the usual heavy-tailed distribution with the well-known long tail effect? Or is the distribution of snowflake numbers more akin to a Gaussian distribution?
• Another interesting theme will be to study the evolution of the Snowflake Number in time: in the initialization phase of an application, most users will have a snowflake number of one as they will have unique features: for instance, they will be the first, and therefore the only ones, to have introduced a tag. As a system becomes more popular, the snowflake numbers will start to rise: idiosyncratic tags aside, it is probably becoming quite difficult to find a tag that hasn't been entered yet in systems like delicious.com.
• Applications of the Snowflake Number could be quite diverse: for instance, in some contexts, one explicitly wants to avoid crossing the Snowflake Number: k-anonymity is such an approach, where the general idea is to have duplicates in any sequence of data so that the data sequence cannot point to one single person.
• If we define the Snowflake Number over a graph, with items and users represented by nodes and edges that connect users with their items, then there are graph characteristics that we can relate to the Snowflake Number [2]. In that approach, we can also fold the graph over the items to study how the Snowflake Number diffuses over the graph.
• People with a high snowflake number contribute little new information. They are not really unique and do not really add any interesting connections. People with a low snowflake number contribute unique connections between items and therefore make the underlying graph more connected. They act somewhat as hubs in the underlying network [2]. Of course, this is not a value judgment on the people involved (indeed, two 'soul brothers' have by definition an infinite snowflake number) but an evaluation of the connections that they, and they alone, add to the network.
• There are certainly other possible metrics. As a simple example, we could define the snowflake number for a pair of users $a$ and $b$ as $s(a,b) = 1 - \frac{\mathrm{card}(I_a \cap I_b)}{\mathrm{card}(I_a)}$. In that case, we could define the snowflake number of a user $a$ from a set of $N$ users as $s(a) = \frac{\sum_{i=1}^{N} s(a,x_i)}{N-1}$. This would mean that more unique users get a higher snowflake number, the highest value is 1 and the lowest value is 0.
• Most importantly, we want to extend the scope of what we try to measure by also explicitly addressing in the snowflake number the uniqueness of the situation, environment, context, etc. This requires that we add an adequate way to take relevancy of connections between users and items into account.
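The pairwise metric proposed in the list above, one minus card(I_a ∩ I_b) / card(I_a), averaged over the other users, can be sketched in a few lines of Python. The function names and the toy data are hypothetical. The sketch also assumes that the sum runs over the N − 1 users other than a; including a itself would change nothing, since s(a, a) = 0.

```python
def pair_score(items_a, items_b):
    """s(a, b) = 1 - card(I_a ∩ I_b) / card(I_a)."""
    if not items_a:
        raise ValueError("s(a, b) is undefined when user a has no items")
    return 1 - len(items_a & items_b) / len(items_a)


def user_score(items, user):
    """s(a): sum of s(a, x) over the other users, divided by N - 1."""
    others = [s for u, s in items.items() if u != user]
    if not others:
        raise ValueError("need at least two users")
    return sum(pair_score(items[user], s) for s in others) / len(others)


# Hypothetical toy data: each user maps to a set of items.
toy_items = {
    "erik": {"leo", "wayne", "ann"},
    "bob": {"leo", "ann"},
    "carol": {"wayne", "ann"},
    "dave": {"leo"},
}

score_erik = user_score(toy_items, "erik")  # (1/3 + 1/3 + 2/3) / 3 = 4/9
score_dave = user_score(toy_items, "dave")  # (0 + 0 + 1) / 3 = 1/3
```

Unlike the set-based definition, this score is asymmetric in a and b, because it is normalised by the size of the first user's item set.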
4. CONCLUSION
Studying characteristics of the snowflake number will enable us to better understand what it means to be "unique" in a particular community. Such research will also help us understand how we can best support the unique characteristics, requirements and aims of the user community. That is an essential characteristic of successful and relevant web applications, and therefore we believe that the ideas presented above can contribute significantly to web science.
Acknowledgements
We acknowledge the comments and feedback from Martin Wolpers, Daan Bohnen, "Tom" and Thomas Broeker on earlier versions of this paper that circulated in the blogosphere.