The Snowflake Number
Erik Duval, Katrien Verbert, Xavier Ochoa, Wayne Hodgins
Dept. Computerwetenschappen, Katholieke Universiteit Leuven, Belgium
Escuela Superior Politécnica del Litoral, Ecuador
Autodesk, Inc., USA
ABSTRACT
This is a paper about mass hyper-personalization; more specifically, about how to measure personalization in web-based systems and beyond, using a series of metrics that we call 'snowflake numbers'.
1. THE SNOWFLAKE EFFECT
Many of the more exciting and successful current web applications rely to a great degree on the personalized user experiences they enable. Typical examples include amazon.com, which presents itself as an individualized book store with personalized recommendations; web-based radio stations like last.fm or pandora.com, which take into account the personal preferences and interests of the listener; and social networking sites like facebook.com, which support the interactions between a user and his network of friends.
We refer to this trend of mass hyper-personalization, which can also be observed in non-web-based systems like clothing, food or travel, as "The Snowflake Effect" [3]. The name derives from the notion that, just like every snowflake in a snowstorm is unique, we are all unique individuals, with our specific characteristics and interests. The better a web application adapts to these characteristics and interests, the more relevant and useful it is. We are interested in mass personalization, where we can achieve this effect at very large scale (for everyone, all the time), and in hyper-personalization, because we believe that it is important to push the boundaries beyond simple changes in generic templates.
We believe that it is important to better understand how the Snowflake Effect can be put to good use, for individuals and for society as a whole. Such understanding requires a more precise way to measure the characteristics that make us really unique. This paper introduces a set of metrics that enable such precise measurements. It generalizes our earlier work on learnometrics, which considered the field of technology-enhanced learning [5], and on a quantitative analysis of user-generated content on the Web [4].
2. SNOWFLAKE NUMBER
As a simple example, consider all the playlists of songs: the iTunes Music Store lists 1,421,247 public iMixes at this moment, and almost 7 million votes on those playlists. Playlists (public as iMixes, or private) are unique for many iTunes users, but maybe not for all. Or consider someone's listening history: if you consider only one song, then that is probably not unique for most of us, as there are probably other people who listened to this song too. Take a set of 500 songs that someone has listened to: there may still be other people who listened to these songs. But is there a number $n$ that makes the listener unique, in that there would be no other person who listened to the same $n$ songs?
The Snowflake Number for a person then is the minimum number of items that make him or her unique in a given application. This can take several forms: bookmarks or tags on delicious.com; tags on last.fm or bibsonomy.org; songs played on last.fm or pandora.com; books bought on amazon.com or songs bought on the iTunes Music Store; slides favoured on slideshare.com or videos favoured on youtube.com, etc. We can apply a similar point of view to the non-digital world and consider groceries, food and other items in a shopping cart or refrigerator, a meal or cupboards; the skills, knowledge and abilities of individuals that make them the right, or the uniquely best, person for a job or a role on a project team; the parts of an assembly of a machine, etc.
For instance, if a reader has only bought very popular books on amazon.com, then any book he has bought will have been bought by someone else as well. Therefore, that choice does not make him unique and his snowflake number is greater than one. If any combination of two books that he has bought has also been bought by another user, then his snowflake number will be higher than two. Imagine that all combinations of six books bought by this consumer were also bought by at least one other amazon.com customer, and that no other user has bought a particular set of seven books that this reader has bought: then his snowflake number is seven.
Intuitively, someone with a more mainstream taste will have a higher snowflake number, whereas someone with a more
exotic taste will have a lower one. More formally, for a given set of items $I$, where $I_i$ is the subset of items related to a particular user $i$, the Snowflake Number $s_i$ for that user $i$ is $n$ if there are no users with whom he shares $n$ items, but there is at least one other user $j$ with whom user $i$ shares $n-1$ items:

$$\exists j : \mathrm{card}(I_i \cap I_j) = n-1 \;\wedge\; \nexists k : \mathrm{card}(I_i \cap I_k) = n$$

Of course, there may be users $i$ whose set $I_i$ is subsumed in that of another user $k$. For those users, we define the Snowflake Number as infinite: this is somewhat arbitrary, but ensures that "users with a lower snowflake number are more unique". Indeed, if two users have exactly the same taste, they are not unique at all.
As a simple example, the snowflake number of the first author of this paper on facebook.com is 2 if we consider friends as items: Erik is a friend of Leo and Wayne, and there is nobody else in the facebook universe who is a friend of both Leo and Wayne.
The snowflake number also applies to groups, when we want to determine the specific characteristics of a group. Moreover, we find many examples of the snowflake effect outside the realm of web-based systems in a strict sense. For instance, consider the votes for the Eurovision Song Contest Finals in 2008 [1]: Switzerland and Bulgaria were the only countries that voted for Germany. Switzerland also voted for Albania, whereas Bulgaria did not. Hence, the Snowflake Number for Switzerland is two. On the other hand, the Snowflake Number for Russia is four, as it voted for Armenia, Croatia, Georgia and Serbia, and no other country voted for those four countries; moreover, for every other subset of three countries that Russia voted for, there is another country that also voted for those three countries.
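The worked examples above suggest a direct, brute-force way to compute the number: look for the smallest subset of a user's items that no other single user fully shares. The following Python sketch does this under that reading. The function name `snowflake_number`, the `purchases` data and the user names are hypothetical, and the exhaustive search over subsets is only practical for small item sets.

```python
from itertools import combinations
import math


def snowflake_number(user, items_by_user):
    """Return the size of the smallest subset of `user`'s items that no other
    user fully shares, or math.inf if the user's whole set is subsumed."""
    own = items_by_user[user]
    others = [items for name, items in items_by_user.items() if name != user]

    # Subsumed users (their set is contained in someone else's) get infinity.
    if any(own <= other for other in others):
        return math.inf

    # Try subset sizes in increasing order and stop at the first unique one.
    for n in range(1, len(own) + 1):
        for subset in combinations(sorted(own), n):
            chosen = set(subset)
            if not any(chosen <= other for other in others):
                return n
    return math.inf  # only reached for an empty item set with no other users


# Hypothetical toy data: which books each customer bought.
purchases = {
    "ann": {"b1", "b2", "b3"},
    "bob": {"b1", "b2"},
    "cat": {"b2", "b3", "b4"},
    "dan": {"b1", "b3"},
}

for name in purchases:
    print(name, snowflake_number(name, purchases))
```

With this toy data, ann needs all three of her books to be unique, cat is unique through a single book, and the purchases of bob and dan are subsumed in ann's, so their numbers are infinite.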
3. TOWARDS SNOWFLAKE NUMBER RESEARCH
In this paper, we just touch upon a rather large theme. There are many variations on the snowflake number idea.
One can consider either the ordered or the unordered list of items, for instance in time: the ordered Snowflake Number of web page visits would consider how many people visited the same $n$ web pages as their most recent pages. That Snowflake Number will probably be much lower than the unordered version that considers all the web pages one has ever visited.
It will be interesting to investigate how the snowflake number is distributed over a community of users: will there be the usual heavy-tailed distribution with the well-known long tail effect? Or is the distribution of snowflake numbers more akin to a Gaussian distribution?
Another interesting theme will be to study the evolution of the Snowflake Number in time: in the initialization phase of an application, most users will have a snowflake number of one, as they will have unique features: for instance, they will be the first, and therefore the only ones, to have introduced a tag. As a system becomes more popular, the snowflake numbers will start to rise: idiosyncratic tags aside, it is probably becoming quite difficult to find a tag that hasn't been entered yet in systems like delicious.com.
Applications of the Snowflake Number could be quite diverse: for instance, in some contexts, one explicitly wants to avoid crossing the Snowflake Number. k-anonymity is such an approach, where the general idea is to have duplicates in any sequence of data so that the data sequence cannot point to one single person.
If we define the Snowflake Number over a graph, with items and users represented by nodes and edges that connect users with their items, then there are graph characteristics that we can relate to the Snowflake Number [2]. In that approach, we can also fold the graph over the items to study how the Snowflake Number diffuses over the graph.
People with a high snowflake number contribute little new information. They are not really unique and do not really add any interesting connections. People with a low snowflake number contribute unique connections between items and therefore make the underlying graph more connected. They act somewhat as hubs in the underlying network [2]. Of course, this is not a value judgment on the people involved (indeed, two 'soul brothers' have by definition an infinite snowflake number) but an evaluation of the connections that they, and they alone, add to the network.
There are certainly other possible metrics. As a simple example, we could define the snowflake number for a pair of users $a$ and $b$ as $s(a,b) = 1 - \frac{\mathrm{card}(I_a \cap I_b)}{\mathrm{card}(I_a)}$. In that case, we could define the snowflake number of a user $a$ from a set of $N$ users as $s(a) = \frac{\sum_{i=1}^{N} s(a,x_i)}{N-1}$. This would mean that more unique users get a higher snowflake number; the highest value is 1 and the lowest value is 0.
Most importantly, we want to extend the scope of what we try to measure by also explicitly addressing in the snowflake number the uniqueness of the situation, environment, context, etc. This requires that we add an adequate way to take the relevancy of connections between users and items into account.
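The pairwise metric mentioned among the variations above can be sketched in a few lines of Python. This is a minimal illustration, assuming that the per-user score is the average of s(a, x) over all other users; that normalization is an assumption, chosen so that scores stay between 0 and 1. The function names and the `tags` data are hypothetical.

```python
def pairwise_snowflake(items_a, items_b):
    """s(a, b) = 1 - card(I_a & I_b) / card(I_a): the fraction of a's items
    that b does not share."""
    if not items_a:
        raise ValueError("user a must have at least one item")
    return 1 - len(items_a & items_b) / len(items_a)


def average_snowflake(user, items_by_user):
    """Average of s(user, x) over all other users x (assumed normalization)."""
    own = items_by_user[user]
    others = [items for name, items in items_by_user.items() if name != user]
    if not others:
        raise ValueError("at least one other user is required")
    return sum(pairwise_snowflake(own, other) for other in others) / len(others)


# Hypothetical toy data: the tags each user has entered.
tags = {
    "ann": {"web", "music"},
    "bob": {"web", "music"},
    "cat": {"knitting"},
}

for name in tags:
    print(name, average_snowflake(name, tags))
```

In this toy data, cat shares no tags with anyone and scores 1, whereas ann and bob have identical tags and score lower. Unlike the minimal-subset number, this score is higher for more unique users.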
4. CONCLUSION
Studying the characteristics of the snowflake number will enable us to better understand what it means to be "unique" in a particular community. Such research will also help us understand how we can best support the unique characteristics, requirements and aims of the user community. That is an essential characteristic of successful and relevant web applications, and therefore we believe that the ideas presented above can contribute significantly to web science.
Acknowledgements
We acknowledge the comments and feedback from Martin Wolpers, Daan Bohnen, "Tom" and Thomas Broeker on earlier versions of this paper that circulated in the blogosphere.
