Eight Friends Are Enough:Social Graph Approximation via Public Listings
Joseph Bonneau
Computer LaboratoryUniversity of Cambridge
jcb82@cl.cam.ac.ukJonathan Anderson
Computer LaboratoryUniversity of Cambridge
jra40@cl.cam.ac.ukFrank Stajano
Computer LaboratoryUniversity of Cambridge
fms27@cl.cam.ac.ukRoss Anderson
Computer LaboratoryUniversity of Cambridge
rja40@cl.cam.ac.uk
ABSTRACT
The popular social networking website Facebook exposes a“public view” of user profiles to search engines which in-cludes eight of the user’s friendship links. We examine whatinteresting properties of the complete social graph can beinferred from this public view. In experiments on real socialnetwork data, we were able to accurately approximate thedegree and centrality of nodes, compute small dominatingsets, find short paths between users, and detect commu-nity structure. This work demonstrates that it is difficult tosafely reveal limited information about a social network.
Categories and Subject Descriptors
K.4.1 [
Computers and Society
]: Public Policy Issues —Privacy; E.1 [
Data Structures
]: Graphs and networks;F.2.1 [
Analysis of Algorithms and Problem Complex-ity
]: Numerical Algorithms and Problems; K.6.5 [
Manage-ment of Computing and Information Systems
]: Secu-rity and Protection
General Terms
Security, Algorithms, Experimentation, Measurement, The-ory, Legal Aspects
Keywords
Social networks, Privacy, Web crawling, Data breaches, Graphtheory
1. INTRODUCTION
The proliferation of online social networking services hasentrusted massive silos of sensitive personal information tosocial network operators. Privacy concerns have attracted
Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.
SNS
’09 Nuremberg, GermanyCopyright 2009 ACM 978-1-60558-463-8 ...$5.00.
considerable attention from the media, privacy advocatesand the research community. Most of the focus has beenon
personal data privacy:
researchers and operators haveattempted to fine-tune access control mechanisms to pre-vent the accidental leakage of embarrassing or incriminatinginformation to third parties.A less studied problem is that of
social graph privacy
: pre-venting data aggregators from reconstructing large portionsof the social graph, composed of users and their friendshiplinks. Knowing who a person’s friends are is valuable in-formation to marketers, employers, credit rating agencies,insurers, spammers, phishers, police, and intelligence agen-cies, but protecting the social graph is more difficult thanprotecting personal data. Personal data privacy can be man-aged individually by users, while information about a user’splace in the social graph can be revealed by any of the user’sfriends.
1.1 Facebook and Public Listings
Facebook is the world’s largest social network, claimingover 175 million active users, making it an interesting casestudy for privacy. Compared to other social networking plat-forms, Facebook is known for having relatively accurate userprofiles, as people primarily use Facebook to represent theirreal-world persona[10]. Thus, Facebook is often at the fore-front of criticism about online privacy[2,17]. In September
2007, Facebook started making“public search listings”avail-able to those not logged in to the site – an example is shownin Figure1. These listings are designed to encourage visitorsto join by showcasing that many of their friends are alreadymembers.Originally, public listings included a user’s name, photo-graph, and 10 friends. Showing a new random set of friendson each request is clearly a privacy issue, as this allows a webspider to repeatedly fetch a user’s page until it has viewed allof that user’s friends
. In January 2009, public listings werereduced to 8 friends, and the selection of users now appearsto be a deterministic function of the requestin IP address
1
Retrieving
n
friends by repeatedly fetching a random sam-ple of size
k
is an instance of the
coupon collector’s problem
.It will take an average of
n
·
H
n
k
= Θ(
nk
log
n
) queries to re-trieve the complete set of friends. A set of 100 friends, forexample, would require 65 queries to retrieve with
k
= 8.
2
Using the anonymity network Tor, we were able retrieve dif-1
Add a Comment