 
Information Revelation and Privacy in Online Social Networks (The Facebook case)
Pre-proceedings version. ACM Workshop on Privacy in the Electronic Society (WPES), 2005
Ralph Gross
Data Privacy Laboratory
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
rgross@cs.cmu.edu
Alessandro Acquisti
H. John Heinz III School of Public Policy and Management
Carnegie Mellon University
Pittsburgh, PA 15213
acquisti@andrew.cmu.edu
ABSTRACT
Participation in social networking sites has dramatically increased in recent years. Services such as Friendster, Tribe, or the Facebook allow millions of individuals to create online profiles and share personal information with vast networks of friends - and, often, unknown numbers of strangers. In this paper we study patterns of information revelation in online social networks and their privacy implications. We analyze the online behavior of more than 4,000 Carnegie Mellon University students who have joined a popular social networking site catering to colleges. We evaluate the amount of information they disclose and study their usage of the site’s privacy settings. We highlight potential attacks on various aspects of their privacy, and we show that only a minimal percentage of users changes the highly permeable privacy preferences.
Categories and Subject Descriptors
K.4.1 [Computers and Society]: Public Policy Issues - Privacy
General Terms
Human Factors
Keywords
Facebook, Online privacy, information revelation, social networking sites
1. EVOLUTION OF ONLINE NETWORKING
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WPES’05, November 7, 2005, Alexandria, Virginia, USA.
Copyright 2005 ACM X-XXXXX-XX-X/XX/XX ...$5.00.
In recent years online social networking has moved from niche phenomenon to mass adoption. Although the concept dates back to the 1960s (with the University of Illinois Plato computer-based education tool, see [16]), viral growth and commercial interest only arose well after the advent of the Internet.¹ The rapid increase in participation in very recent years has been accompanied by a progressive diversification and sophistication of purposes and usage patterns across a multitude of different sites. The Social Software Weblog² now groups hundreds of social networking sites in nine categories, including business, common interests, dating, face-to-face facilitation, friends, pets, and photos.
While boundaries are blurred, most online networking sites share a core of features: through the site an individual offers a “profile” - a representation of their sel[ves] (and, often, of their own social networks) - to others to peruse, with the intention of contacting or being contacted by others, to meet new friends or dates (Friendster,³ Orkut⁴), find new jobs (LinkedIn⁵), receive or provide recommendations (Tribe⁶), and much more.
It is not unusual for successful social networking sites to experience periods of viral growth with participation expanding at rates topping 20% a month. Liu and Maes estimate in [18] that “well over a million self-descriptive personal profiles are available across different web-based social networks” in the United States, and Leonard, already in 2004, reported in [16] that world-wide “[s]even million people have accounts on Friendster. [...] Two million are registered to MySpace. A whopping 16 million are supposed to have registered on Tickle for a chance to take a personality test.”
The success of these sites has attracted the attention of the media (e.g., [23], [3], [16], [4], [26]) and researchers. The latter have often built upon the existing literature on social network theory (e.g., [20], [21], [11], [12], [32]) to discuss its online incarnations. In particular, [7] discusses issues of trust and intimacy in online networking; [9] and [8] focus on participants’ strategic representation of their selves to others; and [18] focuses on harvesting online social network profiles to obtain a distributed recommender system.
In this paper, we focus on patterns of personal information revelation and privacy implications associated with online networking. Not only are the participation rates in online social networking staggering among certain demographics; so, also, are the amount and type of information participants freely reveal. Category-based representations of a person’s broad interests are a recurrent feature across most networking sites [18]. Such categories may include indications of a person’s literary or entertainment interests, as well as political and sexual ones. In addition, personally identified or identifiable data (as well as contact information) are often provided, together with intimate portraits of a person’s social or inner life.
Such apparent openness to reveal personal information to vast networks of loosely defined acquaintances and complete strangers calls for attention. We investigate information revelation behavior in online networking using actual field data about the usage and the inferred privacy preferences of more than 4,000 users of a site catering to college students, the Facebook.⁷ Our results provide a preliminary but detailed picture of personal information revelation and privacy concerns (or lack thereof) in the wild, rather than as discerned through surveys and laboratory experiments.
The remainder of this paper is organized as follows. We first elaborate on information revelation issues in online social networking in Section 2. Next, we present the results of our data gathering in Section 3. Then, we discuss their implications in terms of users’ attitudes and privacy risks in Section 4. Finally, we summarize our findings and conclude in Section 5.
¹ One of the first networking sites, SixDegrees.com, was launched in 1997 but shut down in 2000 after “struggling to find a purpose for [its] concept” [5].
² Http://www.socialsoftware.weblogsinc.com/.
³ Http://www.friendster.com/.
⁴ Http://www.orkut.com/.
⁵ Http://www.linkedin.com/.
⁶ Http://www.tribe.net/.
2. INFORMATION REVELATION AND ONLINE SOCIAL NETWORKING
While social networking sites share the basic purpose of online interaction and communication, specific goals and patterns of usage vary significantly across different services. The most common model is based on the presentation of the participant’s profile and the visualization of her network of relations to others - such is the case of Friendster. This model can stretch in different directions. In matchmaking sites, like Match.com⁸ or Nerve⁹ and Salon¹⁰ Personals, the profile is critical and the network of relations is absent. In diary/online journal sites like LiveJournal,¹¹ profiles become secondary, networks may or may not be visible, while participants’ online journal entries take a central role. Online social networking thus can morph into online classifieds in one direction and blogging in another.
⁷ Http://www.facebook.com/.
⁸ Http://www.match.com/.
⁹ Http://personals.nerve.com/.
¹⁰ Http://personals.salon.com/.
¹¹ Http://www.livejournal.com/.
Patterns of personal information revelation are, therefore, quite variable.
First, the pretense of identifiability changes across different types of sites. The use of real names to (re)present an account profile to the rest of the online community may be encouraged (through technical specifications, registration requirements, or social norms) in college websites like the Facebook, which aspire to connect participants’ profiles to their public identities. The use of real names may be tolerated but filtered in dating/connecting sites like Friendster, which create a thin shield of weak pseudonymity between the public identity of a person and her online persona by making only the first name of a participant visible to others, and not her last name. Or, the use of real names and personal contact information could be openly discouraged, as in pseudonymous dating websites like Match.com, which attempt to protect the public identity of a person by making its linkage to the online persona more difficult. However, notwithstanding the different approaches to identifiability, most sites encourage the publication of personal and identifiable photos (such as clear shots of a person’s face).
Second, the type of information revealed or elicited often orbits around hobbies and interests, but can stride from there in different directions. These include: semi-public information such as current and previous schools and employers (as in Friendster); private information such as drinking and drug habits and sexual preferences and orientation (as in Nerve Personals); and open-ended entries (as in LiveJournal).
Third, visibility of information is highly variable. In certain sites (especially the ostensibly pseudonymous ones) any member may view any other member’s profile. On weaker-pseudonym sites, access to personal information may be limited to participants that are part of the direct or extended network of the profile owner. Such visibility tuning controls become even more refined on sites which make no pretense of pseudonymity, like the Facebook.
And yet, across different sites, anecdotal evidence suggests that participants are happy to disclose as much information as possible to as many people as possible. It is not unusual to find profiles on sites like Friendster or Salon Personals that list their owners’ personal email addresses (or link to their personal websites), in violation of the recommendation or requirements of the hosting service itself. In the next subsection, we resort to the theory of social networks to frame the analysis of such behavior, which we then investigate empirically in Section 3.
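The visibility regimes described above (profiles open to any member, or limited to the owner's direct or extended network) can be sketched as a distance check on the friendship graph. This is a minimal illustration under stated assumptions: the functions `hop_distance` and `can_view`, the `max_hops` parameter, and the toy graph are hypothetical, not the access-control logic of Friendster, the Facebook, or any other site discussed here.

```python
from collections import deque

# Hypothetical toy friendship graph (undirected: each friendship listed in both directions).
example_friends = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
}


def hop_distance(friends, source, target, max_hops):
    """Breadth-first search: return the number of friendship hops from
    source to target, or None if target is farther than max_hops."""
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        user, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the allowed radius
        for friend in friends.get(user, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, dist + 1))
    return None


def can_view(friends, owner, viewer, max_hops=None):
    """Hypothetical visibility rule: max_hops=None means the profile is open
    to any member; otherwise only viewers within max_hops of the owner see it."""
    if max_hops is None:
        return True
    return hop_distance(friends, owner, viewer, max_hops) is not None


print(can_view(example_friends, "alice", "carol", max_hops=1))  # False: carol is two hops away
print(can_view(example_friends, "alice", "carol", max_hops=2))  # True
```

Setting `max_hops=1` corresponds to "direct friends only" and `max_hops=2` to "friends of friends"; each extra hop can enlarge the audience considerably, which is why the "extended network" setting matters for privacy.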
2.1 Social Network Theory and Privacy
The relation between privacy and a person’s social network is multi-faceted. On certain occasions we want information about ourselves to be known only by a small circle of close friends, and not by strangers. In other instances, we are willing to reveal personal information to anonymous strangers, but not to those who know us better.
Social network theorists have discussed the relevance of relations of different depth and strength in a person’s social network (see [11], [12]) and the importance of so-called weak ties in the flow of information across different nodes in a network. Network theory has also been used to explore how distant nodes can get interconnected through relatively few random ties (e.g., [20], [21], [32]). The privacy relevance of these arguments has recently been highlighted by Strahilevitz in [27].
Strahilevitz has proposed applying formal social network theory as a tool for aiding interpretation of privacy in legal cases. He suggests basing conclusions regarding privacy “on what the parties should have expected to follow the initial disclosure of information by someone other than the defendant” (op. cit., p. 57). In other words, the consideration of how information is expected to flow from node to node in somebody’s social network should also inform that person’s expectations for privacy of information revealed in the network.
However, the application of social network theory to the study of information revelation (and, implicitly, privacy choices) in online social networks highlights significant differences between the offline and the online scenarios.
First, offline social networks are made of ties that can only be loosely categorized as weak or strong ties, but in reality are extremely diverse in terms of how close and intimate a subject perceives a relation to be. Online social networks, on the other hand, often reduce these nuanced connections to simplistic binary relations: “Friend or not” [8]. Observing online social networks, Danah Boyd notes that “there is no way to determine what metric was used or what the role or weight of the relationship is. While some people are willing to indicate anyone as Friends, and others stick to a conservative definition, most users tend to list anyone who they know and do not actively dislike. This often means that people are indicated as Friends even though the user does not particularly know or trust the person” [8] (p. 2).
Second, while the number of strong ties that a person may maintain on a social networking site may not be significantly increased by online networking technology, Donath and Boyd note that “the number of weak ties one can form and maintain may be able to increase substantially, because the type of communication that can be done more cheaply and easily with new technology is well suited for these ties” [9] (p. 80).
Third, while an offline social network may include up to a dozen intimate or significant ties and 1000 to 1700 “acquaintances” or “interactions” (see [9] and [27]), an online social network can list hundreds of direct “friends” and include hundreds of thousands of additional friends within just three degrees of separation from a subject.
This implies online social networks are both vaster and have more weak ties, on average, than offline social networks. In other words, thousands of users may be classified as friends of friends of an individual and become able to access her personal information, while, at the same time, the threshold to qualify as friend on somebody’s network is low. This may make the online social network only an imaginary (or, to borrow Anderson’s terminology, an imagined) community (see [2]). Hence, trust in and within online social networks may be assigned differently and have a different meaning than in their offline counterparts. Online social networks are also more levelled, in that the same information is provided to larger numbers of friends connected to the subject through ties of different strength. And here lies a paradox. While privacy may be considered conducive to and necessary for intimacy (for [10], intimacy resides in selectively revealing private information to certain individuals, but not to others), trust may decrease within an online social network. At the same time, a new form of intimacy becomes widespread: the sharing of personal information with large and potentially unknown numbers of friends and strangers altogether. The ability to meaningfully interact with others is mildly augmented, while the ability of others to access the person is significantly enlarged. It remains to be investigated how similar or different the mental models are that people apply to personal information revelation within a traditional network of friends compared to those that are applied in an online network.
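The jump from a dozen intimate ties to hundreds of thousands of users within three degrees of separation is essentially multiplicative. As a rough illustration (our simplifying assumption, not a model from the cited works), suppose every member lists exactly d friends and no two friendship paths reach the same person. Then the number of users within k hops is at most d + d(d-1) + ... + d(d-1)^(k-1), because each newly reached user contributes d-1 friends other than the one through whom they were reached. The function name `reach_upper_bound` is hypothetical.

```python
def reach_upper_bound(direct_friends: int, degrees: int) -> int:
    """Upper bound on distinct users within `degrees` hops of a subject,
    assuming each member has exactly `direct_friends` friends and the
    network is tree-shaped (no overlapping friend lists)."""
    total = 0
    layer = direct_friends  # users exactly one hop away
    for _ in range(degrees):
        total += layer
        # each user in this layer adds its friends other than the one who reached it
        layer *= direct_friends - 1
    return total


print(reach_upper_bound(100, 1))  # 100 direct friends
print(reach_upper_bound(100, 3))  # 990100 within three degrees
```

Real friend lists overlap heavily (friends of friends are often already friends), so actual counts fall well below this bound; even so, with 100 direct friends the bound is close to a million, which leaves ample room for the "hundreds of thousands" figure mentioned above.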
2.2 Privacy Implications
Privacy implications associated with online social networking depend on the level of identifiability of the information provided, its possible recipients, and its possible uses. Even social networking websites that do not openly expose their users’ identities may provide enough information to identify the profile’s owner. This may happen, for example, through face re-identification [13]. Liu and Maes estimate in [18] a 15% overlap in 2 of the major social networking sites they studied. Since users often re-use the same or similar photos across different sites, an identified face can be used to identify a pseudonymous profile with the same or similar face on another site. Similar re-identifications are possible through demographic data, but also through category-based representations of interests that reveal unique or rare overlaps of hobbies or tastes. We note that information revelation can work in two ways: by allowing another party to identify a pseudonymous profile through previous knowledge of a subject’s characteristics or traits; or by allowing another party to infer previously unknown characteristics or traits about a subject identified on a certain site. We present evaluations of the probabilities of success of these attacks on users of a specific networking site in Section 4.
To whom may identifiable information be made available? First of all, of course, the hosting site, which may use and extend the information (both knowingly and unknowingly revealed by the participant) in different ways (below we discuss extracts from the privacy policy of a social networking site that are relevant to this discussion). Obviously, the information is available within the network itself, whose extension in time (that is, data durability) and space (that is, membership extension) may not be fully known or knowable by the participant.
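The re-identification risk from "unique or rare overlaps" of demographic data and interests, described at the start of this subsection, can be illustrated with a simple uniqueness count in the spirit of k-anonymity: a profile whose combination of attribute values is shared by no other profile can be singled out by anyone who already knows those values. This is a hedged sketch, not the evaluation method used in Section 4; the function `unique_profiles`, the attribute names, and the sample profiles are hypothetical.

```python
from collections import Counter


def _key(profile, attributes):
    # Tuple of the profile's values for the chosen attributes (missing -> None).
    return tuple(profile.get(a) for a in attributes)


def unique_profiles(profiles, attributes):
    """Return the ids of profiles whose combination of `attributes`
    occurs exactly once, i.e. candidates for re-identification."""
    counts = Counter(_key(p, attributes) for p in profiles)
    return [p["id"] for p in profiles if counts[_key(p, attributes)] == 1]


# Hypothetical profiles: zip code, birth year, and one listed interest.
sample_profiles = [
    {"id": "a", "zip": "15213", "year": 1985, "interest": "jazz"},
    {"id": "b", "zip": "15213", "year": 1985, "interest": "jazz"},
    {"id": "c", "zip": "15213", "year": 1985, "interest": "falconry"},
    {"id": "d", "zip": "15217", "year": 1986, "interest": "jazz"},
]

print(unique_profiles(sample_profiles, ["zip", "year"]))              # ['d']
print(unique_profiles(sample_profiles, ["zip", "year", "interest"]))  # ['c', 'd']
```

Demographic fields alone leave profile c hidden among three identical matches; adding a single rare interest ("falconry") isolates it. This mirrors the point that category-based interests, and not only demographics, can enable re-identification.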
Finally, the ease of joining and extending one’s network, and the lack of basic security measures (such as SSL logins) at most networking sites make it easy for third parties (from hackers to government agencies) to access participants’ data without the site’s direct collaboration (already in 2003, LiveJournal used to receive at least five reports of ID hijacking per day, [23]).
How can that information be used? It depends on the information actually provided - which may, in certain cases, be very extensive and intimate. Risks range from identity theft to online and physical stalking; from embarrassment to price discrimination and blackmail. Yet, there are some who believe that social networking sites can also offer the solution to online privacy problems. In an interview, Tribe.net CEO Mark Pincus noted that “[s]ocial networking has the potential to create an intelligent order in the current chaos by letting you manage how public you make yourself and why and who can contact you.” [4]. We test this position in Section 4.
While privacy may be at risk in social networking sites, information is willingly provided. Different factors are likely to drive information revelation in online social networks. The list includes signalling (as discussed in [9]), because the perceived benefit of selectively revealing data to strangers may appear larger than the perceived costs of possible privacy invasions; peer pressure and herding behavior; relaxed
