Who Says What to Whom on Twitter

Published by Emilio Notareschi
We study several longstanding questions in media communications research, in the context of the microblogging service Twitter, regarding the production, flow, and consumption of information. To do so, we exploit a recently introduced feature of Twitter, known as Twitter lists, to distinguish between elite users, by which we mean specifically celebrities, bloggers, and representatives of media outlets and other formal organizations, and ordinary users. Based on this classification, we find a striking concentration of attention on Twitter (roughly 50% of tweets consumed are generated by just 20K elite users), where the media produces the most information, but celebrities are the most followed. We also find significant homophily within categories: celebrities listen to celebrities, while bloggers listen to bloggers, etc.; however, bloggers in general rebroadcast more information than the other categories. Next we re-examine the classical “two-step flow” theory of communications, finding considerable support for it on Twitter, but also some interesting differences. Third, we find that URLs broadcast by different categories of users or containing different types of content exhibit systematically different lifespans. And finally, we examine the attention paid by the different user categories to different news topics.

Authors: Wu, S.; Hofman, J.M.; Mason, W.A.; Watts, D.J. - 2011


Published by: Emilio Notareschi on Apr 30, 2011
Copyright:Attribution Non-commercial

Who Says What to Whom on Twitter
Shaomei Wu
Cornell University, USA
sw475@cornell.edu

Jake M. Hofman
Yahoo! Research, NY, USA
hofman@yahoo-inc.com

Winter A. Mason
Yahoo! Research, NY, USA
winteram@yahoo-inc.com

Duncan J. Watts
Yahoo! Research, NY, USA
djw@yahoo-inc.com
ABSTRACT
We study several longstanding questions in media communications research, in the context of the microblogging service Twitter, regarding the production, flow, and consumption of information. To do so, we exploit a recently introduced feature of Twitter known as “lists” to distinguish between elite users (by which we mean celebrities, bloggers, and representatives of media outlets and other formal organizations) and ordinary users. Based on this classification, we find a striking concentration of attention on Twitter, in that roughly 50% of URLs consumed are generated by just 20K elite users, where the media produces the most information, but celebrities are the most followed. We also find significant homophily within categories: celebrities listen to celebrities, while bloggers listen to bloggers, etc.; however, bloggers in general rebroadcast more information than the other categories. Next we re-examine the classical “two-step flow” theory of communications, finding considerable support for it on Twitter. Third, we find that URLs broadcast by different categories of users or containing different types of content exhibit systematically different lifespans. And finally, we examine the attention paid by the different user categories to different news topics.
Categories and Subject Descriptors
H.1.2 [Models and Principles]: User/Machine Systems; J.4 [Social and Behavioral Sciences]: Sociology
General Terms
two-step flow, communications, classification
Keywords
Communication networks, Twitter, information flow
1. INTRODUCTION
A longstanding objective of media communications research is encapsulated by what is known as Lasswell’s maxim: “who says what to whom in what channel with what effect” [12], so named for one of the pioneers of the field, Harold Lasswell. Although simple to state, Lasswell’s maxim has proven difficult to answer in the more than 60 years since he stated it, in part because it is generally difficult to observe information flows in large populations, and in part because different channels have very different attributes and effects. As a result, theories of communications have tended to focus either on “mass” communication, defined as “one-way message transmissions from one source to a large, relatively undifferentiated and anonymous audience,” or on “interpersonal” communication, meaning a “two-way message exchange between two or more individuals” [16].

Part of this research was performed while the author was visiting Yahoo! Research, New York. The author was also supported by NSF grant IIS-0910664.
Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others.
WWW 2011, March 28–April 1, 2011, Hyderabad, India. ACM 978-1-4503-0637-9/11/03.

Correspondingly, debates among communication theorists have tended to revolve around the relative importance of these two putative modes of communication. For example, whereas early theories such as the “hypodermic needle” model posited that mass media exerted direct and relatively strong effects on public opinion, mid-century researchers [13, 9, 14, 4] argued that the mass media influenced the public only indirectly, via what they called a two-step flow of communications, where the critical intermediate layer was occupied by a category of media-savvy individuals called opinion leaders. The resulting “limited effects” paradigm was then subsequently challenged by a new generation of researchers [6], who claimed that the real importance of the mass media lay in its ability to set the agenda of public discourse. But in recent years rising public skepticism of mass media, along with changes in media and communication technology, have tilted conventional academic wisdom once more in favor of interpersonal communication, which some identify as a “new era” of minimal effects [2].

Recent changes in technology, however, have increasingly undermined the validity of the mass vs. interpersonal dichotomy itself. On the one hand, over the past few decades mass communication has experienced a proliferation of new channels, including cable television, satellite radio, specialist book and magazine publishers, and of course an array of web-based media such as sponsored blogs, online communities, and social news sites. Correspondingly, the traditional mass audience once associated with, say, network television has fragmented into many smaller audiences, each of which increasingly selects the information to which it is exposed, and in some cases generates the information itself [15]. Meanwhile, in the opposite direction interpersonal communication has become increasingly amplified through personal blogs, email lists, and social networking sites to afford individuals ever-larger audiences. Together, these two trends have greatly obscured the historical distinction between mass and interpersonal communications, leading some scholars to refer instead to “masspersonal” communications [16].

A striking illustration of this erosion of traditional media categories is provided by the micro-blogging platform Twitter. For example, the top ten most-followed users on Twitter are not corporations or media organizations, but individual people, mostly celebrities. Moreover, these individuals communicate directly with their millions of followers via their tweets, often managed by themselves or publicists, thus bypassing the traditional intermediation of the mass media between celebrities and fans. Next, in addition to conventional celebrities, a new class of “semi-public” individuals like bloggers, authors, journalists, and subject matter experts has come to occupy an important niche on Twitter, in some cases becoming more prominent (at least in terms of number of followers) than traditional public figures such as entertainers and elected officials. Third, in spite of these shifts away from centralized media power, media organizations, along with corporations, governments, and NGOs, all remain well represented among highly followed users, and are often extremely active. And finally, Twitter is primarily made up of many millions of users who seem to be ordinary individuals communicating with their friends and acquaintances in a manner largely consistent with traditional notions of interpersonal communication.

Twitter, therefore, represents the full spectrum of communications from personal and private to “masspersonal” to traditional mass media. Consequently it provides an interesting context in which to address Lasswell’s maxim, especially as Twitter, unlike television, radio, and print media, enables one to easily observe information flows among the members of its ecosystem.
Unfortunately, however, the kinds of effects that are of most interest to communications theorists, such as changes in behavior, attitudes, etc., remain difficult to measure on Twitter. Therefore in this paper we limit our focus to the “who says what to whom” part of Lasswell’s maxim.

To this end, our paper makes three main contributions:

• We introduce a method for classifying users using Twitter Lists into “elite” and “ordinary” users, further classifying elite users into one of four categories of interest: media, celebrities, organizations, and bloggers.
• We investigate the flow of information among these categories, finding that although audience attention is highly concentrated on a minority of elite users, much of the information they produce reaches the masses indirectly via a large population of intermediaries.
• We find that different categories of users emphasize different types of content, and that different content types exhibit dramatically different characteristic lifespans, ranging from less than a day to months.

The remainder of the paper proceeds as follows. In the next section, we review related work. In Section 3 we discuss our data and methods, including Section 3.3 in which we describe how we use Twitter Lists to classify users, outline two different sampling methods, and show that they deliver qualitatively similar results. In Section 4 we analyze the production of information on Twitter, particularly who pays attention to whom. In Section 4.1, we revisit the theory of the two-step flow (arguably the dominant theory of communications for much of the past 50 years), finding considerable support for the theory. In Section 5, we consider “who listens to what”, examining first who shares what kinds of media content, and second the lifespan of URLs as a function of their origin and their content. Finally, in Section 6 we conclude with a brief discussion of future work.
2. RELATED WORK
Aside from the communications literature surveyed above, a number of recent papers have examined information diffusion on Twitter. Kwak et al. [11] studied the topological features of the Twitter follower graph, concluding from the highly skewed nature of the distribution of followers and the low rate of reciprocated ties that Twitter more closely resembled an information sharing network than a social network, a conclusion that is consistent with our own view. In addition, Kwak et al. compared three different measures of influence (number of followers, PageRank, and number of retweets), finding that the ranking of the most influential users differed depending on the measure. In a similar vein, Cha et al. [3] compared three measures of influence (number of followers, number of retweets, and number of mentions) and also found that the most followed users did not necessarily score highest on the other measures. Weng et al. [17] compared number of followers and PageRank with a modified PageRank measure which accounted for topic, again finding that ranking depended on the influence measure. Finally, Bakshy et al. [1] studied the distribution of retweet cascades on Twitter, finding that although users with large follower counts and past success in triggering cascades were on average more likely to trigger large cascades in the future, these features are in general poor predictors of future cascade size.

Our paper differs from this earlier work by shifting attention from the ranking of individual users in terms of various influence measures to the flow of information among different categories of users. In this sense, it is related to recent work by Crane and Sornette [5], who posited a mathematical model of social influence to account for observed temporal patterns in the popularity of YouTube videos, and also to Gomez et al. [7], who studied the diffusion of information among blogs and online news sources. Here, however, our focus is on identifying specific categories of “elite” users, whom we differentiate from “ordinary” users in terms of their visibility, and understanding their role in introducing information into Twitter, as well as how information originating from traditional media sources reaches the masses.
3. DATA AND METHODS
3.1 Twitter Follower Graph
In order to understand how information is transmitted on Twitter, we need to know the channels by which it flows; that is, who is following whom on Twitter. To this end, we used the follower graph studied by Kwak et al. [11], which included 42M users and 1.5B edges. This data represents a crawl of the graph seeded with all users on Twitter as observed by July 31st, 2009, and is publicly available¹. As reported by Kwak et al. [11], the follower graph is a directed network characterized by highly skewed distributions both of in-degree (# followers) and out-degree (# “friends”, Twitter nomenclature for how many others a user follows); however, the in-degree distribution is even more skewed than the out-degree distribution. In both friend and follower distributions, for example, the median is less than 100, but the maximum # friends is several hundred thousand, while a small number of users have millions of followers. In addition, the follower graph is also characterized by extremely low reciprocity (roughly 20%); in particular, the most-followed individuals typically do not follow many others. The Twitter follower graph, in other words, does not conform to the usual characteristics of social networks, which exhibit much higher reciprocity and far less skewed degree distributions [10], but instead resembles more the mixture of one-way mass communications and reciprocated interpersonal communications described above.

¹ The data is free to download from http://an.kaist.ac.kr/traces/WWW2010.html
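The degree and reciprocity statistics discussed above can be illustrated with a minimal sketch. It assumes the follower graph is available as (follower, followed) pairs, and it takes reciprocity to be the fraction of directed edges whose reverse edge also exists; both the function name `degree_summary` and that definition are assumptions made for illustration, not necessarily the procedure used by Kwak et al. [11].

```python
from collections import Counter
from statistics import median


def degree_summary(edges):
    """Summarize a directed follower graph.

    edges: iterable of (follower, followed) pairs (hypothetical input format).
    """
    # Drop duplicate edges and self-loops so they do not inflate the counts.
    edge_set = {(a, b) for a, b in edges if a != b}
    out_deg = Counter(a for a, _ in edge_set)  # number of "friends"
    in_deg = Counter(b for _, b in edge_set)   # number of followers
    users = set(out_deg) | set(in_deg)
    # A directed edge is reciprocated if the reverse edge is also present.
    reciprocated = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return {
        "median_followers": median(in_deg[u] for u in users),
        "median_friends": median(out_deg[u] for u in users),
        "max_followers": max(in_deg.values()),
        "max_friends": max(out_deg.values()),
        "reciprocity": reciprocated / len(edge_set),
    }


# Toy graph: "c" is followed by three users and follows one of them back.
toy_edges = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a"), ("a", "b")]
summary = degree_summary(toy_edges)
print(summary)
```

On a real crawl with over a billion edges, an in-memory set of tuples would not be practical; the sketch is meant only to pin down what the reported quantities measure.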
3.2 Twitter Firehose
In addition to the follower graph, we are interested in the content being shared on Twitter, and so we examined the corpus of all 5B tweets generated over a 223 day period from July 28, 2009 to March 8, 2010 using data from the Twitter “firehose,” the complete stream of all tweets². Because our objective is to understand the flow of information, it is useful for us to restrict attention to tweets containing URLs, for two reasons. First, URLs add easily identifiable tags to individual tweets, allowing us to observe when a particular piece of content is either retweeted or subsequently reintroduced by another user. And second, because URLs point to online content outside of Twitter, they provide a much richer source of variation than is possible in the typical 140 character tweet³. Finally, we note that almost all URLs broadcast on Twitter have been shortened using one of a number of URL shorteners, of which the most popular is http://bit.ly/. From the total of 5B tweets recorded during our observation period, therefore, we focus our attention on the subset of 260M containing bit.ly URLs; thus all subsequent counts are implicitly understood to be restricted to this content.
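As a rough illustration of the restriction described above, the following sketch extracts bit.ly URLs from tweet text and groups the posting users by URL, so that reintroductions of the same content can be observed. The regular expression, the (user, text) input format, and the function names are assumptions made for illustration; the actual extraction pipeline is not described in the text. The sketch also checks the length of the observation window and the share of tweets retained.

```python
import re
from collections import defaultdict
from datetime import date

# Assumed pattern for shortened bit.ly links; real-world variants may differ.
BITLY_URL = re.compile(r"https?://bit\.ly/[A-Za-z0-9_-]+")


def index_by_url(tweets):
    """Map each bit.ly URL to the users who posted it, in input order.

    tweets: iterable of (user, text) pairs (hypothetical input format).
    """
    by_url = defaultdict(list)
    for user, text in tweets:
        # set() avoids double-counting a URL repeated within one tweet.
        for url in set(BITLY_URL.findall(text)):
            by_url[url].append(user)
    return dict(by_url)


toy_tweets = [
    ("alice", "Reading http://bit.ly/abc123 now"),
    ("bob", "no link here"),
    ("carol", "RT @alice: http://bit.ly/abc123 and http://example.com/page"),
]
url_index = index_by_url(toy_tweets)
print(url_index)

# Length of the observation window stated in the text.
window_days = (date(2010, 3, 8) - date(2009, 7, 28)).days
print(window_days)

# Share of all tweets that contain a bit.ly URL (260M out of 5B).
bitly_share = 260e6 / 5e9
print(f"{bitly_share:.1%}")
```

The last figure shows that the analysis rests on roughly 5% of all tweets in the window.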
3.3 Twitter Lists
Our method for classifying users exploits a relatively recent feature of Twitter: Twitter Lists. Since its launch on November 2, 2009, Twitter Lists have been used extensively to group sets of users into topical or other categories, and thereby to better organize and/or filter incoming tweets. To create a Twitter List, a user provides a name (required) and description (optional) for the list, and decides whether the new list is public (anyone can view and subscribe to this list) or private (only the list creator can view or subscribe to this list). Once a list is created, the user can add/edit/delete list members. As the purpose of Twitter Lists is to help users organize users they follow, the name of the list can be considered a meaningful label for the listed users. The classification of users can therefore effectively exploit the “wisdom of crowds” with these created lists, both in terms of their importance to the community (number of lists on which they appear), and also how they are perceived (e.g. news organization vs. celebrity, etc.).

² http://dev.twitter.com/doc/get/statuses/firehose
³ Naturally, this restriction also has downsides, in particular that some users may be more likely to include URLs in their tweets than others, and thus will appear to be relatively more active and/or have more impact than if we were instead to consider all tweets. For our purposes, however, we believe that the practical advantages of the restriction outweigh the potential for bias.

Before describing our methods for classifying users in terms of the lists on which they appear, we emphasize that we are motivated by a particular set of substantive questions arising out of communications theory. In particular, we are interested in the relative importance of mass communications, as practiced by media and other formal organizations, masspersonal communications as practiced by celebrities and prominent bloggers, and interpersonal communications, as practiced by ordinary individuals communicating with their friends. In addition, we are interested in the relationships between these categories of users, motivated by theoretical arguments such as the theory of the two-step flow [9]. Rather than pursuing a strategy of automatic classification, therefore, our approach depends on defining and identifying certain predetermined classes of theoretical interest, where both approaches have advantages and disadvantages. In particular, we restrict our attention to four classes of what we call “elite” users: media, celebrities, organizations, and bloggers, as well as the relationships between these elite users and the much larger population of “ordinary” users.

Analytically, our approach has some disadvantages. In particular, by determining the categories of interest in advance, we reduce the possibility of discovering unanticipated categories that may be of equal or greater relevance than those we selected. Thus although we believe that for our particular purposes, the advantages of our approach, namely conceptual clarity and ease of interpretation, outweigh the disadvantages, automated classification methods remain an interesting topic for future work. Finally, in addition to these theoretically-imposed constraints, our proposed classification method must also satisfy a practical constraint, namely that the rate limits established by Twitter’s API effectively preclude crawling all lists for all Twitter users⁴. Thus we instead devised two different sampling schemes, a snowball sample and an activity sample, each with some advantages and disadvantages, discussed below.
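The idea of treating list names as crowd-sourced labels, described earlier in this section, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the keyword map `CATEGORY_KEYWORDS`, the substring matching rule, and the (list name, members) input format are hypothetical and are not taken from the text. The sketch counts the number of lists on which each user appears (a proxy for importance) and tallies which category keywords occur in the names of those lists (a proxy for how the user is perceived).

```python
from collections import Counter, defaultdict

# Hypothetical keyword map, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "media": ("news", "media"),
    "celebrity": ("celeb",),
    "organization": ("org", "company"),
    "blogger": ("blog",),
}


def label_users(lists):
    """Return {user: (number_of_lists, most_voted_category_or_None)}.

    lists: iterable of (list_name, members) pairs (hypothetical format).
    """
    n_lists = Counter()
    votes = defaultdict(Counter)
    for name, members in lists:
        lowered = name.lower()
        # Categories whose keywords appear in this list's name.
        matched = [
            category
            for category, keywords in CATEGORY_KEYWORDS.items()
            if any(keyword in lowered for keyword in keywords)
        ]
        for user in set(members):
            n_lists[user] += 1
            for category in matched:
                votes[user][category] += 1
    return {
        user: (
            n_lists[user],
            votes[user].most_common(1)[0][0] if votes[user] else None,
        )
        for user in n_lists
    }


toy_lists = [
    ("news-sources", ["cnn", "nytimes"]),
    ("Celebs I like", ["ladygaga"]),
    ("friends", ["alice", "cnn"]),
    ("breaking news", ["cnn"]),
]
labels = label_users(toy_lists)
print(labels)
```

Naive substring matching of this kind is error-prone (for example, "org" also matches "organic"), so a practical classifier would need more careful rules.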
3.3.1 Snowball sample of Twitter Lists
The first method for identifying elite users employed snowball sampling. For each category, we chose a number u_0 of seed users that were highly representative of the desired category and appeared on many category-related lists. For each of the four categories above, the following seeds were chosen:

• Celebrities: Barack Obama, Lady Gaga, Paris Hilton
• Media: CNN, New York Times
• Organizations: Amnesty International, World Wildlife Foundation, Yahoo! Inc., Whole Foods
⁴ The Twitter API allows only 20K calls per hour, where at most 20 lists can be retrieved for each API call. Under the modest assumption of 40M users, where each user is included on at most 20 lists, this would require 40M calls, or roughly 2,000 hours (about 12 weeks). Clearly this time could be reduced by deploying multiple accounts, but it also likely underestimates the real time quite significantly, as many users appear on many more than 20 lists (e.g. Lady Gaga appears on nearly 140,000).
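The crawl-time estimate in this footnote can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes one API call per page of up to 20 lists and at least one call per user; the function name and parameters are hypothetical. Under the stated assumptions it gives 2,000 hours, just under 12 weeks, which is the same order as the footnote's estimate.

```python
import math


def crawl_time(n_users, calls_per_hour, lists_per_call, lists_per_user):
    """Estimate the time to fetch list memberships for every user.

    Returns (hours, weeks). Assumes at least one call per user and one
    call per page of `lists_per_call` lists (illustrative assumptions).
    """
    calls_per_user = max(1, math.ceil(lists_per_user / lists_per_call))
    total_calls = n_users * calls_per_user
    hours = total_calls / calls_per_hour
    return hours, hours / (24 * 7)


# Figures from the footnote: 40M users, 20K calls/hour, 20 lists per call,
# and every user on at most 20 lists (so one call per user).
hours, weeks = crawl_time(40_000_000, 20_000, 20, 20)
print(f"{hours:.0f} hours, {weeks:.1f} weeks")

# A single heavily listed user, on roughly 140,000 lists, needs many calls.
calls_for_heavy_user = math.ceil(140_000 / 20)
print(calls_for_heavy_user)
```

The second figure shows why the estimate is a lower bound: one user on about 140,000 lists alone requires thousands of calls.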
