This action might not be possible to undo. Are you sure you want to continue?
Cornell University, USA
Jake M. Hofman email@example.com Duncan J. Watts
Yahoo! Research, NY, USA Yahoo! Research, NY, USA
firstname.lastname@example.org Winter A. Mason
Yahoo! Research, NY, USA
We study several longstanding questions in media communications research, in the context of the microblogging service Twitter, regarding the production, ﬂow, and consumption of information. To do so, we exploit a recently introduced feature of Twitter—known as Twitter lists—to distinguish between elite users, by which we mean speciﬁcally celebrities, bloggers, and representatives of media outlets and other formal organizations, and ordinary users. Based on this classiﬁcation, we ﬁnd a striking concentration of attention on Twitter—roughly 50% of tweets consumed are generated by just 20K elite users—where the media produces the most information, but celebrities are the most followed. We also ﬁnd signiﬁcant homophily within categories: celebrities listen to celebrities, while bloggers listen to bloggers etc; however, bloggers in general rebroadcast more information than the other categories. Next we re-examine the classical “two-step ﬂow” theory of communications, ﬁnding considerable support for it on Twitter, but also some interesting diﬀerences. Third, we ﬁnd that URLs broadcast by diﬀerent categories of users or containing diﬀerent types of content exhibit systematically diﬀerent lifespans. And ﬁnally, we examine the attention paid by the diﬀerent user categories to diﬀerent news topics.
Categories and Subject Descriptors
H.1.2 [Models and Principles]: User/Machine Systems; J.4 [Social and Behavioral Sciences]: Sociology
two-step ﬂow, communications, classiﬁcation
Communication networks, Twitter, information ﬂow
∗ Part of this research was performed while the author was visiting Yahoo! Research, New York.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proﬁt or commercial advantage and that copies bear this notice and the full citation on the ﬁrst page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior speciﬁc permission and/or a fee. WWW ’11 Hyderabad, India Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$10.00.
A longstanding objective of media communications research is encapsulated by what is known as Lasswell’s maxim: “who says what to whom in what channel with what effect” , so-named for one of the pioneers of the ﬁeld, Harold Lasswell. Although simple to state, Laswell’s maxim has proven diﬃcult to satisfy in the more-than 60 years since he stated it, in part because it is generally diﬃcult to observe information ﬂows in large populations, and in part because diﬀerent channels have very diﬀerent attributes and eﬀects. As a result, theories of communications have tended to focus either on “mass” communication, deﬁned as “oneway message transmissions from one source to a large, relatively undiﬀerentiated and anonymous audience,” or on “interpersonal” communication, meaning a “two-way message exchange between two or more individuals.” . Correspondingly, debates among communication theorists have tended to revolve around the relative importance of these two putative modes of communication. For example, whereas early theories such as the so-called “hypodermic model” posited that mass media exerted direct and relatively strong eﬀects on public opinion, mid-century researchers [10, 6, 11, 4] argued that the mass media inﬂuenced the public only indirectly, via what they called a two-step ﬂow of communications, where the critical intermediate layer was occupied by a category of media-savvy individuals called opinion leaders. The resulting “limited eﬀects” paradigm was then subsequently challenged by a new generation of researchers , who claimed that the real importance of the mass media lay in its ability to set the agenda of public discourse. But in recent years rising public skepticism of mass media, along with changes in media and communication technology, have tilted conventional academic wisdom once more in favor of interpersonal communication, which some identify as a “new era” of minimal eﬀects . Recent changes in technology, however, have increasingly undermined the validity of the mass vs. interpersonal dichotomy itself. On the one hand, over the past few decades mass communication has experienced a proliferation of new channels, including cable television, satellite radio, specialist book and magazine publishers, and of course an array of web-based media such as sponsored blogs, online communities, and social news sites. Correspondingly, the traditional mass audience once associated with, say, network television has fragmented into many smaller audiences, each of which increasingly selects the information to which it is exposed, and in some cases generates the information itself. Mean-
as well as how information originating from traditional media sources reaches the masses. As reported by Kwak et al. and is publicly available1 . and are often extremely active. and show that they deliver qualitatively similar results. who is following whom on Twitter. Kwak et al. further classifying elite users into one of four categories of interest— media. • We investigate the ﬂow of information among these categories. that is. represents the full spectrum of communications from personal and private to “masspersonal” to traditional mass media. compared three diﬀerent measures of inﬂuence—number of followers.html from . The remainder of the paper proceeds as follows. Unfortunately. which included 42M users and 1. To this end. Moreover. in addition to conventional celebrities. the kinds of effects that are of most interest to communications theorists.  studied the topological features of the Twitter follower graph. much of the information they produce reaches the masses indirectly via a large population of intermediaries. who we diﬀerentiate from “ordinary” users in terms of their visibility. however. therefore. especially as Twitter—unlike television. number of retweets.kr/traces/WWW2010. Twitter. in spite of these shifts away from centralized media power. thus bypassing the traditional intermediation of the mass media between celebrities and fans. such as changes in behavior. In section 5. and number of retweets—ﬁnding that the ranking of the most inﬂuential users diﬀered depending on the measure. and print media—enables one to easily observe information ﬂows among the members of its ecosystem. these features are in general poor predictors of future cascade size. we need to know the channels by which it ﬂows. and understanding their role in introducing information into Twitter. the follower graph is a directed network characterized by highly skewed distributions both 1 The data is free to download http://an. in some cases becoming more prominent than traditional public ﬁgures such as entertainers and elected oﬃcials. leading some scholars to refer instead to “masspersonal” communications . 3.  studied the distribution of retweet cascades on Twitter. To illustrate. organizations. email lists. attitudes. Twitter is primarily made up of many millions of users who seem to be ordinary individuals communicating with their friends and acquaintances in a manner largely consistent with traditional notions of interpersonal communication. Therefore in this paper we limit our focus to the “who says what to whom” part of Laswell’s maxim. we review related work. media organizations—along with corporations. in the opposite direction interpersonal communication has become increasingly ampliﬁed through personal blogs. Nowhere is the erosion of traditional categories more apparent than in the micro-blogging platform Twitter. etc. and second the lifespan of URLs as a function of their origin and their content. 2.while. remain diﬃcult to measure on Twitter. Next. including section 3. page-rank. we used the follower graph studied by Kwak et al. governments. . and subject matter experts have come to occupy an important niche on Twitter. and bloggers. In section 4.1 DATA AND METHODS Twitter Follower Graph In order to understand how information is transmitted on Twitter. 3. and social networking sites to aﬀord individuals ever-larger audiences. Kwak et al. Cha et al. Third. Together.1. In section 3 we discuss our data and methods. Weng et al. celebrities. Finally. • We ﬁnd that diﬀerent categories of users place slightly diﬀerent emphasis on diﬀerent types of content. we revisit the theory of the two-step ﬂow—arguably the dominant theory of communications for much of the past 50 years—ﬁnding considerable support for the theory as well as some interesting diﬀerences. To this end. ﬁnding that although audience attention is highly concentrated on a minority of elite users. these two trends have greatly obscured the historical distinction between mass and interpersonal communications. Finally.kaist. 2009.ac. In a similar vein. ranging from less than a day to months. the top ten most-followed users on Twitter are not corporations or media organizations. . we consider “who listens to what”. And ﬁnally. authors.  compared three measures of inﬂuence—number of followers. particularly who pays attention to whom. concluding from the highly skewed nature of the distribution of followers and the low rate of reciprocated ties that Twitter more closely resembled an information sharing network than a social network— a conclusion that is consistent with our own view. and that diﬀerent content types exhibit dramatically different characteristic lifespans. RELATED WORK Aside from the communications literature surveyed above. outline two diﬀerent sampling methods. In the next section. we are interested in identifying “elite” users. our paper makes three main contributions: • We introduce a method for classifying users using Twitter Lists into “elite” and “ordinary” users. radio. In section 4 we analyze the production of information on Twitter. a new class of “semi-public” individuals like bloggers. a number of recent papers have examined information diffusion on Twitter. journalists. in section 6 we conclude with a brief discussion of future work. This data represents a crawl of the graph seeded with all users on Twitter as observed by July 31st. and number of mentions— and also found that the most followed users did not necessarily score highest on the other measures.. again ﬁnding that ranking depended on the inﬂuence measure. ﬁnding that although users with large follower counts and past success in triggering cascades were on average more likely to trigger large cascades in the future.3 in which we describe how we use Twitter Lists to classify users. and NGOs—all remain well represented among highly followed users. In addition. examining ﬁrst who shares what kinds of media content. In particular. Consequently it provides an interesting context in which to address Lasswell’s maxim.5B edges. these individuals communicate directly with their millions of followers. but individual people. Our paper diﬀers from this earlier work by shifting attention from the ranking of individual users in terms of various inﬂuence measures to the ﬂow of information among different categories of users. Bakshy et al.  compared number of followers and page rank with a modiﬁed page-rank measure which accounted for topic. often managed by themselves or publicists. mostly celebrities.
2009 to March 8. it is useful for us to restrict attention to tweets containing URLs.3 Twitter Lists Our method for classifying users exploits a relatively recent feature of Twitter: Twitter Lists. In particular. NotAProBlog. kikolani. for two reasons. 2010 using data from the Twitter “ﬁrehose. In particular. the median is less than 100. Because our objective is to understand the ﬂow of information. jsinkeywest. seosmarty. problogger. our approach depends on deﬁning and identifying certain predetermined classes of theoretical interest. we are interested in the content being shared on Twitter—particularly URLs—and so we examined the corpus of all 5B tweets generated over a 223 day period from July 28.g. wchingya. and also how they are perceived (e. Rather than pursuing a strategy of automatic classiﬁcation. In both friend and follower distributions. Yahoo! Inc. celebrities. Finally. etc. in other words. danielscocco. or 11 weeks. TycoonBlogger. Lady Gaga appears on nearly 140. both in terms of their importance to the community (number of lists on which they appear). New York Times • Organizations: Amnesty International. JimiJones. therefore. where at most 20 lists can be retrieved for each API call. ditesco After reviewing the lists associated with these seeds. In addition. motivated by theoretical arguments such as the theory of the two-step ﬂow . we chose a number u0 of seed users that were highly representative of the desired category and appeared on many category-related lists. The Twitter follower graph. does not conform to the usual characteristics of social networks. In addition. GrowMap. For each of the four categories above. As the purpose of Twitter Lists is to help users organize users they follow. BlazingMinds. From the total of 5B tweets recorded during our observation period. we emphasize that we are motivated by a particular set of substantive questions arising out of communications theory. but the maximum # friends is several hundred thousand. Before describing our methods for classifying users in terms of the lists on which they appear. List creation therefore eﬀectively exploits the “wisdom of crowds”  to the task of classifying users. and interpersonal communications. the most-followed individuals typically do not follow many others. we focus our attention on the subset of 260M containing bit. 000 hours. remarkablogger. Under the modest assumption of 40M users (roughly the number in the 2009 crawl by ). bloggersblog. discussed below. because URLs point to online content outside of Twitter. the following keywords were hand-selected based on (a) their representativeness of the desired categories. And second. but instead resembles more the mixture of one-way mass communications and reciprocated interpersonal communications described above.). shoemoney. Once a list is created. as practiced by ordinary individuals communicating with their friends. Clearly this time could be reduced by deploying multiple accounts. Thus we instead devised two diﬀerent sampling schemes—a snowball sample and an activity sample— each with some advantages and disadvantages. brandonacox. they provide a much richer source of variation than is possible in the typical 140 character tweet. we 2 http://dev. In additional to these theoretically-imposed constraints. engadget. the out-degree distribution is even more skewed than the in-degree distribution. news organization vs. as many users appear on many more than 20 lists (e. which exhibit much higher reciprocity and far less skewed degree distributions . and decide whether the new list is public (anyone can view and subscribe to this list) or private (only the list creator can view or subscribe to this list). but it also likely underestimates the real time quite signiﬁcantly. this would require 4 ∗ 106 /2 ∗ 103 = 2.ly/. Twitter Lists have been welcomed by the community as a way to group people and organize one’s incoming stream of tweets by speciﬁc sets of users. we note that almost all URLs broadcast on Twitter have been shortened using one of a number of URL shorteners. extremejohn. bbrian017. we restrict our attention to four classes of what we call “elite” users: media. 2009. Element321.. copyblogger.1 Snowball sample of Twitter Lists The ﬁrst method for identifying elite users employed snowball sampling. First. are interested in the relative importance of mass communications. FamousBloggers. smartbloggerz.” the complete stream of all tweets2 . Twitter notation for how many others a user follows). the follower graph is also characterized by extremely low reciprocity (roughly 20%)—in particular. Since its launch on November 2. Ileane.000) 4 The blogger category required many more seeds because bloggers are in general lower proﬁle than the seeds for the other categories 3 3. the name of the list can be considered a meaningful label for the listed users. where both approaches have advantages and disadvantages. Chrisbrogan. virtuosoblogger. dragonblogger. our proposed classiﬁcation method must also satisfy a practical constraint—namely that the rate limits established by Twitter’s API eﬀectively preclude crawling all lists for all Twitter users3 . URLs add easily identiﬁable tags to individual tweets. Paris Hilton • Media: CNN. World Wildlife Foundation. celebrity. Gizmodo. Lady Gaga. a user needs to provide a name (required) and description (optional) for the list. 3. mashable. therefore. Whole Foods • Blogs4 : BoingBoing.2 Twitter Firehose In addition to the follower graph. for example. as well as the relationships between these elite users and the much larger population of “ordinary” users. hishaman. however.g. we are also interested in the relationships between these categories of users. of which the most popular is http://bit. allowing us to observe when a particular piece of content is either retweeted or subsequently reintroduced by another user. 3. and bloggers. the following seeds were chosen: • Celebrities: Barack Obama.3. kbloemendaal.ly URLs. For each category.twitter. while a small number of users have millions of followers. the user can add/edit/delete list members. as practiced by media and other formal organizations.of in-degree (# followers) and out-degree (# “friends”. To create a Twitter List.com/doc/get/statuses/ﬁrehose . where each user is included on at most 20 lists. and (b) their lack of overlap between categories: The Twitter API allows only 20K calls per hour. masspersonal communications as practiced by celebrities and prominent bloggers. organizations.
we obtained a much-reduced sample of 113. The resulting “list of lists” was then pruned to contain only the l0 lists whose names matched at least one of the chosen keywords for that category. also appear in the snowball sample. wic = Nc where nic is the number of lists in category c that contain user i and Nc is the total number of lists in category c.2% 97. however. celebs. companies.3 Classifying Elite Users 3. the bias is likely to be quite diﬀerent from any introduced by the snowball sample.685 100% Figure 1: Method Schematic of the Snowball Sampling • Celebrities: star. celebrity. In particular. cause. Lady Gaga is on lists called “faves”. we crawled all lists on which that seed appeared.3% 14. celebsveriﬁed.830 38. To address this concern. This method initially yielded 750k users and 5M lists. corporation.3. In order to identify categories of elite users. 116 users were obtained. we chose k = 5000 as a cut-oﬀ for the elite categories. followed by the media. so as to facilitate comparisons. blogs. 97. ngo • Blogs: blog. Figure 2(a) shows for the snowball sample the share of following links (square symbols) and tweets received (diamonds) by an average user. for both sampling methods. who appeared on 7.e. Also in both cases. however.celebrities-ontwitter. We note that the number of lists obtained by the activity sampling methods is considerably smaller than that obtained by the snowball sample. Although the numerical values diﬀer slightly.6% 113. however. This “activity-based” sampling method is also clearly biased towards users who are consistently active. charity. In total. celebrity-list. many of the more prominent users appeared on lists in more than one category—for example Oprah Winfrey is frequently included in lists of “celebrity” as well as “media.891 13. products. 000 lists. as evidenced by the relatively ﬂat slope of the attention curves. stars.010 41.0% 40. where all remaining users are hence- . bloggers Having selected the seeds and the keywords for each category.483 24. and bloggers. but also arrive at a definition of “elite” that satisﬁes a tradeoﬀ between (a) keeping each category relatively small. ﬁnding all users that appeared in the “celebrity” list with Lady Gaga). Balancing the requirements described above.116 100% Activity Sample # of users % of users 14. “celebs”. and then repeated these last two steps to complete the crawl. celebrity-tweets • Media: news. We then assigned each user to the category in which he or she has the highest membership score. organisations.1% 43. and assigning users to unique categories (as described above). the bulk of the attention is accounted for by a relatively small number of users within each category. suggesting that the two sampling methods identifying similar populations of elite users–as indeed we conﬁrm in the next section. where we note that the curve for celebrities asymptotes more slowly than for the other three categories. charities. and the proportion of tweets the user received from everyone the user follows in each category. after pruning the lists to those that contained at least of the keywords above.778 13. we measure the ﬂow of information from the top k users in each of the four categories to a random sample of 100K ordinary (i.u0 l0 u1 l1 u2 l2 Table 1: Distribution of users over categories category celeb media org blog total Snowball Sample # of users % of users 82. we also generate a sample of users based on their activity. In addition. thus obtaining similar results from the two samples should give us conﬁdence that our ﬁndings are not artifacts of the sampling procedure.685 users. For instance. 524. Speciﬁcally. where Table 1 reports the number of users assigned to each category. To this end.7% 127. but only the latter two lists would be kept after pruning. causes. For each seed. Next.614 of the activity sample. blogger. therefore.3. we computed a user i’s membership score in category c: nic . We then crawled all u1 users appearing in the pruned “list of lists” (for instance. 3.” To resolve this ambiguity. Interestingly. or 85%. news-media • Organizations: company. the two sets of results are qualitatively similar. celebrities outrank all other categories. and “celebrity”.8% 216. while Figure 2(b) shows the same information for the activity sample. we then performed a snowball sample of the bipartite graph of users and lists (see Figure 1). we crawl all lists associated with all users who tweet at least once every week for our entire observation period. and that bloggers are more heavily represented among the activity sample at the expense of the other three categories— consistent with our claim that the two methods introduce diﬀerent biases. Importantly. brands.2 Activity Sample of Twitter Lists Although the snowball sampling method is convenient and is easily interpretable with respect to our theoretical motivation.853 18. organization. organizations. it is also potentially biased by our particular choice of seeds. The number of users assigned in this manner to each category is reported in Table 1. celebrities. however. we ﬁrst rank all users in each of category by how frequently they are listed in that category. it is also desirable to make the four categories the same size. unclassiﬁed) users in two ways: the proportion of people the user follows in each category. while (b) maximizing the volume of attention that is accounted for by each category. we not only need to classify users into categories.770 15. organisation.3% 524. media. organizations. 000. hollywood. so as not to include users who are not distinguishable from ordinary users.186 35.
In the media category. compared with over 1. Table 3: Top 5 users in each category Celebrity Media Org Blog aplusk cnnbrk google mashable ladygaga nytimes Starbucks problogger TheEllenShow asahi twitter kibeloco taylorswift13 BreakingNews joinred naosalvo Oprah TIME ollehkt dooce friends tweets received 0 1000 4000 7000 top k organizations 10000 0 1000 4000 7000 top k blogs 10000 average % 10 20 30 0 1000 4000 7000 top k 10000 0 average % 10 20 30 friends tweets received friends tweets received 1000 4000 7000 top k 10000 4.81 media 5.360. respectively. suggests that the sampling method yields results that are consistent with our objective of identifying users who are prominent exemplars of our target categories. however. Time. it remains the case that 20K elite users. attracts almost 50% of all attention within Twitter.05% of the user population.000 for media users. “aplusk. is that in spite of this fragmentation. and so on. information ﬂows have not become egalitarian by any means. Figure 3 shows the average percentage of tweets that category i receives from category j. while Kibe Loco and Nao Salvo are popular blogs in Brazil. and Twitter are obviously large and socially prominent corporations. comprising less than 0.228.119.364 6.94 org 523. Google. followed by Breaking News.000 users identiﬁed by the sampling method. among the blogging category. formerly Korean Telecom. we refer the top 5K users drawn from the snowball sample listed as “celebrity”. In addition. “blog”. followed by bloggers. Mashable and ProBlogger are both prominent US blogging sites.10 friends tweets received 0 0 1000 4000 7000 top k organizations 10000 1000 4000 7000 top k blogs 10000 average % 10 20 30 0 1000 4000 7000 top k 10000 0 average % 10 20 30 friends tweets received friends tweets received 1000 4000 7000 top k 10000 (a) Snowball sample celebrities average % 10 20 30 average % 10 20 30 media friends tweets received dle for actor Ashton Kusher. In the rest of this paper. Table 3. and ollehkt is the Twitter account for KT.74 blog 1. Table 2 shows that although ordinary users collectively introduce by far the highest number of URLs. from this point on. media actors pay attention to other media actors. Speciﬁcally. Oprah Winfrey. and celebrities. Among organizations. exhibiting striking homophily with respect to attention: celebrities overwhelmingly pay attention to other celebrities. Ellen Degeneres. a leading Japanese daily newspaper. for example. and Asahi.698 104.058 27. members of the elite categories are far more active on a per-capita basis. In particular.131 272. The one slight exception to this rule is that organizations pay more attention to bloggers than to them- . and dooce is the blog of Heather Armstrong. Finally. celebrities average % 10 20 30 average % 10 20 30 media friends tweets received Table 2: # of URLs initiated by category # of URLs category # of URLs per-capita celeb 139. Equally interesting. Among the celebrity list. therefore. one of the ﬁrst celebrities to embrace Twitter and still one of the most followed users. Even if the media has lost attention relative to other elites.forth classiﬁed as ordinary. while JoinRed is the charity organization started by Bono of U2. noting that both methods generate similar results. “organization”.739 1023. ordinary users on Twitter are receiving their information from many thousands of distinct sources. The prominence of elite users also raises the question of how these diﬀerent categories listen to each other. “media”. most of which are not traditional media organizations—even though media outlets are by far the most active users on Twitter.” is the han- The results of the previous section provide qualiﬁed support for the conventional wisdom that audiences have become increasingly fragmented. “media”. Ordinary users originate on average only about 6 URLs each.03 ordinary 244. are all household names. “blog”. “organization”.5M followers. users classiﬁed as “media” easily outproduce all other categories. Starbucks. we restrict our analysis to elite categories to the top 5. a widely read “mommy blogger” with over 1. while the remain celebrity users—Lady Gaga. organizations. Clearly. we compute the volume of tweets exchanged between elite categories. “WHO LISTENS TO WHOM” (b) Activity sample Figure 2: Average fraction of # following (blue line) and # tweets (red line) for a random user that are accounted for by the top K elites users crawled Based on this deﬁnition of elite users. CNN Breaking News and the New York Times are most prominent. To address this issue. and Taylor Swift. which shows the top 5 users in each of the four categories. when we talk about “celebrity”. only about 15% of tweets received by ordinary users are received directly from the media.
we may inquire whether these intermediaries. to the extent they exist. and if the latter. in which case we assume the information has been rediscovered independently. For the purposes of studying when a user receives information directly from the media or indirectly through an intermediary. but without the acknowledgment. we treat retweets and reintroductions equivalently. whether they are citing the source within Twitter by retweeting the URL. that is. then that source can be considered an intermediary. we sampled 1M random ordinary users5 .Celeb Category of Twitter Users A B Media B receive tweets from A Org Blog Figure 3: Share of tweets received among elite categories Figure 4: RT behavior among elite categories selves. and what proportion is transmitted indirectly via some population of intermediaries? In addition. or reintroducing it. A stronger measure of attention. is to consider instead only those URLs introduced by category i that are subsequently retweeted by category j. light on the theory of the two-step ﬂow. even though on a per-capita basis bloggers disproportionately occupy the role of information recyclers—93 retweets per person. If the ﬁrst occurrence of a URL in Twitter came from a media user. Figure 4 shows how much information originating from each category is retweeted by other categories. however. As we have already noted. attention paid by organizations is more evenly distributed across categories than for any other category. the former type is exposed to less total media than the latter. Figure 3. as claimed by the two-step ﬂow theory.46 therefore represents the proportion of mediaoriginated content that reaches the masses via an intermediary rather than directly. However. As with our previous measure of attention. Before proceeding with this analysis. 600K had received at least one such URL. Nevertheless. along with an explicit acknowledgement of the source—either using oﬃcial retweet function provided by Twitter. this average is somewhat misleading. arguably the theory that has most successfully captured the dueling importance of mass media and interpersonal inﬂuence. however. it is still a substantial fraction.1 Two-Step Flow of Information Examining information ﬂow on Twitter can also shed new . as supposed by early theories of mass communication. Unsurprisingly. compared to only 1. performing this analysis for the entire population of over 40M ordinary users proved to be computationally unfeasible. As Figure 5 shows. The ﬁrst is via retweeting. in what respects they diﬀer from other ordinary users. having discovered the URL outside of Twitter. in fact. via a two-step ﬂow. but a user received the URL from another source. the population comprises two types—those who receive essentially all of their media-originating information via two-step ﬂows and those who receive virtually all of it directly from the media. however. bloggers are disproportionately responsible for retweeting URLs originated by all categories. shows only how many URLs are received by category i from category j. counted the number n of bit. a particular weak measure of attention for the simple reason that many tweets go unread. or making use of an informal convention such as “RT @user” or “via @user.ly URLs they had received that had originated from one of our 5K media users. For each member of this 600K subset we then counted the number n2 of these URLs that they received via non-media friends. where a user subsequently tweets a URL that has previously been introduced by another user. where of the 1M total. thus their overall impact is relatively minimal. we note that there are two ways information can pass through an intermediary in Twitter.1 retweets per person for ordinary users—the total number of URLs retweeted by bloggers (465k) is vastly outweighed by the number retweeted by ordinary users (46M).” The second mechanism is what we label reintroduction. The average fraction n2 /n = 0. and which to ignore. 5 4. so it is still interesting to ask: for the special case of information originating from media sources. what proportion is broadcast directly to the masses. which occurs when a users explicitly rebroadcasts a URL that he or she has received. To quantify the extent to which ordinary users get their information indirectly versus directly from the media. The essence of the two-step ﬂow is that information passes from the media to the masses not directly. In reality. This result reﬂects the characterization of bloggers as recyclers and ﬁlters of information. In general. is that even users who received up to 100 media URLs durAs before. but passes ﬁrst through an intermediate layer of “opinion leaders” who decide which information to rebroadcast to their followers. it should be noted. are drawn from other elite categories or from ordinary users. What is surprising. and for each user. therefore. retweeting is strongly homophilous among elite categories. on Twitter the ﬂow of information to the masses from the media accounts for only a fraction of the total volume of information.
not elites. political news vs. World 6 http://www.a b 10000 1 0 100 c d # of opinion leaders 4 16 64 256 2048 16384 131072 # of two−step recipients Figure 5: Percentage of information that is received via an intermediary as a function of total volume of media content to which a user is exposed.nytimes. by deﬁnition. news vs. advanced over 50 years ago. the theory predicted that opinion leadership was not a binary attribute. in fact. etc. one of which—“NY region”—was highly speciﬁc to the New York metropolitan area.000 URLs along a variety of dimensions. Figure 7 shows the proportion of URLs from each New York Times section retweeted or reintroduced by each category.” corresponding to our classiﬁcation of most intermediaries as ordinary. hence the number of intermediaries who rely on two-step ﬂows is much smaller than for random users.com/year/month/day/category/ title. however. used Amazon’s Mechanical Turk to classify a stratiﬁed sample of 1. etc. . communications technology in the interim—given. this method does not scale well to larger sample sizes. entertainment. we restrict attention to URLs originated by the New York Times which. where we note that the most prominent intermediaries are disproportionately drawn from the 4% elite users—Ashton Kucher (asplusk). however. Finally. Science. WHO LISTENS TO WHAT? The results in section 4 demonstrate the “elite” users account for a substantial portion of all of the attention on Twitter. 6370 could be successfully unshortened and assigned to one of 21 categories. but also show clear diﬀerences in how the attention is allocated to the diﬀerent elite categories. the vast majority of which (96%) are classiﬁed ordinary users. ing our observation period received all of them from opinion leaders. for example. Given the large number of URLs in our observation period (260M ). and on every social and economic level.html?ref=category . we ﬁnd that on average intermediaries have more followers than randomly sampled users (543 followers versus 34) and are also more active (180 tweets on average. thus we focused our attention on the remaining 8 topical categories.S. pass along media content to at least one other user. after CNN Breaking News. is roughly ten times as active as CNN Breaking News. sports news. also received at least some of their information via two-step ﬂows.. classifying even a small fraction of URLs according to content is an onerous task. The original theory also emphasized that opinion leaders.000 users. U. is the second-most-followed news organization on Twitter. Given the length of time that has elapsed since the theory of the two-step ﬂow was articulated. and how many of them are there? In total. It is therefore interesting to consider what kinds of content is being shared by these categories. Sports. the population of intermediaries is smaller than that of the users who rely on them. Figure 5c also shows that at least some intermediaries also receive the bulk of their media content indirectly. just like other ordinary users. . corresponding to our ﬁnding that intermediaries vary widely in the number of users for whom they act as ﬁlters and transmitters of media content. Who are these intermediaries. Finally. Figure 6: Frequency of intermediaries binned by # randomly sampled users to whom they transmit media content. NY Times. we note that intermediaries are not like other ordinary users in that they are exposed to considerably more media than randomly selected users. Arts. we exploit a convenient feature of their format—namely that all NY Times URLs are classiﬁed in a consistent way by the section in which they appear (e. versus 7). Of the 6398 New York Times bit. but that in general they were more exposed to the media than their followers—just as we ﬁnd here. Interestingly. only 9 categories had more than 100 URLs during the observation period. however.5M followers. To classify NY Times content. Instead. Figure 6 shows that although all intermediaries. like their followers. Interestingly.ly URLs we observed. with over 2. which emphasized that opinion leaders were “distributed in all occupational groups.g. but still surprisingly large. that a service like Twitter was likely unimaginable at the time—it is remarkable how well the theory agrees with our observations. a minority satisﬁes this function for multiple users. these results are all broadly consistent with the original conception of the two-step ﬂow. and the many diﬀerent ways one can classify content (video vs.) 6 . so it is arguable a better source of data. text. Bakshy et al.). but rather a continuously varying one. Comparing Figure 5a and 5c. World. roughly 500K. for example acts as an intermediary for over 100. and the transformational changes that have taken place in 5. however. In addition. Of these.
5. and (c) they do not appear in the last δ days. Using this estimation/evaluation split.S. and Sports.15 0.15 0. and only count URLs as having a lifespan of τ if (a) they do not appear in the ﬁrst δ days.20 0.35 0. URLs originated by media actors generate a large portion of short-lived URLs (especially URLs with lifespan=0. we consider only URLs that have a lifespan τ ≤ 70.05 0. News. because we require a beginning and ending buﬀer. we ﬁnd an upper-bound on lifespan for which we can determine the actual lifespan with 95% accuracy as a function of δ.10 0.25 0. Both of these results can be explained by the type of content that originates from diﬀerent sources: whereas news stories tend to be replaced by updates on a daily or more frequent basis. Business. while the media shows somewhat greater interest in U. . Finally. User Category Figure 7: Number of RTs and Reintroductions of New York Times stories by content category news is the most popular category.35 0.1 Lifespan of Content In addition to diﬀerent types of content. Naively. and second. Celebrities. and Technology are less popular still.30 0. To address the censoring problem. however. how likely are we see it in the last 90 days? Clearly this depends on the actual lifespan of the URL. as illustrated in Figure 8(a). by which we mean the time lag between the ﬁrst and last appearance of a given URL on Twitter. URLs introduced by diﬀerent types of elite users or ordinary users may exhibit diﬀerent lifespans. and disproportionately high interest in science. those that only appeared once). thus for the following analysis. organizations show disproportionately little interest in business and arts-related stories.05 0. but there are some minor deviations: in particular. We determined that τ = 70 and δ = 70 suﬃciently satisﬁed our constraints. In other words. we now explore the lifespan of URLs introduced by diﬀerent categories of users. Sports δ Total observation window = 223 days % RTs and Re-introductions 0. As Figure 9(b) shows. World News 0. Arts 6. technology.15 0. To determine δ we ﬁrst split the 223 day period into two segments—the ﬁrst 133 day estimation period and the last 90 day evaluation period (see Figure 8(b))—and then ask: if we (a) observe a URL ﬁrst appear in the ﬁrst 133−δ days and (b) do not see it in the δ days prior to the splitting point.S.35 0. news stories.20 2.20 0.30 0. by contrast.00 blog celeb media org other blog celeb media org other 8. Although this limitation does not create much of a problem for short-lived URLs—which account for the vast majority of our observations—it does Having established a method for estimating URL lifespan. URLs that appear towards the end of our observation period will be systematically classiﬁed as shorter-lived than URLs that appear towards the beginning.25 0. while correspondingly. measuring lifespan seems a trivial matter. URLs originated by bloggers are overrepresented among the longer-lived content. and so are more likely to be retweeted or reintroduced months or even years after their initial introduction. News ﬁrst observation of URL last observation of URL δ τ δ estimation period = 133 days evaluation period = 90 days τ 4. Science 0.25 0. however.30 0.00 7. Business 0. (b) they ﬁrst appear in the interval between the buﬀers. is in reality a lower bound on the lifespan. and because we can only classify a URL as having lifespan τ if it appears at least τ days before the end of our window.S.10 0.00 3.10 0. In general. the overall pattern is replicated for all categories of users.35 0. Health 0.25 0. followed by U. Arts. U. and possibly world news.10 0. URLs initiated by the elite categories exhibit a similar distribution over lifespan to those initiated by ordinary users.2 Lifespan By Category 5.00 5. a URL that is last observed towards the end of the observation period may be retweeted or reintroduced after the period ends.05 0.30 0. the sorts of URLs that are picked up by bloggers are of more persistent interest.1. as shown in Figure 9(a). as the longer a URL lives. when looking at the percentage of URLs of diﬀerent lifespans initiated by each category. a ﬁnite observation period—which results in censoring of our data—complicates this task.20 0. In particular.05 0. the more likely it will re-appear in the future.15 0. we need to pick τ and δ such that τ + 2δ ≤ 223. show greater interest in sports and less interest in health. What we observe as the lifespan of a URL. where increasingly niche categories like Health. we see two additional results: ﬁrst. therefore. Technology Figure 8: Schematic of lifespan estimation procedure create large biases for long lived URLs. Science. a URL that is ﬁrst observed toward the beginning of the observation window may in fact have been introduced before the window began. we seek to determine a buﬀer δ at both the beginning and the end of our 223day period.
and books. the result is somewhat the opposite—that is. longevity is determined not by diﬀusion. should be viewed as a subset of a much larger media ecosystem in which content exists and is repeatedly rediscovered by Twitter users. for URLs introduced by elite users. the source of its origin also impacts its persistence. For the vast majority of URLs on Twitter. and longformat magazine articles have lifespans that are eﬀectively unbounded.0 0. Some of this content—such as daily news stories— has a relatively short period of relevance. after which a given story is unlikely to be reintroduced or rebroadcast. First. classic music videos. As Figure 10 shows. at least on average—a result that is consistent with previous ﬁndings . consistent with our interpretation above.0 q q qq q qq qqqq q q q qqqqqqqq qq qqqqqqqqqqqqqqq qqq qqq qqq qqq qqq qqqqqq qqq q q q q qq q 0 10 20 30 40 50 60 70 70 lifespan (day) (b) Percent Figure 9: 9(a) Count and 9(b) percentage of URLs initiated by 5 categories. Note here that URLs with lifespan = 0 are those URLs . and can seemingly be rediscovered by Twitter users indeﬁnitely without losing relevance. the population of long-lived URLs is dominated by videos. which shows the average RT rate (the proportion of tweets containing the URL that are retweets of another tweet) of URLs with diﬀerent lifespans. but by many different users independently rediscovering the same content.8 0.20 log(# of URLs with lifespan = x day) q q q 15 q q q qq q q other celeb media org blog q qq qq q qq 10 qq qqqqqqq qqqq qqq qqqqq qqqqq qqq qqqqq qqqqqqq qqq qqqq qqq qqq 5 0 0 10 20 30 40 lifespan (day) 50 60 70 Figure 10: Top 20 domains for URLs that lived more than 200 days RT rate (# of RTs / total # of occurrences) (a) Count 1. where this result is especially pronounced for long-lived URLs. for ordinary users. and suggests that in spite of the dominant result above that content lifespan is determined to a large extent by type. captured by the ﬁrst part of Laswell’s maxim—“who says what to whom”—in the context that only appeared once in our dataset. we used the bit. and mapped them into 21034 web domains.ly API service to unshorten 35K of the most long-lived URLs (URLs that lived at least 200 days). even for URLs that persist for weeks. Although it is unsurprising that elite users generate more retweets than ordinary users. music. they are more likely to be retweeted than reintroduced. CONCLUSIONS In this paper.2 0.4 q q 7 % of URLs from elites category 6 5 4 3 2 1 0 0 10 20 30 40 lifespan (day) 50 60 celeb media org blog other celeb media org blog 0.6 0. grouped by the categories that introduced the URL7 . the size of the diﬀerence is nevertheless striking. thus the RT rate is zero. Twitter is. but rather from reintroduction. we investigated a classic problem in media communications research. however. Second. Two related points are illustrated by Figure 11. the majority 7 6. To shed more light on the nature of long-lived content on Twitter. movie clips. At the other extreme. in other words. with diﬀerent lifespans Figure 11: Average RT rate by lifespan for each of the originating categories of appearances of URLs after the initial introduction derives not from retweeting. in other words.
Menzel. bloggers. moreover. J. Moreover. Gummad. Includes bibliographical references. Van Der Heide. D. Kim. Katz. to opinions or actions that would engage more directly with the “eﬀects” component of Lasswell’s maxim.. In particular. F. S. . communications on Twitter may be unrepresentative of information ﬂow via more traditional channels. Business. ACM. and interpersonal interactions on the other hand. Third. media following media. Jiang. The Inﬂuentials: People Who Inﬂuence People. economies. Second. C. Identifying ‘inﬂuencers’ on twitter. 2010. Theory and Society. New York. Watts. M. J.05% of the population accounts for almost half of all attention. for example. A. Choi. REFERENCES  E. Walther. 2003070095 James Surowiecki. Lim. In particular. Benevenuto. E. 2011. 1st edition. because Twitter users themselves classify other users by including them on lists. attention remains highly concentrated. B.  T. Weimann. organizations. and D. editor.  H. He. Free Press. Albany. and that for this reason we have focused on a limited set of predetermined user-categories. Bakshy. Twitter eﬀectively provides a ready-made. Lazarsfeld. crowdsourced classiﬁcation scheme of users. Science. The Communication of Ideas. P. First. H. By restricting our attention to Twitter. J. Kossinets and D. and media inﬂuence sources online. pages 17–38. U. By studying the ﬂow of information among the ﬁve categories that we identiﬁed (media. Lazarsfeld.  M. A new era of minimal eﬀects? the changing foundations of political communication. S. In R. pages 261–270. and because Twitter maintains a complete record of every tweet broadcast.  J. 20(4):253–270. 2004. Gitlin.  H. celebrities. First. Winter.  J. Carr. L. a signiﬁcant challenge for future work is to merge the data regarding information ﬂow on Twitter with other sources of outcome data—relating. In 4th Int’l AAAI Conference on Weblogs and Social Media. 1994. and Q. and Culture on Social Network Sites. and S. Sociometry. 1978. And ﬁnally. C. 3rd edition. pages 441–474. Mason. However. In L. we ﬁnd that although audience attention has indeed fragmented among a wider pool of content producers than classical models of mass media. Columbia University Press. Moon. IL. and bloggers following bloggers. K. T. W. Urbana. E. shedding more light on the “what” element of Lasswell’s maxim. and H. because Twitter users explicitly opt-in to “follow” each other. Second. we note that although our use of Twitter lists to label users was motivated by a speciﬁc set of questions regarding mass vs interpersonal communications.  J. The people’s choice. Lasswell. ACM. Haddadi. A Networked Self: Identity. which are continually being rediscovered by Twitter users and appear to persist indeﬁnitely. What is twitter.  G. Watts. State University of New York Press. editor. Surowiecki. DC. how the voter makes up his mind in a presidential campaign. Glencoe. Measuring user inﬂuence on twitter: The million follower fallacy. 1968. Kwak. 58(4):707–731. 6(2):205–253. for example. In closing. H. the part played by people in the ﬂow of mass communications. F. T. attention is highly homophilous. F. a social network or a news media? In Proceedings of the 19th international conference on World Wide Web. Interaction of interpersonal. are more connected and more exposed to the media than their followers. etc). In addition. pages 591–600. 2006.  J. peer. media-originated URLs are disproportionately represented among short-lived URLs while those originated by bloggers tend to be overrepresented among long-lived URLs. 2010. 1957. The structure and function of communication in society. editor. as has been proposed elsewhere . J. Bennett and S. and K. NY. DeAndrea. 1968. New York. such as TV and radio on the one hand. pages 117–130. with celebrities following celebrities. 2010. Ill. and H. Berelson. S. another area for future work would be to extract content information in a more systematic manner. In Proceedings of the third ACM international conference on Web search and data mining. such an approach would allow one to examine the category of opinion leaders in more detail.  R. B. C. Twitterrank: ﬁnding topic-sensitive inﬂuential twitterers. Merton. New York. Lee. Routledge. J. Iyengar. D.S.of Twitter. S. 1955. Coleman. ”  G. Bryson. and ordinary). Social theory and social structure. where roughly 0. Patterns of inﬂuence: Local and cosmopolitan inﬂuentials. The diﬀusion of an innovation among physicians. Tong. Journal of Communication. there are some diﬀerences— organizations. we ﬁnd that the longest-lived URLs are dominated by content such as videos and music. Katz and P. Hofman. societies. Cha. who although classiﬁed as ordinary users. Papacharissi. Personal inﬂuence. Within the population of elite users. 7. Community.  P. Doubleday. In Z. and nations.  E. 1948. possibly identifying opinion leaders for diﬀerent topics. K. devote a surprisingly small fraction of their attention to business-related news. it would also be interesting to explore automatic classiﬁcation schemes from which additional user categories could emerge. Merton. Media sociology: The dominant paradigm. Weng. In Fourth ACM International Conference on Web Seach and Data Mining (WSDM). 2008. Washington. we ﬁnd that although all categories devote a roughly similar fraction of their attention to diﬀerent categories of news (World. and B. 2010. ACM. it provides an unprecedented level of resolution and coverage regarding who is listening to whom. our conclusions are necessarily limited to one narrow cross-section of the media landscape. Finally. University of Illinois Press.  W. Park. Gaudet. Hong Kong. 311(5757):88–90. our analysis sheds new light on some old questions of communications research. we ﬁnd considerable support for the two-step ﬂow of information—almost half the information that originates from the media passes to the masses indirectly via a diﬀuse intermediate layer of opinion leaders. P. we feel the advantages of using Twitter to answer this question outweighed the limitations. The Wisdom of Crowds : Why the many are smarter than the few and how collective wisdom shapes business. Free Press. We also ﬁnd that diﬀerent types of content exhibit very diﬀerent lifespans. Empirical analysis of an evolving social network.
This action might not be possible to undo. Are you sure you want to continue?
We've moved you to where you read on your other device.
Get the full title to continue listening from where you left off, or restart the preview.