
The Web Centipede: Understanding How Web Communities Influence Each Other Through the Lens of Mainstream and Alternative News Sources

Savvas Zannettou⋆, Tristan Caulfield‡, Emiliano De Cristofaro‡, Nicolas Kourtellis†,
Ilias Leontiadis†, Michael Sirivianos⋆, Gianluca Stringhini‡, and Jeremy Blackburn†
⋆Cyprus University of Technology, ‡University College London, †Telefonica Research

sa.zannettou@edu.cut.ac.cy {t.caulfield,e.decristofaro,g.stringhini}@ucl.ac.uk
{nicolas.kourtellis,ilias.leontiadis,jeremy.blackburn}@telefonica.com michael.sirivianos@cut.ac.cy
arXiv:1705.06947v1 [cs.SI] 19 May 2017

ABSTRACT

As the number and diversity of news sources on the Web grows, so does the opportunity for alternative sources of information production. The emergence of mainstream social networks like Twitter and Facebook makes it easier for misleading, false, and agenda driven information to quickly and seamlessly spread online, deceiving people or influencing their opinions. Moreover, the increased engagement of tightly knit communities, such as Reddit and 4chan, compounds the problem as their users initiate and propagate alternative information not only within their own communities, but also to other communities and social media platforms across the Web. These platforms thus constitute an important piece of the modern information ecosystem which, alas, has not been studied as a whole.

In this paper, we begin to fill this gap by studying mainstream and alternative news shared on Twitter, Reddit, and 4chan. By analyzing millions of posts around a variety of axes, we measure how mainstream and alternative news flow between these platforms. Our results indicate that alt-right communities within 4chan and Reddit can have a surprising level of influence on Twitter, providing evidence that “fringe” communities may often be succeeding in spreading these alternative news sources to mainstream social networks and the greater Web.

1. INTRODUCTION

After the Boston Marathon bombings in 2013, a large number of tweets started to claim that the bombings were a “false flag” perpetrated by the United States government [24]. The #GamerGate controversy started as a blogpost by a jaded ex-boyfriend that was twisted and turned into a pseudo-political campaign of targeted online harassment [6]. More recently, the PizzaGate conspiracy, a debunked theory connecting some restaurants and members of the US Democratic Party with a child-sex ring, even led to a shooting in a North Carolina restaurant [27]. What these examples have in common is that they were propagated in no small part via the use of “alternative” news sites like Infowars and “fringe” Web communities like 4chan.

Overall, the Web and online social networks have greatly reduced the barrier of entry for such alternative news sources. Due to the negligible cost of distributing information over social media, fringe sites can quickly gain traction with large audiences. At the same time, the explosion of information sources also hinders the effective regulation of the sector, while further muddying the water when it comes to the evaluation of news information by readers.

While there are many plausible motives for the rise in alternative narratives [23], ranging from libelous (e.g., to harm the image of a particular person or group), political (e.g., to influence voters), profit (e.g., to make money from advertising), or just trolling [1], the manner in which they proliferate throughout the Web is still unknown. Although previous work has examined information cascades, rumors, and hoaxes [10, 13, 21], to the best of our knowledge, very little work provides a holistic view of the modern information ecosystem. This knowledge is crucial to understand the risks associated with alternative news and to design appropriate detection and mitigation strategies.

Anecdotal evidence and press coverage suggest that alternative news dissemination might start on “fringe” websites, eventually reaching mainstream online social networks and news outlets.¹ Nevertheless, to the best of our knowledge, this phenomenon has not been rigorously studied and no thorough analysis has focused on how news moves from one online service to another like an interconnected centipede. In this paper, we address this gap by providing the first large-scale measurement of how mainstream and alternative news flows through multiple social media platforms. More specifically, we focus on the relationship between three fundamentally different social media platforms, Reddit, Twitter, and 4chan, chosen because of their fundamental differences as well as their generally accepted “driving” of substantial portions of the online world.

¹ http://bbc.in/2pQP5KH

Contributions. This paper makes several contributions. First, we undertake a large-scale measurement and comparison of the occurrence of mainstream and alternative news sources across three social media platforms (4chan, Reddit, and Twitter). Next, we provide an understanding of the temporal dynamics of how URLs from news sites are posted on the different social networks. Finally, we present a measurement of the influence between the platforms that provides insight into how information spreads throughout the greater Web.

Overall, our findings indicate that Twitter, Reddit, and 4chan are used quite extensively for the dissemination of both alternative and mainstream news. We also find relatively heavy use of time capsule services, such as archive.is, by users of Reddit and 4chan but not those of Twitter. By utilizing a statistical model for influence, Hawkes processes, we show that each of the platforms (and in the case of Reddit, sub-communities) have varying degrees of influence on each other, and this influence differs with respect to mainstream and alternative news sources.

Paper Organization. The rest of the paper is organized as follows. The next section reviews related work, then, in Section 3, we discuss the social networks and the information sources studied in this paper. Section 4 presents a general characterization of each platform, while Section 5 discusses our temporal findings. Section 6 reports our measurements of the influence between the platforms, while the paper concludes in Section 7.

2. RELATED WORK

In this section, we review prior work on disinformation propagation dynamics in social networks as well as on detecting false information sources.

Disinformation dynamics. Kwon et al. [14] study the characteristics of rumor propagation on Twitter. They analyze a corpus of 1.7B tweets, covering three and a half years of Twitter data, and extract 104 viral events, which are then annotated by human coders. They study debunked stories from false information busting sites such as snopes.com and compare temporal, linguistics, and structural characteristics to legitimate information. Friggeri et al. [10] also use stories that Snopes determined as false to study the propagation and the evolution of false information on Facebook, finding it to be quite bursty, and that posts containing a comment with a link to snopes.com were more likely to be deleted. Kumar et al. [13] study the presence of hoaxes in Wikipedia articles. They report that while most are detected quickly and have little impact, some end up cited widely on the Web.

Shao et al. [21] introduce Hoaxy, a platform providing information about the dynamics of false information propagation on Twitter, finding that the diffusion of fact-checking content lags that of false information by 10-20 hours and that the top 1% users with the most tweets share a much higher ratio of false information. Finn et al. [9] present TwitterTrails, a website allowing users to study propagation of false information on Twitter, to visualize indications of bursty activity, temporal characteristics of propagation, as well as re-tweets networks [...] community skepticism.

Del Vicario et al. [8] analyze how Facebook users perceive and react to conspiracy theories vs. scientific stories, finding two polarized and homogeneous communities that have similar content consumption patterns but exhibit different cascade dynamics. Situngkir [22] empirically studies an Indonesian hoax on Twitter, finding that it spread broadly and quickly (within two hours), and that it would have spread more if a conventional media outlet did not publicly deny it. He also argues that hoaxes can propagate easily if there is collaboration between the recipients of the hoax. Arif et al. [3] also present a case study based on a hostage crisis in Sydney, analyzing 5.4M tweets from three main perspectives: (i) volume (i.e., number of rumor-related messages per time interval), (ii) exposure (i.e., number of individuals exposed to the rumor), and (iii) content production (i.e., whether the content is written by the user or is re-shared). Andrews et al. [2] study two crisis-related incidents on Twitter aiming to determine the effect of “official” accounts with respect to the containment of rumors. Authors show that official account can significantly contribute to stopping the propagation of the rumor by actively engaging in conversations related to the incidents. Mendoza et al. [17] study the dissemination of false rumors vs. confirmed news on Twitter the days following the 2010 earthquake in Chile, concluding that an aggregate analysis on the flow of tweets can effectively distinguish the former from the latter.

Detecting false information sources. Shah et al. [20] formulate the problem of finding the source of false information as a maximum likelihood estimation problem, using a metric called rumor centrality. They evaluate it for all nodes in the network using a simple linear time message-passing algorithm, and the node with the highest rumor centrality is deemed to be the most likely source. Authors show, experimentally, that the model can distinguish the source of false information with a maximum error of 7 hops for general networks and 4 for tree networks. Wang et al. [26] study the problem from a statistical point of view, proposing a source detection framework, also based on rumor centrality, which supports multiple snapshots of the network during the false information spread. They show that using two network snapshots instead of one can significantly improve detection.

Budak et al. [5] study the notion of competing campaigns in a social network and address the problem of influence limitation to counteract the effect of misinformation. Nguyen et al. [18] look for the k users that are most suspected to have originated false information, using a reverse diffusion process along with a ranking process. Seo et al. [19] aim to identify the source of rumors in online social networks by injecting monitoring nodes across the social graph. They propose an algorithm that observes the information received by the monitoring nodes in order to identify the source. They indicate that with sufficient number of monitoring sources they can recognize the source with high accuracy.

4chan. 2017. all threads are perma.com and rt. 3. with or with.228. i. we provide some background information nently deleted after 7 days. Another key characteristic of 4chan is the post that contains links to the URLs from the aforemen- ephemerality: there is only a finite number of threads that tioned news domains.4 4chan. as long as the discussion falls within the general topic of the board. choosing the topic as well as the non-English sites. we select 45 among the Alexa top 100 news sites. This has led to a plethora of communi. There is also a threaded confidently be labeled as either “mainstream” or “alterna- comments section for users to discuss a post. or /pol/. % Main. which makes While we use 4chan’s sports (/sp/). i. URLs to our list of alternative and mainstream sites. More specifically. users of can be active at a given time on a board. we are primarily in- theme.022% 0. analyzing the dynamics and information flow Table 2: Overview of our datasets with the number of posts/comments that of Reddit. Rather. In other Reddit (6 selected subreddits) 620. and even meta-communities focusing on interactions have recently attracted public attention due to their posting people have in other subreddits. ranging from video games to news and politics. DATASET In this section. Some of its features include the dia platforms. rebroadcasting a tweet. and June 30. and has news aggregator. 2016 and February 28. cover activity on the three platforms we measure between quired to provide a username to access or post to 4chan.wikipedia. Remarks. Table 1 shows the to- the default “Anonymous” is the preferred and overwhelm. FakeNewsWatch. communities on leaving out those based on user-generated content.g.963 40. Twitter 587 Million 0. URLs Main.537 8. the selection extremely lax moderation: although boards are divided into of news sources. on Twitter. we study major OSN and various news sites that ac. 
Although users can sites including 45 mainstream and 54 alternative ones.google. where users post URLs to content along been often linked to the alt-right [4] as well as exhibiting a with a title. high degree of racist and hate speech content [12]. and several news sites. /pol/ fo- Reddit. When a new thread 2 https://en. and other users can upvote or downvote the post..050% 0. Users are not re. and comments several boards (69 as of May 2017) for different topics of in. as well as retweeting.840 words. /sci/. Other users can add posts to the thread. and quote or reply to posts. tweets pertaining to shooting events and conspiracy theories.700 42. in one of We gather information from posts.513 specifically. 4chan 42 Million 0. of controversial. it inher- network where users can easily broadcast 140-character ently lacks many of the “social” features of other social me- ‘tweets’ to their followers.023% 0. 4chan is known for its on the three social media platforms we study.046 301. When compared to Twitter. /sp/) 7. For the latter.480 a comprehensive (i.org/wiki/List of fake news websites is created. we create a list of 99 news are also subject to the voting system. Although several boards have a 4 Thecomplete list of the 99 sites is available at https://drive. as well as the number of unique URLs linking to alternative and mainstream news sites in our list. contain a URL to one of our information sources. and 4chan that contain URLs from the 99 terest. With a few gaps (described below). finance news). as well as create their own subreddits. the order in News sites.com/ temporary archive for purged posts. The so-called “front page of the Internet” is a social cuses on the discussion of politics and world events.2 Datasets a single image attached.181% using graph analysis on the domains linked from the tweets.com/ the “bump” system [12]. multi-service) point of view. directed social 4chan’s primary mode of operation is “anonymous”. pornog. a keyword preceded by #). 
open?id=0ByP5a khV0dM1ZSY3YxQWF2N2c. safe and not safe for work categories. the community structure is not the former.e.530 40. we use Wikipedia2 and moderation policy. 4chan (/int/. and comments tive” news. URLs sights on disinformation dynamics on social networks from Twitter 486. news sites. it easier for users to find and weigh in on tweets around a and science (/sci/) boards as a baseline. volunteer “janitors” and paid employees generally are not concerned with the 3.com.550 236. this paper provides in.948 4chan (/pol/) 90. For mark each other as friends. 4chan is a type of discussion forum known as an im- ageboard: users create a new thread by making a post with 3. Finally. Our analysis uses a set of news sites that can which they are displayed on the site.. native news domains: sputniknews.3 We also add two state-sponsored alter- ties.105 24. threads. as they raphy. Platform Posts/Comments Alt. Users can serving specialized content (e. Twitter.. those Reddit are formed via the “subreddit” concept. defined by the friendship relation.131 615 5. Reddit (all other subreddits) 1. terested in the Politically Incorrect board.e. Starbird [23] performs a qualitative analysis on Platform Total Posts % Alt. hashtag (basically. Votes determine the ranking of the posts. and seemingly agenda-pushing stories [7]. Twitter is a micro-blogging.197% and provides insight on how various websites work to pro- Table 1: Total number of posts crawled and percentage of posts that contain mote conspiracy theories and push political agendas. our datasets out an image.1 Platforms and News Sources language used or the tone of the discussions. 3 .027 726.070% Reddit (posts + comments) 332 Million 0. and perhaps some text. Since Twitter. Reddit. an old one is purged based on their ranking within 3 http://fakenewswatch. Finally.164 tively contribute to information diffusion across the Web.e. and there is no concept of friends/followers.. In contrast to prior work. 
tal number of posts/comments crawled and the percentage of ingly used identity. and details on the collected data. international (/int/).

We also shed light on time capsule services stances. Many of the March and May 2017. we single out six 4chan.21 % worldnews 6.87 % AskReddit 0. 4chan and Reddit are sharing frequently links to mainstream news whereas 4chan users are more likely to post links to 4.23 % wsj.) (%) Subreddit (Main.31 % redflagnews. which we present in more detail below. worldnews. we gather 487k tweets containing 279k and politics.6 % ReddLineNews 0. mainstream and alternative URLs occur.com 2.45 % news etc 1. 2016.com 2.63 % usatoday. domains.36 % washinghtontimes.com 1.31 % rss theonion 0.com. and Feb 24–28.54 % Table 3: Top 20 subreddits w.84 % conspiracy 0. specifically between porters. We have some small gaps due to our crawler fail. 2017. we find 76k URLs (40k unique) from alter- mainstream URLs. Although the latter is 390M comments. politics. Table 4 reports the top 20 mainstream/alternative ing.tv 0.15 % PoliticsAll 1.94 % NoFilterNews 1.26 % the Europe 0.07 % nypost. along with their percentage.54 % foxnews. We start by identifying communities around news ing API.5 % POLITIC 0.com 1.com 2. we posts and comments that contain URLs from one of the 99 can anecdotally confirm that commenters often try to push news sites. we use all threads and posts made on the subreddits for further exploration: The Donald.twitter. GENERAL CHARACTERIZATION alternative news sites. Reddit.67 % Uncensored 2. The resulting dataset in.85 % The Donald 4. /int/ AskReddit. as well as /sp/ (sports).51 % nottheonion 0.76 % worldnews 1.com 8. subreddit.5 In total.. while for alternative do- 6 https://www.45 % bbc.9 % breitbart. and 300k subreddits.3 % bloomberg.com 1. top 20 domains for mainstream news account for 89% of 5 https://dev. three platforms. as further discussed in Section 4.com 0.com 2.94 % activistpost. conspiracy.34 % huffingtonpost. Known alt-right news outlets. 2016 vant events. and how they contribute to the dissemination of news.66 % TheColorIsBlue 3. 
we filter intended for open-ended questions that spark discussion.com 1. Due to a failure in our collection infrastructure.67 % Health 2.89 % Conservative 1. 2016 and February native news and 600k (301k unique) from mainstream news 28. Politically Incorrect (/pol/) board.com 5. are predominantly 4 .com 2. we present a general characterization of the datasets. we study the occurrence of each news out- cludes 97k posts and replies. publicly available reddit comment/ such as breitbart. between of June 30. based and approximately 1. In the end. In Table 3.com 4. this data.com 2.go.20 % prntly..27 % naturalnews.23 % sputniknews.14 % forbes.com 1. We collect the 1% of all publicly available tweets with URLs from the aforementioned news domains between 4.28 % thehill. answer questions submitted by users.37 % veteranstoday.06 % beforeitsnews.81 % todayilearned 0. which yields a dataset of 1.89 % therealstrategy.com 2.com 14.05 % cncb. in the 6 selected subreddits. The Jan 10–13. we re-crawl each tweet to retrieve subreddits are indeed related to news and politics – e.com 0.21 % canada 1.67 % thenewsrightnow 0. 2017.g.tv 0.67 % europe 0.com and infowars.reddit.com/r/datasets/comments/3bxlg7/i have every mains the percentage is 99%.07 % worldnewsdailyreport.com 55. We obtain all posts and comments on Reddit be. on their propensity to include news URLs. Therefore. while ‘worldnews’ is focused around globally rele- Oct 28–Nov 2 and Nov 5–16.6 We collect approximately 42M posts. as well as Nov 22.52 % AskTrumpSupporters 0.com 19.57 % TheOnion 0.com 0. 2016 as well as news sites and their percentage in the six subreddits.04 % EnoughTrumpSpam 1. which has been involved in disinformation cam- Reddit. us. and news.84 % news 4. Subreddit (Alt.16 % dccclothesline.com 0.g.99 % theguardian.2 % cbsnews. 2017.1 Platform Analysis June 2016 and February 2017 using the Twitter Stream.37 % politics 12. Oct 15–16 and Dec 16–25.t.com 2. 
Table 4: Top 20 mainstream and alternative domains and their percentage rence and their percentage in Reddit (all subreddits).com 0.r.org 0.05 % new right 0.com 0. using data made avail.) (%) The Donald 35.24 % rt.95 % reuters.com 6. Twitter.87 % KotakuInAction 1. (international).94 % BreakingNews24hr 1.2 % thedailybeast. and /sci/ (science) boards for comparison.41 % AskReddit 1.73 % nodisinfo. Table 2 provides a summary of our In this section.78 % clickhole. we do not get information such as the number omit automated ones (e. we ‘The Donald’ is mostly a community of Donald Trump sup- have some gaps in the Twitter dataset.com 8. paigns including Pizzagate as well as ‘AskReddit. both mainstream and alternative news sources are used to able on Reddit itself. between articles are posted without user intervention.59 % hillaryclinton 0. we report the top 20 subreddits with unique URLs. Note that mainstream and alternative news URLs found on each of the we break Reddit and 4chan datasets into two different in.com/streaming/overview all mainstream URLs in our data. including 56k alternative and let.com 0.com 2.73 % thelastlineofdefence.53 % infowars. We also find the presence of the ‘conspiracy’ – Jan 13.23 % news 3.58 % nytimes.4 % time.18 % cnn.83 % disclose.com 2. Since tweets are retrieved at the time they the most URLs.’ where tween June 2016 and February 2017.1M URLs.75 % worldtruth.com 5. 2017. In order to get a better view of the popularity of news ing the same methodology as [12].86 % conspiracy 3.86 % HillaryForPrison 0. /r/AutoNewspaper/) where news of times they are re-tweeted or liked.54 % willis7737 news 2. For 4chan. Specifically.com 0. Once again.94 % WhiteRights 1.) (%) Domain (Alt.10 % TheColorIsRed 2.8M posts/comments their agenda even on non-political threads.com 11. sites on Reddit.85 % AnythingGoesNews 0.11 % abcnews. Note that we are posted.com 2.com 4.48 % lifezette.49 % nbcnews.07 % politics 8.com 1. specifically.com 3.) 
(%) Domain (Main.77 % libertywritersnews.
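The filtering step described in Section 3.2 (keeping only posts that link to one of the 99 news sites, and reporting the percentage of posts per category) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the two domain sets are small hypothetical excerpts of the full list, and `classify_url` and `share_of_posts` are names introduced here for the example.

```python
import re
from urllib.parse import urlparse

# Hypothetical excerpts of the two lists (the paper uses 45 + 54 sites).
MAINSTREAM = {"nytimes.com", "bbc.com", "theguardian.com"}
ALTERNATIVE = {"breitbart.com", "infowars.com", "rt.com", "sputniknews.com"}

URL_RE = re.compile(r"https?://\S+")


def classify_url(url):
    """Return 'mainstream', 'alternative', or None for a single URL."""
    try:
        host = (urlparse(url).hostname or "").lower()
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        return None
    parts = host.split(".")
    # Check the host and each parent domain, so 'www.rt.com' matches
    # 'rt.com' while 'smart.com' does not.
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in MAINSTREAM:
            return "mainstream"
        if candidate in ALTERNATIVE:
            return "alternative"
    return None


def share_of_posts(posts):
    """Percentage of posts containing at least one URL of each category."""
    hits = {"mainstream": 0, "alternative": 0}
    for text in posts:
        urls = (u.rstrip(".,;:!?)\"'") for u in URL_RE.findall(text))
        labels = {classify_url(u) for u in urls}
        for label in hits:
            if label in labels:
                hits[label] += 1
    total = len(posts)
    return {k: (100.0 * v / total if total else 0.0) for k, v in hits.items()}
```

Matching on the host and its parent domains, rather than on a substring of the URL, avoids counting unrelated sites whose names merely end with a listed domain.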

34 % the-daily.36 % bbc.com 1.com 2.tv 0.co s. and a small percentage of them are no 0.04 % theguardian. all mainstream and alterna.63 % forbes. and sputniknews.0 react365. rt. stream news. .2 for tweets with URLs from alternative news.05 % forbes.15 % cbsnews.go.12 % cnn.16 % nypost. we find 21k (9k unique) URLs also obtained similar statistics for the domain popularity in to alternative news outlets and 82k (40k unique) to main. infowars.com.13 % prntly.co .com.com.com 2.co .39 % thedailybeast.44 % clickhole.04 % infowars. and (b) mainstream news. For the mainstream spectively.66 % dccclothesline.co when a particular false story is debunked [10].95 % redflagnews.25 % nypost.05 % cnbc.com 53.2%) 341 ± 1.com.91 % reuters.2 sputniknews.com 1.co .) (%) Domain (Main.com 0.com.com 0.146 0.75 % beforeitsnews.com 0. one.com 19. at about the same rate (on average.co . as plotted in Fig.56 % nytimes.com 4.16 % cbc.228 0.95 % Table 7: Top 20 mainstream and alternative news sites in the 4chan (/pol/) libertywritersnews.26 % cbc. /pol/).04 % therealstrategy.com is the most present popular alt-right as well as state-sponsored news outlets.com 0.04 % time. A similar pattern is ob.8 in Table 5. as well as state-sponsored alternative domains like 0.co . beforeitsnews.e. These cover 4.04 % worldnewsdailyreport.6 rt.4 present.c y.com 1.co . subreddits.c s.com 5. Tweets Retrieved (%) Avg.07 % foxnews.77 % sputniknews. Next.co s.com 0.com 1.02 % firebrandleft.com 2.co .co uth s.com.) (%) naturalnews. respectively). in our Twitter dataset.950 (87.ca 4.tv m . wo (a) Twitter. (b) served for likes.89 % worldnewsdailyreport.02 % washingtonexaminer. In our 4chan dataset.co .104 (83. re.96 ± 55. t.99 % activistpost.10 % nbcnews. 404 and 341 wa retweets per tweet. respectively.41 % thehill.com 1.c s.co e. Recall that we re-crawl tweets to get the number of retweets and likes.com 2. we observe that theguardian.33 % 0.com 9. the 6 fluential alternative news domains are breitbart. 
mediamass.com 2.c y. We find that the 5 .co .com 1. The fact that many such URLs appear in our bre infutni alstoreit dflaaturacloth lif ctivei rans cli ily w ter ed ofd sphere bef re n dc a et sda wri m ine dataset may indeed be an indication that Reddit significantly t v n ew berty lastl rl d li the contributes to the dissemination of controversial stories. 0.com 2.32 % activistpost.com 3.22 % nytimes.38 % wsj.7%) 404 ± 2.071 329. Likes Domain (Alt.com 1.06 % abcnews.24 % dccclothesline.com 4.53 % reuters.4 account was suspended.o i t bar or wakr newrategsnewgnewlneweslinezettstpostodackholprntldiscrl epoorrldtsr newiamaefen paganda [7]. and bbc.com 0.20 % bloomberg.com 1.) (%) breitbart.go. gories.c y.629 92.com 10.co . we report the top 20 mainstream and al.com 2. in Table 6.com.com and rt. we find 129k (42k unique) 4chan Reddit Twitter URLs of alternative news domains and 413k (236k unique) 1.com 0.) (%) Domain (Main.65 % naturalnews.com 0.07 % prntly.com 2.com 1.35 % URLs in the tweets in our dataset.com 3.tv 0.co .2 Popular Domains 87% and 99% of.com 1.c e.com 0.com 0.41 % usatoday.co .07 % Mainstream 376.co .29 % washinghtontimes.com 0. Also. Basic statistics are summarized 0.tv m et rg the spotlight for disseminating false information and pro.29 % bloomberg. cnn.com 2.ca 2. by far.com 0.com 5.com 0.06 % usatoday.40 % breitbart.79 % libertywritersnews. the most in.buzz 0. we note the presence of many news.68 % worldtruth.com 2.com 4.02 % washingtontimes.25 % disclose. Table 7 reports the percentage of URLs of the top 20 domains for each type of news.82 % redflagnews.c ose t.com 28.co .71 % 4chan Reddit Twitter newsbiscuit.nse.com 0.6 Fraction and their respective percentage.45 % Table 5: Basic statistics of the occurrence of alternative and mainstream veteranstoday.com 9.tv 0.com 0.co .com 1.com 46.0 m m m m m m m m m m ca m m m m m m m m m . 
We 4chan.0 URLs of mainstream ones.com.40 % 1.com 5.6 Fraction longer available as they were either deleted or the associated 0. followed by nytimes.co e.78 % dataset and their respective percentage.com 10.com 17. In our Twitter dataset. in both cate- tive news URLs.com 3.04 % foxnews. Again.11 % cbsnews. an es nn bc rs ill ws es sj rg cb gooday ewstime ewsimeseastpostcnbc ra diytim c breutetheohxne forb wmbe t cn sn t yb ny native and mainstream news tend to get a significant number gu n f blo o usa nb cbngtoendail the shi th of retweets.c t. These cover 86% and 99% of all URLs.com 6.6 infowars.48 % clickhole.co s.net 0. appear on the three platforms (i. we compare how popular domains.0 m om om om m m m m om om om m om om .00 % theguardian.co c.com 0.61 % abcnews.com 8.com 3.11 % thehill.03 % cnbc.96 % nodisinfo.com 0.. Retweets Avg.com 3. Twitter. This percentage is slightly higher 0.com 2. the other boards of 4chan but we omit them for brevity.8 Table 6: Top 20 mainstream and alternative news sites in the Twitter dataset 0.com 14.46 % huffingtonpost. 1.com 5.com 2.com 0. alter.com 0.82 ± 15.10 % lifezette. ternative news domains. Figure 1: Top 20 domains and each platform’s fraction for (a) alternative Then.29 % rt.com 2.com 4. which have recently been in 0.com 1.25 % bbc.10 % Alternative 110.02 % now8news.co t.42 % Domain (Alt.37 % wsj. along with their percentage.com 17.26 % nbcnews.90 % sputniknews.co .85 % disclose. possibly due to the fact that some users tend to remove controversial content 0.82 % therealstrategy.com 0.86 % time. Similar to Reddit. we observe that.co .com 3.com 0.co ss.

since on 4chan Reddit.com m nytimhan. per user. the top 20 domains archived using archive.0 1.com – influence the three platforms more have a fraction below 0. iy.co buz eaks. infowars.com onp t.3 Time Capsules lifezette. Besides preserving 4chan.com te. Moreover.com twit . as plotted in Fig. . 2.co dit.c saa pas ost. it follows that it would not be popular on Reddit and 4chan and Reddit.com onp om poli com wsj. Reddit and 4chan respectively. we find that that Twitter ter. However.c t om m zfee m dod m poli c. of posts).or .001% of tweets contain such URLs) when compared which shows the ratio for users sharing URLs from both cat.com and veteranstoday.co t.com den rg wsj.c dail ddit.co face ter.4 0.c es.is.co d.078% egories.o .com is popular only on Twitter. and bots are acceptable on Reddit (as long as they content.c ingt dit.co il. sputniknews. (b) Reddit. oo.com huff nytim r. 10 1 10 1 10 1 Portion of total Portion of total Portion of total 10 2 10 2 10 2 cnn om g new m red . therealstrategy.co n. Next.2.0 0. 27.com.org bb org hin itte om huff gtonpo r.is URLs found on our 4chan. rt. pear predominantly in some platforms but not in others – e. but not on Twitter.o face 8ch.co il.4 0. . fraction only for Reddit and Twitter users. especially on Red.7k unique) on 4chan. some outlets ap.c .c es. We find that the fraction of al- tion close to 0) and those who share them almost all the time ternative vs.jp see o.com yma om . We report this We retrieve all the archive. We also observe from Fig. top 4 alternative domains – breitbart.is is not so popular among the twitter community URLs to alternative news.8 1. and Twitter datasets: we find 5.jp the ameb net rdia lo.6 0.c dail dian.c ku.com onp om tog st.co buz art. while. bitually snapshot through the service. which might be also attributed to or less in the same way.1k (26k unique) on Reddit. 6% and 9% in Twit- (fraction close to 1).0 0.is URLs on (a) 4chan.com salo . and (c) Twitter. 
3.com mon r.3k unique) posts are anonymous.8 0. ternative. In Fig. jiro.com are popular on Reddit and 4chan.2 0.com.g.3k (10. (just 0. mainstream news is 5%. ost. we crawl these URLs to extract the time dit. the most popular time capsule service used on of bots. and (b) users that shared URLs from both mainstream and alternative news.co bre am.co n.org iona 4chan m ipe com nyd zfeed rg the thehill.2 0. These numbers indicate that 13% of Twitter users – that likely are bots [25] – only post archive. Thus. m lrev .6 0.uk yah .0 Alternative News Fraction Alternative News Fraction (a) (b) Figure 2: CDF of the fraction of URLs from alternative news and overall news URLs for (a) all users in our Twitter and Reddit datasets. if the reason that a particular do. and we cannot say with any certainty that bots do not exist on without redirecting to the original site.0 0.is.2 0. that there is a wide distribution. uk com boo m just . aiming to analyze what content users ha- 4chan.com net nytimc2. zero mail. We focus main is popular on Twitter is primarily due to the influence on archive. 92.jp m san ne. for each platform. behavior of such URLs.4 0.6 0.co gua n.com com rdia m m yma om .c s. Reddit Twitter Reddit Twitter 1. We find that 80% of the users of both URLs on Twitter. and the original page URL.2k (2. they are certainly more URL as well as to prevent additional traffic (and possibly prevalent on Twitter. om com om hed .8 1.0 0. 2(b).2 0. time capsules can be used to obfuscate the original stay within the terms of service). while 4.c c ingt es.it yah or.6 CDF CDF 0.028% of posts/comments) and 4chan (0. and platforms share only URLs from mainstream news.n pen ia. t tico iew ge oo tico k ke pos te s ette n o f c 4c itb twit agr il re w r ton gua wik aily gua y t ingt hing the nat was was was (a) (b) (c) Figure 3: Top 20 domains found after resolving archive.0 0. users share more alternative news as just 5% of these users com. 6 . 
ad revenue) from reaching the original web-page. we report.c dail post. so that users can ac- reasonably well known phenomenon of Twitter bots..jp huff red i.4 0. to Reddit (0. the presence of bots.0 0. Time capsule services are used to generate a short URL We believe the primary reason for this has to do with the pointing to a snapshot of a web-page. between people that rarely share alternative news (frac.com kota om hing vice om ton . While cess the content of that page even if it is later deleted.8 0.com inde wikiped . and what is the sharing We also measure the fraction of news URLs that are al.uk inst book.uk e wik dia.co co o n.
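The per-user measurement plotted in Figure 2 can be illustrated with a short sketch. This is not the authors' code: the function name `alternative_fraction_per_user`, the `(user, domain)` input format, and the three-entry `ALTERNATIVE` set are assumptions made for this example. For every user it computes the share of their news URLs that point to alternative domains, which is the quantity whose CDF the figure shows.

```python
from collections import defaultdict

# Hypothetical, tiny stand-in for the paper's list of alternative news domains.
ALTERNATIVE = {"breitbart.com", "infowars.com", "rt.com"}

def alternative_fraction_per_user(posts):
    """Map each user to (# alternative news URLs) / (# news URLs shared).

    posts: iterable of (user, news_domain) pairs, one pair per shared news URL.
    """
    alt = defaultdict(int)
    total = defaultdict(int)
    for user, domain in posts:
        total[user] += 1
        if domain in ALTERNATIVE:
            alt[user] += 1
    return {user: alt[user] / total[user] for user in total}

# Toy data: u1 shares only mainstream news, u3 only alternative news.
posts = [
    ("u1", "nytimes.com"), ("u1", "cnn.com"),
    ("u2", "breitbart.com"), ("u2", "bbc.com"),
    ("u3", "infowars.com"),
]
fractions = alternative_fraction_per_user(posts)
```

A fraction of 0 corresponds to a user who shares only mainstream news and a fraction of 1 to a user who shares only alternative news.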

or a tweet URLs occurrence to alternative news compared to the other that has been deleted by the user. which probably signifies the day-to- 5. high- web pages correspond to different slack times. likely to be related to the occurrence of the archive. For Reddit. the 2016 US elections. This indicates that tion infrastructure.2 0. Reddit 4chan Twitter Reddit 4chan Twitter 1. We also study the fraction nificantly faster than 4chan. With the former. comparing URLs pointing to mainstream and alter. 6(c) appears to be an artifact of a failure in our collec- siderably slower than the other domains. first occurrence of a URL and its next occurrences on the same platform. the same platform.com twitter. users are more interested to archive the URL for persistence As some users repost the same URL many times within rather than sharing the content within 4chan. Fig. with the slowest domain being havior and extract insight while comparing platforms.0 0. A similar be. 6(b)) across platforms. debate and the election day itself. sharing behavior is more similar (Fig.0 1.is URLs pointing to 4chan are con. 7. we plot the CDF of the time difference between the ferences between the top domains. In Fig.8 0.6 CDF 0.jp nytimes.2 0.com yahoo.0 1. dissemination of alternative news. In yahoo. point at the 24h period.com huffingtonpost.com twitter. and 4chan. on the date of the first presidential form.1 URLs Occurrence day behavior of news propagation within a platform. 4chan. 6. whereas. We also observe a strong communities (Fig.com facebook. (b) Reddit and (c) Twitter.8 0. 5 overall more “popular” than the former. TEMPORAL DYNAMICS recycled over time within the platform (even after several months). whereas. we do not find any noticeable dif. The Twitter spike in – and find that archive. These findings indicate native news domains. 6(a)).jp.4 0. as the time capsules may be used for persistence.0 0.com reddit. there is an inflection Reddit.4 0.6 0.0 0. two platforms. 
6(c)).7 We find that /pol/ and domains. to access a 4chan thread after it is removed.4 0.co. Fig.com washingtonpost.0 1. the presence of both alternative and mainstream news domains. we decide to study such reposting be- havior is observed on Twitter. 7 .is URL within a specific plat. we present the results of a cross-platform tween the first occurrence and the next ones than the other temporal analysis of the way news are posted on Twitter. we measure the daily occurrence of URLs over the three platforms normalized by the average daily number 7 Gaps in the plot correspond to gaps in our dataset due to crawler failure.0 0.com nytimes.6 0.2 0. Fig. for mainstream news.8 0.jp twitter. we also plot lighting that the latter (dominated by mainstream news) are the top 5 domains for each platform separately – see Fig. while Twitter exhibits smaller time differences be- In this section.0 104 105 106 107 108 102 103 104 105 106 107 108 104 105 106 107 108 Difference between time archived and time found (seconds) Difference between time archived and time found (seconds) Difference between time archived and time found (seconds) (a) (b) Figure 5: Time difference of top 5 archived domains in (a) 4chan.6 0. the 6 selected subreddits exhibit a much higher percentage of e. Note that the site itself is among the most popular original of URLs shared in each community.8 0.org washingtonpost.com justpaste.2 0. Finally.it 1. and this is true for both alternative and mainstream news.g. with the latter.0 0.com or.4 0.4 0. To verify if different original between alternative and overall news URLs (Fig.0 103 104 105 106 107 108 103 104 105 106 107 108 Difference between time archived and time found (seconds) Difference between time archived and time found (seconds) (a) (b) Figure 4: CDF of the archival time and the first occurrence of archived URLs pointing to (a) alternative and (b) mainstream domains. 
4 plots the slack time between the archival time and There are also some interesting spikes. Reddit is sig.2 0. Both alternative and mainstream URLs are 5.6 CDF CDF CDF 0. Reddit and 4chan that the specific sub-communities are heavily utilized for the exhibit similar times..8 CDF 0.co.com cnn. In all three platforms.
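The reposting analysis of Figure 7 (time between the first occurrence of a URL and its later occurrences on the same platform) can be sketched as follows. The helper names `repost_delays` and `empirical_cdf`, the `(url, timestamp)` input format, and the toy timestamps are assumptions for illustration only.

```python
from collections import defaultdict

def repost_delays(posts):
    """Delays (in seconds) of later occurrences relative to the first one.

    posts: iterable of (url, unix_timestamp) pairs from a single platform.
    URLs that occur only once are omitted from the result.
    """
    times = defaultdict(list)
    for url, ts in posts:
        times[url].append(ts)
    delays = {}
    for url, stamps in times.items():
        stamps.sort()
        if len(stamps) > 1:
            delays[url] = [ts - stamps[0] for ts in stamps[1:]]
    return delays

def empirical_cdf(values):
    """Return (value, cumulative fraction) pairs in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]

# Toy data: URL "a" is reposted after one minute and again after one day.
posts = [("a", 100), ("a", 160), ("b", 50), ("a", 86500), ("c", 10), ("c", 3610)]
delays = repost_delays(posts)
cdf = empirical_cdf([d for per_url in delays.values() for d in per_url])
```

Plotting `cdf` on a logarithmic x-axis would make a step around 86,400 seconds visible, which is the kind of 24h inflection point the text refers to.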

9 plots URLs and 89% of mainstream URLs.695 which portion of URLs appear faster in one platform than the other. if a URL first appears on Reddit and subse- follows the faster pace of Twitter. for URLs with quently on 4chan. it quences. since the switching point is at ∼1 than alternative news. However.938 4.6 0.e. and T→R→4 are the top of platforms we consider. and on Twitter before reports the numbers of involved URLs for each comparison. the sequence is Reddit→ 4chan (R→4). and (c) fraction of alternative news over overall news. We make the following observations. For each URL.015 0. ble 10 reports the distribution of these sequences. and the time they appear for the first time.025 0.00 0. while Table 8 later appear on either Twitter or 4chan. Reddit “outperforms” (i. and 4chan vs Twitter Mainstream 2. Fig. especially in Reddit. for Twitter-4chan compari- 4chan the difference is not evident..301 paring URLs first posted on platform A and then on B. or three platforms. 4chan (/pol/) Reddit (other subreddits) 4chan (/pol/) Reddit (other subreddits) 4chan (/pol/) Reddit (other subreddits) 4chan (other boards) Twitter 4chan (other boards) Twitter 4chan (other boards) Twitter Reddit (6 selected subreddits) Reddit (6 selected subreddits) Reddit (6 selected subreddits) 1. Third. we analyze their related sharing behavior for both mainstream and alterna. son. when the 4chan vs Reddit Mainstream 5. between the distributions (p < 0. Also. for Twitter and hour (resp.382 14.15 0. Comparison Type of News #URLs where #URLs where platform. 8 shows the CDF of the mean inter-arrival time of URLs point being at 1 day (2 days). of URLs only appear on one platform: 82% of alternative form and study the time at which they are shared.020 0. Such a point signifies Alternative 1.10 0. dit than 4chan for 65% (40%) of the time.g. mainstream) news appear faster on Twitter than Red- platforms dit for 80% (resp...e. they pear on all three platforms.005 0. 
vs.662 lines for the same type of URLs cross). We observe that the majority We now look at URLs that appear on more than one plat. We also study the temporal dynamics of URLs that ap- ing pairs of distributions for a given category of URLs. with these URLs be- mainstream news seem to propagate faster in these platforms ing slower in propagation.” e..2 0. and matches the 24h period Reddit vs Twitter Mainstream 18. when compar. which seems to be consistent across all pairs of platform 1 is faster platform 2 is faster platforms and types of news. Second.232 4.0 Occurrence of mainstream news 0. Interestingly.700 Alternative 778 2. alternative (mainstream) news appear faster on Twitter We also study the inter-arrival time of reposted URLs. given the set of unique URLs across all platforms parison). First. The most alternative news appear on different platforms faster than common sequences are similar for both alternative and main- mainstream news. longer inter-arrival times. Ta- are statistically different. parison. with p-value < 10−4 . whereas.762 11.8 0. Reddit appears to have a duality in re. occurrence on each platform and build corresponding “se- posting behavior: for URLs with small inter-arrival time. as evidenced by the fact that it with respect to the delay between URL appearance on each is at the head of the sequence for 51% and 59% of alternative 8 . we notice the presence of a “turning point” stream and alternative URLs. it follows 4chan. Finally.099 URLs which were posted first in B and then A (i. we find the first overall.455 3. for Reddit-4chan com- that appear more than one time in each platform. 5 hours). Similarly.05 0. Next. 
first in B and then both other platforms in terms of the speed of sharing main- in A).0 16 6 6 6 6 16 17 7 7 16 6 6 6 6 16 17 7 7 16 6 6 6 6 16 17 7 7 g1 p1 t1 v1 b1 r1 g1 p1 t1 v1 b1 r1 g1 p1 t1 v1 b1 r1 Jul c Jan Jul c Jan Jul c Jan Oc Oc Oc Ma Ma Ma De De De No No No Au Se Fe Au Se Fe Au Se Fe (a) (b) (c) Figure 6: Normalized daily occurrence of URLs for (a) alternative news. both alterna- the CDF of the time difference (in seconds) between the first tive and mainstream URLs tend to appear on Reddit first and occurrence of a URL on pairs of platforms. alternative (respec- Table 8: Statistics of URLs for the comparisons of time difference between tively. and Twitter has smaller mean inter-arrival time in which this happens. first in platform A and then B.01 for each pairwise com. R→4→T. 4chan.. As already mentioned.000 0. and the sequence of appearances 3 sequences. than 4chan for 70% (65%) of the time.2 Cross Platform Analysis platforms in the sequence. i. up to the first two 5. and the order tive URLs.e. Each plat..20 Alternative news fraction 0. This is consistent regardless of the pair stream URLs: R→T→4. with the switching Fig. two. alternative (mainstream) news appear faster on Red- form exhibits unique behavior.4 0.030 Occurrence of alternative news 0. Table 9 reports the distribution of the sequences of appear- ances considering only the first hop. 4chan and Reddit exhibit similar time. with triplets of sequences. with the switching ple KolmogorovSmirnov test showing significant differences point being at 18 hours (12 hours).416 observed earlier. 50%) of the time. there is a cross point when com- Alternative 5. For Twitter-Reddit comparison. Finally. confirmed by a two sam.010 0. appearance in one. (b) mainstream news.
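The construction of appearance sequences (for example R→4 when a URL is seen first on Reddit and then on 4chan) can be made concrete with a minimal sketch. The function names, the `(url, platform, timestamp)` tuples, and the tie-breaking by platform label are assumptions introduced here, not details given in the paper.

```python
from collections import Counter

def appearance_sequences(posts):
    """Order platforms by the time each URL first appeared on them.

    posts: iterable of (url, platform, unix_timestamp) with platform labels
    such as "R" (Reddit), "4" (4chan) and "T" (Twitter).
    """
    first_seen = {}
    for url, platform, ts in posts:
        seen = first_seen.setdefault(url, {})
        if platform not in seen or ts < seen[platform]:
            seen[platform] = ts
    return {
        url: [p for p, _ in sorted(seen.items(), key=lambda item: (item[1], item[0]))]
        for url, seen in first_seen.items()
    }

def first_hop(sequence):
    """Keep at most the first two platforms of a sequence."""
    return "→".join(sequence[:2])

posts = [
    ("x", "R", 10), ("x", "4", 20), ("x", "R", 5),
    ("y", "T", 1), ("y", "R", 2), ("y", "4", 3),
    ("z", "4", 7),
]
sequences = appearance_sequences(posts)
first_hop_counts = Counter(first_hop(seq) for seq in sequences.values())
```

Counting the full sequences instead of `first_hop(seq)` gives the triplet distribution for URLs that appear on all three platforms.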

443 (44. E). Twitter.836 (41%) Table 10: Distribution of URLs according to the sequence of first appear- T→4 585 (0. URLs appear first on Twitter more often than Reddit and 4chan. and mainstream URLs. and E the set of first more often on Twitter than Reddit. “4” stands for T→R 3.com.com URL appears first on Twitter dominates in terms of first URL appearance. and (d) all mainstream news URLs.9%) 4→R 1.685 (3. rt.769 (6.2%) 486 (7.6 CDF CDF CDF CDF 0. nytimes.8 0.8%) R→T 4. 4chan. URLs from other G = (V .6 0. Similar to the al- sequences that consider only the first-hop of the platforms.6 0. We also add weights on these edges based the number of such unique URLs.com and theguardian. 4chan is rarely the platform where a URL first spawns.602(46.35%) T only 32.4 0.525 (24. there is no domain where 4chan For example.4%) 1. considering only the first hop. however.3%) 2.com tend to appear first more often on We create two directed graphs.0 1.5%) 552 (8.345 (0.4 0.181 (3. for other popular alternative domains. “R” for Reddit.2%) 290 (4.236 (4.0 1.5%) R→T→4 841 (36.8%) R→4 2.2 0.26%) ance within a platform for URLs common to all platforms.654 (3.com to Twitter.0 0.12%) 4chan.0 0.2 0. dence of how the individual platforms influence the media one can see that breitbart. and sputniknews. we set to provide meaningful evi- stream domains.5%) 861 (0. Table 9: Distribution of URLs according to the sequence of first appearance more often than on Twitter and more frequently than they do within platforms for all URLs. we analyze the source of the URLs for each of the For the mainstream news domains.8 0.5%) 4.1%) T→4→R 192 (8. 
Also.0 102 103 104 105 106 107 10 2 100 102 104 106 10 2 100 102 104 106 10 3 10 1 101 103 105 107 Mean interarrival time (seconds) Mean interarrival time (seconds) Mean interarrival time (seconds) Mean interarrival time (seconds) (a) (b) (c) (d) Figure 8: CDF for mean inter-arrival time for the URLs that occur more than once for (a) common alternative news URLs.com tend to appear domains. if a breitbart.com. 10 shows the graphs built for alternative and main.com and cnn.2 0.8 0. using graph model and analysis techniques. on 4chan.0 0. and Fig.6 CDF CDF 0. (c) all alternative news URLs.7%) 4→R→T 128 (5. ternative domains graph. Finally.5%) 16.3%) T→R→4 673 (29%) 1.0 1.189 (35.4 0.292 (33.3%) 230.4 0.8%) 1. one for each type of news.307 (2.166 (18. respectively.118 (1.0%) 11. By 6.2 0.0 0. we note that URLs from three platforms. Reddit than Twitter and 4chan. However.8 0.640 (2. where V represents alternative or mainstream domains like bbc. (b) common mainstream news URLs. ences in how news media is shared on Reddit.com.5%) 10. We do so by using a mathemati- 9 . we add an edge from breitbart.2 0.606 (0.0 0. Sequence Alternative (%) Mainstream (%) Sequence Alternative (%) Mainstream (%) 4 only 3. Reddit 4chan Twitter Reddit 4chan Twitter Reddit 4chan Twitter Reddit 4chan Twitter 1.0 1.8 0.9%) 4→T→R 145 (6. as well as the three platforms.4%) 18.0 10 2 10 1 100 101 102 103 104 10 2 10 1 100 101 102 103 104 Time from initial post of link and consecutive appearances later on (hours) Time from initial post of link and consecutive appearances later on (hours) (a) (b) Figure 7: CDF of time difference (in hours) between the first occurrence of a URL and its next occurrences on each platform for (a) alternative and (b) mainstream news.0 0. In this section. Reddit 4chan Twitter Reddit 4chan Twitter 1. INFLUENCE ESTIMATION examining the paths. 
Comparing the outgoing edges’ thickness.5%) 204.8 0.6 0.com URLs appear first in Reddit shared on other platforms.3%) R only 24.4 0.17%) R→4→T 335 (14. we can discern which domains URLs Thus far.7%) 4→T 315 (0. and from Twitter to Reddit.4 0.2 0. our measurements have shown relative differ- tend to appear first on each of the platforms.964 (5.6 0. and later on Reddit. and “T” for Twitter. such as infowars.
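The directed graphs described above can be sketched in a few lines. The sketch follows the example in the text (a breitbart.com URL seen first on Twitter and later on Reddit adds the edges breitbart.com→Twitter and Twitter→Reddit), but the data layout, the function name `build_first_hop_graph`, and the choice to aggregate platform-to-platform edges over all domains are assumptions.

```python
from collections import Counter

def build_first_hop_graph(url_sequences):
    """Weighted directed edges for the first hop of each URL.

    url_sequences: {url: (domain, [platforms in order of first appearance])}
    Returns a Counter mapping (source, target) to the number of unique URLs.
    """
    edges = Counter()
    for domain, platforms in url_sequences.values():
        if not platforms:
            continue
        edges[(domain, platforms[0])] += 1                 # domain -> first platform
        if len(platforms) > 1:
            edges[(platforms[0], platforms[1])] += 1       # first -> second platform
    return edges

# Toy input; the URLs u1..u4 are hypothetical.
url_sequences = {
    "u1": ("breitbart.com", ["Twitter", "Reddit"]),
    "u2": ("breitbart.com", ["Reddit", "Twitter", "4chan"]),
    "u3": ("cnn.com", ["Reddit"]),
    "u4": ("breitbart.com", ["Twitter", "Reddit"]),
}
edges = build_first_hop_graph(url_sequences)
```

The edge weights play the role of the edge thickness in Fig. 10: comparing the outgoing edges of a domain shows on which platform its URLs tend to appear first.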

Figure 9: CDF of the difference between the first occurrence of a URL between (a) Reddit and Twitter, (b) 4chan and Twitter, and (c) 4chan and Reddit, for alternative and mainstream news URLs (x-axis: time difference in seconds).

Figure 10: Graph representation of the news ecosystem for (a) alternative news domains and (b) mainstream news domains. Edges are colored the same as their source node.

Figure 11: A depiction of a Hawkes model showing the interaction between events on 3 processes.

We do so by using a mathematical technique known as Hawkes processes. First, we provide a high-level intuition of the analysis.

The three platforms we measure do not exist in a vacuum, but rather within the greater ecosystem of the Web. Imagine, however, that each of the platforms was entirely self-contained, with a completely disjoint set of users. In such a scenario, there would be a natural rate at which URLs will be posted, and it would be possible to model this using standard Poisson processes. However, our platforms are clearly not independent. While they do exhibit their own background URL posting rates and internal influence, they are also affected by each other, as well as by the greater Web.

A Hawkes model consists of a number, $K$, of point processes, each with a "background rate" of events $\lambda_{0,k}$. An event on one process can cause an impulse response on other processes, increasing their rates, and multiple events can be caused in response to a single event.

Fig. 11 depicts a sequence of events on a Hawkes process with three processes. An initial event 1 is caused by the background rate of process B. This event causes an impulse response on the rates of the other processes, A and C, eventually causing event 2 on process A. A process can cause an additional impulse response to itself, as seen with event 3. A single event can also trigger several others, as seen with event 4 causing events 5.1, 5.2, and 5.3.

For a discrete-time Hawkes model, time is divided into a series of bins of duration $\Delta t$, and events occurring within the same time bin do not interact with each other. The rate of each $k$-th process, $\lambda_{t,k}$, is given by:

$$\lambda_{t,k} = \lambda_{0,k} + \sum_{k'=1}^{K} \sum_{t'=1}^{t-1} s_{t',k'} \cdot h_{k' \to k}[t - t']$$
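To make the rate equation concrete, the following is a minimal sketch that evaluates $\lambda_{t,k}$ for a small, made-up example. It is not the authors' implementation: the function name `hawkes_rate`, the 0-based bin indexing, the nested-list layout of the impulse responses, and all numbers are assumptions chosen for illustration. Here `s[t'][k']` is the number of events of process k' in bin t' and `h[k'][k][d]` is the impulse response of process k' on process k at a lag of d bins.

```python
def hawkes_rate(t, k, background, s, h):
    """Rate of process k in time bin t of a discrete-time Hawkes model.

    background[k] -- background rate lambda_{0,k}
    s[t'][k']     -- event count of process k' in bin t' (0-based bins)
    h[k'][k][d]   -- impulse response of k' on k at lag d >= 1; index 0 is
                     unused because events in the same bin do not interact,
                     and lags beyond the stored values contribute nothing
    """
    rate = background[k]
    for t_prev in range(min(t, len(s))):          # only earlier bins, t' < t
        lag = t - t_prev
        for k_src, count in enumerate(s[t_prev]):
            response = h[k_src][k]
            if count and lag < len(response):
                rate += count * response[lag]
    return rate

# Toy model with two processes and three observed bins (hypothetical values).
background = [0.1, 0.2]
s = [[1, 0], [0, 0], [0, 2]]
h = [
    [[0.0, 0.5, 0.25], [0.0, 0.2, 0.1]],   # responses caused by process 0
    [[0.0, 0.3, 0.0], [0.0, 0.4, 0.2]],    # responses caused by process 1
]
rate_start = hawkes_rate(0, 0, background, s, h)   # no earlier events
rate_mid = hawkes_rate(2, 1, background, s, h)     # one event two bins earlier
rate_end = hawkes_rate(3, 0, background, s, h)     # two events one bin earlier
```

With no earlier events the rate equals the background rate; each earlier event then adds its impulse response at the corresponding lag.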

In the rate equation, $s \in \mathbb{N}^{T \times K}$ is the matrix of event counts (how many events occur for process $k$ at time $t$) and $h_{k' \to k}[t - t']$ is an impulse response function that describes the amplitude of influence that events on process $k'$ have on the rate of process $k$.

Following [15], the impulse response function $h_{k \to k'}[t - t']$ can be decomposed into a scalar weight $W_{k \to k'}$ and a probability mass function $G_{k \to k'}[d]$. The weight specifies the strength of the interaction from process $k$ to process $k'$ and the probability mass function specifies how the interaction changes over time:

$$h_{k \to k'}[d] = W_{k \to k'} \, G_{k \to k'}[d]$$

The weight value $W_{k \to k'}$ can be interpreted as the expected number of child events that will be caused on process $k'$ after an event on process $k$. The probability mass function $G_{k \to k'}$ specifies the probability that a child event will occur at each specific time lag $d\Delta t$, up to a maximum lag $\Delta t_{max}$.

This interpretation of $W_{k \to k'}$ is useful because it will allow us to compare how much influence platforms have on each other. For instance, we will be able to examine whether a URL posted on Twitter or on Reddit is more likely to cause the same URL to be posted on 4chan, or if there is a difference in influence from one platform to another between URLs for mainstream and alternative news sites.

6.1 Methodology

We now provide more details about our experiments. Once again, we consider 4chan (/pol/), Twitter, and the 6 selected subreddits from Reddit. We study the Hawkes process at subreddit granularity in order to get a better understanding of the various platforms and subreddits. We aim to examine how these platforms and subreddits influence each other, so we model the arrival of URLs, in posts or tweets, with a Hawkes model with $K = 8$ point processes: one each for Twitter, /pol/, and each of the subreddits. The model is fully connected, i.e., it is possible for each process to influence all the others, as well as itself, which describes behavior where participants on a platform see a URL and re-post it on the same platform. For example, with Twitter, this value ($W_{Twitter \to Twitter}$) would likely be quite high, given that tweets are commonly re-tweeted a number of times: the initial tweet containing a URL is likely to cause a number of re-tweets, also containing the URL.

We select URLs that have at least one event in Twitter, /pol/, and at least one of the subreddits, and we model each URL individually. The missing Twitter data affects 3177 (37%) of the URLs. To lessen the impact of the missing data, we remove the 10% of URLs (895) from those that overlap any of the missing days with the shortest total duration from the first event recorded until the last event. This increases the smallest amount of Twitter data included in the remaining URLs and allows us to keep long-duration URLs, which are more likely to overlap with the missing dates, and contain a large proportion of the total events. The number of remaining URLs and events included for each platform are shown in Table 11.

For each URL, we create a matrix $s \in \mathbb{N}^{T \times 8}$ containing the number of events (URL posts) per minute for each of the platforms/subreddits. Here, $T$ is the number of minutes from the first recorded post of the URL on any platform to the last recorded post of the URL on any platform, and this value can be different for each URL. We select $\Delta t$ = 1 minute as a reasonable compromise between accuracy and computational cost. Using this bin size, 92% of events are in a bin by themselves, and another 5.4% share a bin, but only with other events from the same platform or subreddit, meaning that timing interactions between the platforms are not lost.

Next, we fit Hawkes processes using Gibbs sampling as described in [16]. By setting $\Delta t_{max} = 60 \cdot 12 = 720$, we say that a given event can cause other events within a 12-hour time window. Tests with other values (6, 24, and 48 hours) gave similar results.

After fitting the model, we have the values for the $W$ matrix, i.e., the weights of the interactions between events on different processes. These weights can then be interpreted as the expected number of events. For example, $W_{Twitter \to /pol/} = 0.1$ would mean that an event on Twitter will cause $n$ events on /pol/, where $n$ is drawn from a Poisson distribution with rate parameter 0.1. Finally, we also get $\lambda_{0,k}$ for each process, the background rate for event arrivals that are not caused by other events in the system we model. This background rate captures both the "natural" appearance of events (such as someone posting the URL after reading it on the original site) as well as those caused by events outside the platforms we measure (where someone posts the URL after seeing it posted elsewhere).

6.2 Results

Looking at the number of URLs in Table 11, we note that there are substantially more events for mainstream URLs than alternative URLs. However, for Twitter, /pol/, and The_Donald, the ratios of events to URLs for alternative URLs are similar to or greater than the ratios for mainstream URLs. These high ratios explain the high background rates (also in Table 11) for alternative URLs for these platforms despite the lower number of alternative URL events.

From the Hawkes models for each URL, we obtain the weight matrix $W$ which specifies the strength of the connections between the different platforms and subreddits. The mean weight values over all URLs for alternative and mainstream URLs, as well as the percentage difference between them, are presented in Figure 12.

Since the weight values can be interpreted as the expected number of additional events that will be caused as a consequence of an event, we can estimate the percentage of events on each platform that were caused by each of the other platforms by multiplying the weight by the actual number of events that occurred on the source platform and dividing by the number of events that occurred on the destination platform.

As.1% both mainstream and alternative URLs.0539 A: 0.0573 M: 0.995 2.0741 A: 0.001502 0.0700 2. This is in part because of the is reversed.0606 M: 0.0623 A: 0.0655 2. The Donald worldnews politics news conspiracy AskReddit /pol/ Twitter URLs Mainstream 3.0555 M: 0. we look at Twitter.0506 M: 0.0551 M: 0.0540 A: 0.644 6.0531 A: 0.0673 1.002330 Alternative 0.2% -1.172 Total 20.0454 A: 0.0575 M: 0.72% of alternative URLs tweeted.0675 10 conspiracy M: 0.0% -5.0569 M: 0.0734 M: 0.0664 news M: 0. and at least one of the subreddits.105 2.001392 0. 5.0547 A: 0.0555 M: 0.0589 A: 0.9% -4.484 586 497 176 7.0507 M: 0.7% -4. /pol/’s influence on The Donald is 12 .725 7.4% 8.1096 for mainstream URLs and 0. A: 0.312 7. A: 0.160 5.0549 M: 0.794 1.5% -4.8% -3.8% P  PT  u∈urls W A→B · t=1 st.0549 A: 0.6% 10 all other weights: 0.8% 1.0578 A: 0.0635 A: 0.0621 M: 0.0672 A: 0.5% 15. Twitter contributes heavily to both types URL is more likely to cause a subsequent post on the other of events on the other platforms—and is in fact the most platforms than the average tweet containing an alternative influential single source for most of the other platforms.5% 6.0625 First.0551 A: 0. for both mainstream URLs and meaning that the average tweet containing a mainstream alternative URLs.0667 30 PctA→B = P PT 9.0607 M: 0. alternative URLs A: 0.9% 3.975 28.0505 M: 0.4% 8.7% 0.0593 M: 0.0580 M: 0. suming the population of The Donald users also reading.0585 M: 0. timated 2.2% -4.589 5.8% -2.1% 3. even though it has lower weights. Looking at the weights for Twitter to the other platforms.228 941 7.0634 A: 0.001265 0. Look along a Twitter to Twitter rate for alternative URLs is much greater row to see outputs and down a column to see inputs.523 3.4% ing given the large number of users on the platform.250 Alternative 7.0471 A: 0.0579 A: 0. Mean Weights .0494 AskReddit M: 0.0639 /pol/ M: 0.0617 M: 0.0570 A: 0.000501 0.8% -9. 
The Donald and /pol/ also have a strong weights implies that the users have a stronger preference influence on the alternative URLs that get posted on other for re-posting alternative URLs back to The Donald than for platforms.7%.0624 A: 0. which is not surpris. on all greater alternative URL weights for all of its inputs.517 26.B A: 0.0588 M: 0.0626 M: 0.008 252 813 362 321 100 2.0634 ues for WTwitter→Twitter are also substantially higher than 1.7% -3.0592 A: 0.0540 A: 0.0521 M: 0. alternative URL events are on the twitter platform.946 1.0598 0 23.001382 0.0561 M: 0.9% for alternative URLs.0677 politics M: 0. we observe that The Donald causes 8% of /pol/’s alterna- The Donald is also.2% -3. it actu- other platforms are The Donald and /pol/.1% -16.5% 7.302 19.1% 6.0570 A: 0.068 59.0561 M: 0.0614 A: 0. and the percent increase/decrease between There are different possible explanations for why the mainstream and alternative (also indicated by the coloration).0610 M: 0. ally has a greater influence on alternative URLs than main- ing that The Donald is the only platform/subreddit that has stream URLs. despite the higher weights for alternative URLs.7% -0.0587 M: 0.0576 M: 0.478 27.422 Mean λ0 Mainstream 0. The next platforms most likely to cause events on spite Twitter’s lower weights for alternative URLs.3% 0.5% -18.0536 M: 0.0584 A: 0.0569 M: 0.0624 A: 0.0603 M: 0.000034 0.0% -13.0637 M: 0.0443 A: 0. However.322 23. /pol/.0558 M: 0.0758 M: 0. A: 0.0623 A: 0.2% 8.0571 M: 0.4% -12.725 Events Mainstream 12. interestingly. the largest proportion of say. while /pol/’s influence on The Donald is less.0607 A: 0.0563 M: 0.0761 A: 0.0596 A: 0.2% Source and the difference between them are presented in Figure 13.9% -0.6% -3. For the mainstream URLs the strength of influence with the exception of Twitter. which implies that a lot of the alternative might be more inclined to re-tweet the URL [11]. 
the other platforms or subreddits.8% 9.2% -1.0549 A: 0.1% 13.8% -17.136 Total 5.Percent Increase/Decrease of Alternative URLs over Mainstream URLs occurred on the destination platform: The_Donald worldnews politics news conspiracy AskReddit /pol/ Twitter A: 0.0606 A: 0.0% 1. and the mean background rate (λ0 ) for each platform/subreddit.097 2. the mean weights for mainstream URLs (M). The Donald has a stronger effect for alternative mainstream URLs.0652 A: 0.775 4.7% 12.0459 A: 0.0563 M: 0.2% 9.0551 M: 0.001564 0.0534 A: 0.0580 M: 0. ent platforms on each other.0532 A: 0.0521 M: 0.0588 M: 0.0644 A: 0.0583 A: 0. than the rate for mainstream URLs.578 2.000107 0.000423 0.0558 A: 0.6% 41.000619 0. where most of the input weights are stronger for the largest alternative influence on Twitter.001627 0. but The Donald re-tweeting.002803 Table 11: Total URLs with at least one event in Twitter. It is worth not. De- URL.0546 A: 0. influenced more strongly tive URLs.0598 A: 0.4% -2.0596 M: 0.0594 M: 0.584 907 841 5. The val.000696 0. worldnews is the same for both alternative and main.0559 M: 0.4% -2.0622 M: 0.0556 M: 0.109 7.0533 M: 0. Figure 13 shows the estimated total impact of the differ- all except The Donald are greater for mainstream URLs.1554 A: 0. at by mainstream URLs than alternative URLs on all platforms.0581 M: 0.4% 20 These percentages for mainstream URLs. in terms of percentage of events caused.0797 40 The_Donald M: 0.0615 M: 0.4% 3.7% 0.1554 Twitter M: 0.391 2.6% 5.001525 0.4% -17. This reflects the ease and common Destination practice of re-tweeting: a URL in a tweet is likely to gener- ate other events as users re-tweet it.1096 4.0694 M: 0. The first is bot activity— if automated Twitter bots are used to spread alternative URLs.6% -6.0501 M: 0.0522 M: 0.0526 A: 0.0549 A: 0. URLs on the platform are coming from other sources.0440 A: 0. Interestingly.7% 16.1% -13.380 2.0647 worldnews M: 0.0% -1.0715 A: 0.2% -5. they mainstream URLs.0557 A: 0.2% -7.0680 A: 0. 
Background rates are high for 5.0% u∈urls t=1 st. causing an es- mainstream news.0550 M: 0.492 2.797 458 2.0554 A: 0.2% -2.0562 A: 0.7% -3.0720 M: 0.0558 M: 0.8% -9.0665 A: 0.3% 10.0591 M: 0. total events for mainstream and alternative URLs.589 Alternative 2. The opposite can be seen for worldnews URLs on all platforms except Twitter—although it still has and politics. Figure 12: The mean weights for alternative URLs (A).3% -5.0652 A: 0. it could result in a much higher rate of tweeting and greater number of mainstream URL events.A A: 0.0566 A: 0. Specifically.000553 0. stream URLs—which is reasonable—then the difference in After Twitter.0579 A: 0.0600 M: 0.0629 A: 0.0549 A: 0.136 2. This is due to the fact that.0551 A: 0.0577 A: 0. Another possible explanation is the behavior of also has a higher background rate for alternative URLs than users who read news stories from alternative sources.0640 M: 0.746 36.6% 3.

Figure 13: The estimated mean percentage of alternative URL events caused by alternative URL events (A), the estimated mean percentage of mainstream URL events caused by mainstream URL events (M), and the difference between alternative and mainstream (also indicated by the coloration). Look along a row to see outputs and down a column to see inputs. [Heatmap; rows: Source, columns: Destination, over The_Donald, worldnews, politics, news, conspiracy, AskReddit, /pol/, and Twitter.]

Twitter influences the alternative URLs on other platforms to a large degree, and is undoubtedly effective at propagating information. The largest alternative URL inputs to Twitter are, unsurprisingly, The_Donald and /pol/, followed by politics. For mainstream URLs, the strongest influences on Twitter are politics and then /pol/. The_Donald's influence on /pol/ is 6.13%. The_Donald is less influenced by the other platforms than 4chan, and has a higher background rate, meaning more of the URLs posted there come from other sources.

7. DISCUSSION & CONCLUSION

In this work, we explored how mainstream and fringe Web communities share mainstream and alternative news sources, with a particular focus on how communities influence each other. We collected millions of posts from Twitter, Reddit, and 4chan and analyzed the occurrence and temporal dynamics of news shared from 45 mainstream and 54 alternative news sites. We found that users on the different platforms prefer different news sources, especially when it comes to alternative ones. We also explored complex temporal dynamics and discovered, for example, that Twitter and Reddit users tend to post the same stories within a relatively short period of time, with 4chan posts lagging behind both of them. However, when a story becomes popular after a day or two, it is usually the case that it was posted on 4chan first.

Using Hawkes processes, we also modeled the influence the individual platforms have on each other, finding that the interplay between platforms manifests in subtle, yet meaningful ways. For instance, of all the platforms and subreddits, Twitter by far has the most influence in terms of the number of URLs it causes to be posted to other platforms, and contributes to the share of alternative news URLs on the other platforms to a much greater degree than to the share of mainstream URLs. After Twitter, the The_Donald subreddit and 4chan are the next most influential when it comes to alternative URLs.

Naturally, our work has some limitations. First, we are only looking at a closed system of 8 different platforms and subreddits. Also, we did not examine the content of the news stories shared. Since we only look at the posting of third party URLs, we miss other modalities of diffused information. For example, while we found that time-capsuled links to 4chan were present on Reddit, due to its ephemeral nature, there may also be a lot of direct information transferred from 4chan occurring via screenshots. As part of future work, we plan to explore advanced image recognition techniques to look for screenshots shared among the different platforms, as well as Natural Language Processing to determine whether stories become a part of the platform's narrative of events, i.e., whether users continue to talk about stories without actually posting a relevant URL itself.

To the best of our knowledge, our analysis constitutes the first attempt to characterize the dissemination of mainstream and alternative news across multiple social media platforms, and to estimate a quantifiable influence between them. We believe efforts into understanding how the growing phenomenon of alternative information sources affects multiple platforms can help inform detection and mitigation techniques against misinformation and disinformation campaigns, and our work constitutes a first step in that direction.

Acknowledgments. This research is supported by the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie "ENCASE" project (Grant Agreement No. 691025).
