Frontiers of Computational Journalism
Columbia Journalism School
Week 3: Filters as Editors

September 22, 2016
This class
• Social filtering
• The Twitter network
• Filtering News on Twitter
• Problems with Filters
• Human-machine filtering
• The filter design problem
Social Filtering
who a user chooses to follow = social filtering

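A minimal sketch of social filtering under this definition, in Python: the timeline is just the tweets of followed accounts, merged newest-first. The Tweet class and field names are illustrative, not Twitter's API.

from dataclasses import dataclass

@dataclass
class Tweet:
    author: str
    timestamp: int   # e.g. a Unix time
    text: str

def social_filter(all_tweets, follows):
    """Keep only tweets from followed accounts, newest first.
    The "filter" here is entirely the user's follow choices."""
    return sorted(
        (t for t in all_tweets if t.author in follows),
        key=lambda t: t.timestamp,
        reverse=True,
    )

# Example: following two of three authors hides the third entirely.
tweets = [
    Tweet("alice", 1, "breaking story"),
    Tweet("bob",   2, "lunch photo"),
    Tweet("carol", 3, "follow-up"),
]
timeline = social_filter(tweets, follows={"alice", "carol"})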
The Twitter Network
Twitter follower network
We have crawled the entire Twitter site and obtained 41.7 million
user profiles, 1.47 billion social relations, 4,262 trending topics,
and 106 million tweets. In its follower-following topology analysis
we have found a non-power-law follower distribution, a short
effective diameter, and low reciprocity, which all mark a
deviation from known characteristics of human social networks.

Kwak et al., What is Twitter, a Social Network or a News Media? (2010)
More “followings” than followers
Small average distance between nodes
It’s a news network - hubs
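These properties (low reciprocity, hub structure, skewed follower distribution) can be measured on any follower graph. A sketch with networkx on a toy graph standing in for the crawl; the figures Kwak et al. report, e.g. roughly 22% reciprocity, come from the full 1.47-billion-edge dataset.

import networkx as nx

# Toy directed follower graph: an edge u -> v means u follows v.
G = nx.DiGraph([
    ("a", "hub"), ("b", "hub"), ("c", "hub"),  # "hub" has many followers
    ("hub", "a"),                              # only one follow-back
    ("b", "c"),
])

# Reciprocity: fraction of edges that are reciprocated.
# Low on Twitter, unlike typical human social networks.
print(nx.reciprocity(G))                 # 0.4 on this toy graph

# In-degree (follower) distribution: Kwak et al. find it
# deviates from the power law expected of social networks.
print(sorted(d for _, d in G.in_degree()))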
Twitter vs. Newswire timings

Petrovic et al., Can Twitter replace newswire for breaking news? (2011)
It’s a news network

Small number of high-degree hubs

Different network structure than e.g. Facebook.

Different uses.

why?
- Zeynep Tufekci, What Happens to #Ferguson Affects Ferguson:
Net Neutrality, Algorithmic Filtering and Ferguson
data from SocialReach, which works with many publishers

John McDermott, Why Facebook is for ice buckets, Twitter is for Ferguson
Sunita, Why #Ferguson broke out on Twitter, not Facebook
Information flow on Facebook
Filtering News on Twitter
Reuters News Tracer

Pipeline: Cluster into events → Filter newsworthy → Score veracity → Searches and Alerts
Liu et al., Reuters Tracer: A Large Scale System of Detecting & Verifying Real-Time News Events from Twitter
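A schematic of the Tracer pipeline stages in Python. Each component below is a crude stand-in: the system described by Liu et al. uses trained models over many tweet and source features, not these toy rules.

def cluster_into_events(tweets):
    """Group tweets about the same happening.
    Stand-in: group by first word; the real system clusters properly."""
    events = {}
    for t in tweets:
        key = (t["text"].split() or ["misc"])[0].lower()
        events.setdefault(key, []).append(t)
    return list(events.values())

def is_newsworthy(event):
    """Stand-in filter for news vs. chatter: require multiple reports."""
    return len(event) >= 2

def veracity_score(event):
    """Stand-in veracity model: fraction of reports from known sources."""
    trusted = sum(t.get("trusted_source", False) for t in event)
    return trusted / len(event)

def tracer_pipeline(tweets, threshold=0.5):
    """Cluster -> filter newsworthy -> score veracity -> alert."""
    return [e for e in cluster_into_events(tweets)
            if is_newsworthy(e) and veracity_score(e) >= threshold]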
Problems with Filters
The Echo Chamber
[Echo chambers are] those Internet spaces where like-minded
people listen only to those people who already agree with them.
...
While most of us had assumed that the Internet would increase
the diversity of opinion, the echo chamber meme says the Net
encourages groups to form that increase the homogeneity of
belief. This isn’t simply a factual argument about the topography
carved by traffic and links. A “tut, tut” has been appended: See,
you Web idealists have been shown up — humankind’s social
nature sucks, just as we always told you!

- David Weinberger, Is there an echo in here?
Graph of political book sales during the 2008 U.S. election, by orgnet.org
From Amazon "users who bought X also bought Y" data.
Retweet network of political tweets.
From Conover et al., Political Polarization on Twitter
Instagram co-tag graph, highlighting three distinct topical communities: 1) pro-Israeli (Orange), 2) pro-Palestinian (Yellow), and 3) religious/Muslim (Purple)
Gilad Lotan, Betaworks
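All three figures are built the same way: turn co-occurrence data (co-purchases, retweets, co-tags) into a graph and look for clusters. A sketch with networkx, on made-up co-purchase pairs; the cluster structure is what the echo-chamber argument points at.

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical "users who bought X also bought Y" pairs.
copurchases = [
    ("left book 1", "left book 2"), ("left book 2", "left book 3"),
    ("right book 1", "right book 2"), ("right book 2", "right book 3"),
    ("left book 3", "right book 1"),   # rare cross-cluster link
]
G = nx.Graph(copurchases)

# Community detection surfaces the polarized structure:
# dense clusters with few edges between them.
for community in greedy_modularity_communities(G):
    print(sorted(community))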
The Filter Bubble
What people care about politically, and what they’re motivated to do
something about, is a function of what they know about and what they
see in their media. ... People see something about the deficit on the
news, and they say, ‘Oh, the deficit is the big problem.’ If they see
something about the environment, they say the environment is a big
problem.

This creates this kind of a feedback loop in which your media influences
your preferences and your choices; your choices influence your media;
and you really can go down a long and narrow path, rather than
actually seeing the whole set of issues in front of us.

- Eli Pariser,
How do we recreate a front-page ethos for a digital world?
The (Algorithmic) Filter Bubble

If we try to present stories that the user will want to
click on... do we end up only telling people what they
want to hear?

If an algorithm only shows us things our friends like, will
we ever see anything that challenges us?
Information diet
The holy grail in this model, as far as I’m
concerned, would be a Firefox plugin that
would passively watch your websurfing
behavior and characterize your personal
information consumption. Over the course of a
week, it might let you know that you hadn’t
encountered any news about Latin America,
or remind you that a full 40% of the pages you
read had to do with Sarah Palin. It wouldn’t
necessarily prescribe changes in your
behavior, simply help you monitor your own
consumption in the hopes that you might
make changes.

- Ethan Zuckerman,
Playing the Internet with PMOG
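Zuckerman's hypothetical plugin reduces to counting topic shares over a week of page visits. A sketch, with the topic classifier left as a stub (a real tool would need an actual page-topic model):

from collections import Counter

def topic_of(title):
    """Stub classifier: a real plugin would label the page's topic
    with keywords or a trained model."""
    return "Sarah Palin" if "palin" in title.lower() else "other"

def information_diet(visited_titles):
    """Return each topic's share of pages read this week,
    for self-monitoring rather than prescription."""
    counts = Counter(topic_of(t) for t in visited_titles)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

# e.g. {"Sarah Palin": 0.4, "other": 0.6} ->
# "a full 40% of the pages you read had to do with Sarah Palin"
print(information_diet(["Palin speech", "Palin book", "local news",
                        "weather", "sports"]))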
Information and Disinformation

The five clusters of users who made #TrumpWon trend after the first presidential debate. Gilad Lotan, 2016
Human-Machine Filters
Different Filtering Systems
Content:
Newsblaster analyzes the topics in the documents.
No concept of users.

Social:
What I see on Twitter is determined by whom I follow.
Reddit comments are filtered using votes as input.
Amazon's "people who bought X also bought Y" - no content analysis.

Hybrid:
Recommend based both on content and user behavior.
TechMeme / MediaGazer
Facebook trending (with editors)
Facebook trending (without editors)
Facebook “trending review tool” screenshot from leaked documents
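One way to see these three families is as scoring functions over the same items: a hybrid simply blends them. A minimal sketch; the features, weights, and saturation point are all made up for illustration.

def content_score(story, user):
    """Content filtering: topic overlap with what the user has read."""
    overlap = story["topics"] & user["topics_read"]
    return len(overlap) / max(len(story["topics"]), 1)

def social_score(story, user):
    """Social filtering: endorsements from accounts the user follows."""
    endorsers = story["shared_by"] & user["follows"]
    return min(len(endorsers) / 3.0, 1.0)   # saturate at 3 endorsers

def hybrid_score(story, user, alpha=0.5):
    """Hybrid filtering: weighted blend of content and social signals."""
    return (alpha * content_score(story, user)
            + (1 - alpha) * social_score(story, user))

story = {"topics": {"politics", "twitter"}, "shared_by": {"alice", "bob"}}
user  = {"topics_read": {"politics"}, "follows": {"alice"}}
print(hybrid_score(story, user))   # ~0.417: half content, half social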
Filter Design
Item Content:
text analysis, topic modeling, clustering...

My Data:
what I’ve read/liked;
social network structure, who I follow

Other Users’ Data:
other users’ likes
Filter design problem
Formally, given
U = user preferences, history, characteristics
S = current story
{P} = results of function on previous stories
{B} = background world knowledge (other users?)

Define
r(S, U, {P}, {B}) ∈ [0, 1]

= relevance of story S to user U
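In code, this definition is just a function signature; everything interesting hides in the components. A sketch with stub models for each input; the features and weights below are illustrative, not a proposal.

def topic_match(S, U):
    """Stub content model: topic overlap between story and user history."""
    return len(S["topics"] & U["topics_read"]) / max(len(S["topics"]), 1)

def novelty(S, P):
    """Stub novelty model: penalize overlap with previously shown stories."""
    seen = max((len(S["topics"] & p["topics"]) / max(len(S["topics"]), 1)
                for p in P), default=0.0)
    return 1.0 - seen

def background_signal(S, B):
    """Stub background model: how widely other users shared S,
    relative to the most-shared story in B (a dict of share counts)."""
    return B.get(S["id"], 0) / max(max(B.values(), default=1), 1)

def r(S, U, P, B, w=(0.5, 0.3, 0.2)):
    """r(S, U, {P}, {B}) -> [0, 1]: relevance of story S to user U."""
    score = (w[0] * topic_match(S, U)
             + w[1] * novelty(S, P)
             + w[2] * background_signal(S, B))
    return min(max(score, 0.0), 1.0)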
Filter design problem, restated
When should a user see a story?

Aspects to this question:
• normative
  personal: what I want
  societal: emergent group effects
• UI
  how do I tell the computer what I want?
• technical
  constrained by algorithmic possibility
• economic
  cheap enough to deploy widely
How to evaluate/optimize?
• Netflix: try to predict the rating that the user gives a
movie after watching it.

• Amazon: sell more stuff.

• Google web search: human raters and A/B tests for every change
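Netflix's objective is the easiest of the three to write down: the error between predicted and post-viewing ratings, RMSE in the Netflix Prize formulation. A sketch of that metric:

from math import sqrt

def rmse(predicted, actual):
    """Root-mean-square error between predicted and observed ratings;
    lower is better. This was the Netflix Prize metric."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                / len(actual))

# The filter predicts a rating before viewing; the user rates after.
print(rmse([4.0, 3.5, 2.0], [5.0, 3.0, 2.0]))   # ~0.645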
How to evaluate/optimize?
• Does the user understand how the filter works?
• Can they configure it as desired?
• Can they correctly predict what they will and won't see?
• Are there controls for abuse and harassment?
• Can it be gamed? Spam, "user-generated censorship," etc.