Frontiers of Computational Journalism
Columbia Journalism School
Week 5: Social Filtering
October 9, 2015

[Figure: filtering. Labels: User, stories not covered]

User
who the user chooses to follow = social filtering

Twitter follower network


"We have crawled the entire Twitter site and obtained 41.7
million user profiles, 1.47 billion social relations, 4,262
trending topics, and 106 million tweets. In its follower-following
topology analysis we have found a non-power-law follower
distribution, a short effective diameter, and low
reciprocity, which all mark a deviation from known
characteristics of human social networks."
- Kwak et al., What is Twitter, a Social Network or a
News Media?
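
The quantities named in the quote can be illustrated on a toy graph. The following is a minimal sketch, not the paper's method: the edge list and account names are made up, and reciprocity is computed under one common definition (the share of follow edges whose reverse edge also exists), which may differ from the definition the authors used.

```python
from collections import Counter

# Toy directed follow graph as (follower, followed) pairs.
# All account names are hypothetical.
edges = [
    ("alice", "newsdesk"),
    ("bob", "newsdesk"),
    ("carol", "newsdesk"),
    ("alice", "bob"),
    ("bob", "alice"),
]

def reciprocity(edges):
    """Fraction of follow edges whose reverse edge also exists."""
    edge_set = set(edges)
    mutual = sum(1 for (a, b) in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)

def follower_counts(edges):
    """Number of followers (in-degree) for each followed account."""
    return Counter(followed for _, followed in edges)

print(reciprocity(edges))
print(follower_counts(edges).most_common(1))
```

In this toy graph only the alice/bob pair is mutual, while "newsdesk" collects followers without following back, which is the hub pattern the following slides describe.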

More followings than followers

Small avg distance between nodes

It's a news network - hubs

It's a news network



Small number of high-degree hubs

Different network structure than e.g. Facebook.

Different uses.

why?

- Zeynep Tufekci, What Happens to #Ferguson Affects Ferguson:
Net Neutrality, Algorithmic Filtering and Ferguson

data from SocialReach, who works with many publishers

- John McDermott, Why Facebook is for ice buckets, Twitter is for Ferguson

- Sunita, Why #Ferguson broke out on Twitter, not Facebook

Information flow on Facebook

Finding sources on social media

Classify Users
Classic machine learning problem. Classify each user
as one of:
journalist/blogger
organization
ordinary individual
First, need to encode as a vector / select features...

Features for user classifier

# of followers / following
# of posts, favorites
percentage of posts that are RTs, @replies, links
presence/absence of named entities
topic distribution of tweets (IPTC top level topics)
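
As a rough sketch of what "encode as a vector" could look like for the simpler features above: the function below turns one user into a list of numbers. The `profile` dictionary layout and the string tests for RTs, @replies and links are assumptions made for illustration, not the original system's rules, and the named-entity and IPTC topic features are left out because they need external tools.

```python
def user_features(profile, tweets):
    """Encode one user as a numeric feature vector.

    profile: dict with 'followers', 'following', 'favorites' counts
             (hypothetical layout, assumed for this sketch).
    tweets:  list of tweet texts posted by the user.
    """
    n = len(tweets)

    def share(pred):
        # Fraction of the user's tweets that satisfy pred.
        return sum(1 for t in tweets if pred(t)) / n if n else 0.0

    return [
        profile["followers"],
        profile["following"],
        n,                                                  # number of posts
        profile["favorites"],
        share(lambda t: t.startswith("RT ")),               # retweets
        share(lambda t: t.startswith("@")),                 # @replies
        share(lambda t: "http://" in t or "https://" in t), # links
    ]

example = user_features(
    {"followers": 10, "following": 20, "favorites": 3},
    ["RT @x: hi", "@bob hello", "see http://example.com", "plain"],
)
print(example)
```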

Digression: IPTC Media Topic Codes


International standard hierarchical taxonomy, part of
the NewsML markup system. Defined by Reuters, AP,
NYTimes...

K-nearest neighbor classifier

Take K closest training points (in high dimensional
feature space), choose majority label.
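
The rule above can be sketched in a few lines. This is a minimal illustration assuming Euclidean distance and toy two-dimensional points; the function name `knn_classify` and the training data are hypothetical, and the slides do not say which distance measure or value of K was used.

```python
import math
from collections import Counter

def knn_classify(query, training, k=3):
    """Label a query vector by majority vote among its k nearest neighbors.

    training: list of (feature_vector, label) pairs.
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set (made-up 2-D features).
training = [
    ([0.0, 0.0], "individual"),
    ([0.1, 0.2], "individual"),
    ([1.0, 1.0], "organization"),
    ([0.9, 1.1], "organization"),
    ([1.2, 0.8], "organization"),
]

print(knn_classify([0.2, 0.1], training))
print(knn_classify([1.0, 0.9], training))
```

With real features such as follower counts and percentages, the dimensions would usually need to be put on comparable scales first, otherwise the largest-valued feature dominates the distance.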

Creating the training data


1,850 random users
1,532 known organizations
1,490 known journalists and bloggers
Hired Mechanical Turk workers to apply labels. Each
user labeled by two workers, discarded if
disagreement.
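
The "discard if disagreement" step could be expressed as follows. This is a sketch only: the dictionary layout and user ids are assumed for illustration.

```python
def keep_agreed(labels_by_user):
    """Keep only users whose two worker labels match.

    labels_by_user: {user_id: (label_from_worker_1, label_from_worker_2)}
    Returns {user_id: agreed_label}.
    """
    return {user: a for user, (a, b) in labels_by_user.items() if a == b}

# Hypothetical labels from two workers per user.
raw_labels = {
    "u1": ("journalist/blogger", "journalist/blogger"),
    "u2": ("organization", "ordinary individual"),
    "u3": ("ordinary individual", "ordinary individual"),
}

training_labels = keep_agreed(raw_labels)
print(training_labels)
```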

Classifier Accuracy

Eyewitness classifier

Goal is to find individual tweets that are eyewitness reports.

Started with LIWC (Linguistic Inquiry and Word Count)
dictionary that classifies English words along 70 different
dimensions, including emotion, cognition, time, health...

Word Aspects

Used perception category words
plus insight and certainty words

Eyewitness tweet classifier

It's an eyewitness tweet if it contains any of these
special words! (or their stems)
High precision! Low recall.
89% of tweets classified as eyewitness actually were.
But only 32% of eyewitness tweets detected.
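
A keyword rule of this kind, and the precision and recall figures used to evaluate it, can be sketched as below. The word list is a small placeholder, not the actual LIWC-derived list, and stemming is omitted; the tweets and their true labels are invented for the example.

```python
import re

# Placeholder perception words; NOT the actual list from the slides.
PERCEPTION_WORDS = {"see", "saw", "seeing", "hear", "heard", "hearing", "feel", "felt"}

def is_eyewitness(tweet):
    """True if the tweet contains any word from the list (no stemming)."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(tok in PERCEPTION_WORDS for tok in tokens)

def precision_recall(predicted, actual):
    """Precision: share of flagged tweets that are truly eyewitness.
    Recall: share of true eyewitness tweets that were flagged."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented tweets with hand-assigned true labels.
tweets = [
    "I just heard a loud explosion downtown",
    "Police confirm road closures",
    "Smoke everywhere on my street right now",
    "I saw it on the news",
]
actual = [True, False, True, False]
predicted = [is_eyewitness(t) for t in tweets]

print(predicted)
print(precision_recall(predicted, actual))
```

The third tweet shows how recall is lost (an eyewitness report with no listed word), and the fourth shows a false positive.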

Other dimensions

Tweet contains URL to photo or video (used table of domain
names, e.g. flickr.com = photo)
Posted from mobile device (from tweet metadata naming
posting app)
Geocode user's stated location (this is painful and unreliable)
Distribution of friends' locations. (Friend = mutual following)
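
The domain-table lookup for photo and video links might look like this. Only the flickr.com entry comes from the slide; the other entry and the function name `media_type` are assumptions added for illustration.

```python
from urllib.parse import urlparse

# Domain table. flickr.com is from the slide; youtube.com is an assumed example.
MEDIA_DOMAINS = {
    "flickr.com": "photo",
    "youtube.com": "video",
}

def media_type(url):
    """Return 'photo' or 'video' if the URL's domain is in the table, else None."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return MEDIA_DOMAINS.get(host)

print(media_type("https://www.flickr.com/photos/someone/123"))
print(media_type("http://example.com/article"))
```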

Test user reactions

"This gives you context, you have the context for whether or not
you think they're reputable or whether or not they're worth
reaching out to."
"It's giving me a lot of context, which is really useful when you're
trying to verify if someone is reputable or not."
"I would tend to focus on the eyewitnesses and journalists/
bloggers. Eventually I'd look at everyone else, but I'd want to start
my search with those two groups because they would normally
provide me with the most information."

Test user reactions


Popular features:
Eyewitness filtering, user location, image/video filter

Unpopular features:
Entity extraction not helpful, no ability to filter by location and eyewitness
status, focus on users instead of content

Social Software
Basic assumption: structure of software influences how
groups use it.

or: architecture influences behavior

Three ways to influence behavior

Norms: culture, habits, etiquette, the user's sense of
what is right or appropriate
Laws: rules enforced by the administrator
Code: what it is actually possible to do

Design problem...
What do we want the users to accomplish together?
How do we encourage this?
We can write the code, but the culture is a separate
issue.
