
Frontiers of Computational Journalism

Columbia Journalism School
Week 1: Basics
September 10, 2012

Week 1: Basics
What is computational journalism?
Data in journalism
Aims of the course
Course structure

Computational Journalism: Definitions

Broadly defined, it can involve changing how stories are discovered, presented, aggregated, monetized, and archived. Computation can advance journalism by drawing on innovations in topic detection, video analysis, personalization, aggregation, visualization, and sensemaking.
- Cohen, Hamilton, Turner, Computational Journalism

Computational Journalism: Definitions

Stories will emerge from stacks of financial disclosure forms, court records, legislative hearings, officials' calendars or meeting notes, and regulators' email messages that no one today has time or money to mine. With a suite of reporting tools, a journalist will be able to scan, transcribe, analyze, and visualize the patterns in these documents.
- Cohen, Hamilton, Turner, Computational Journalism

Cohen et al. model

[Diagram: Data, Reporting, User; Computer Science]

CS for presentation / interaction

[Diagram: Data, Reporting, User, with two CS labels]

Filter many stories for user

[Diagram: three Data / Reporting pairs with CS labels, Filtering, User]

Examples of filters
What an editor puts on the front page
Google News
Reddit's comment system
Twitter
Facebook news feed
Techmeme

Memetracker by Leskovec, Backstrom, Kleinberg

Kony 2012 early network, by Gilad Lotan / SocialFlow

Track effects

[Diagram: three Data / Reporting pairs with CS labels, Filtering, User, Effects]

Computational journalism process

Reporting
Presentation
Filtering
Tracking

Computational Journalism: Definitions

the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government.
- Jonathan Stray, A Computational Journalism Reading List

Week 1: Basics
What is computational journalism?
Data in journalism
Aims of the course
Course structure

Definition of data

a collection of similar pieces of information

structured data

unstructured data

Why use data in journalism?


1. Data is where the information is

More video on YouTube than produced by TV networks during entire 20th century

10,000 legally-required reports filed by U.S. public companies every day

400,000,000 tweets per day
AP moves ~15,000 stories per day
390,000 Wikileaks cables
500,000 Enron emails
how many govt and corporate docs?

There's a lot out there

Human data generated in 2010 = 1,000,000,000 terabytes
Library of Congress digital archive = 160 terabytes
(only 20 TB for all books!)

All New York Times articles ever = 0.06 terabytes (13 million stories, assuming 5k per story)
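
The New York Times figure is simple arithmetic and can be checked in a few lines. The sketch below assumes decimal units (1 terabyte = 10^12 bytes) and reads "5k" as 5,000 bytes; the slide states neither, so both are assumptions. The function name is made up for the example.

```python
# Back-of-the-envelope check of the archive sizes quoted on the slide.
# Assumptions (not stated on the slide): decimal units, "5k" = 5,000 bytes.
BYTES_PER_TB = 10**12


def archive_size_tb(num_items: int, bytes_per_item: int) -> float:
    """Return the total size of num_items items, in terabytes."""
    return num_items * bytes_per_item / BYTES_PER_TB


nyt_tb = archive_size_tb(13_000_000, 5_000)  # 13 million stories at 5k each
human_2010_tb = 1_000_000_000                # figure quoted on the slide
loc_tb = 160                                 # Library of Congress digital archive

print(f"NYT archive: {nyt_tb:.3f} TB")
print(f"2010 data vs. Library of Congress: {human_2010_tb / loc_tb:,.0f}x")
```

Under these assumptions the archive comes to about 0.065 TB, which the slide gives as 0.06, and the data generated in 2010 is roughly 6 million times the size of the Library of Congress digital archive.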

Transparency means nothing if no one is watching.

Why use data in journalism?


1. Data is where the information is
2. Data can give a more complete picture

Phil Meyer, Detroit Riots, 1967


A reporter, talking to people on the street corner, draws comparisons intuitively, almost unconsciously. When dealing with large numbers of people (437 were interviewed in the Detroit survey), intuition is not enough. It takes a computer to count and sort and analyze the thoughts of that many people, and the input must be consistently structured.
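
Meyer's point about consistently structured input can be illustrated with a small tabulation. The sketch below is not Meyer's method or data: the records, field names (`education`, `answer`), and values are invented placeholders. The only point is that uniform fields make counting and cross-tabulating mechanical.

```python
from collections import Counter

# Hypothetical, invented survey records (NOT the Detroit survey data).
# Every record has the same fields, which is what makes counting possible.
responses = [
    {"education": "high school", "answer": "yes"},
    {"education": "high school", "answer": "no"},
    {"education": "college", "answer": "yes"},
    {"education": "college", "answer": "no"},
    {"education": "college", "answer": "no"},
]


def crosstab(records, row_field, col_field):
    """Count records for each (row_field, col_field) pair of values."""
    return Counter((r[row_field], r[col_field]) for r in records)


def share(table, row, col):
    """Fraction of records in `row` whose column value is `col`."""
    row_total = sum(n for (r, _), n in table.items() if r == row)
    return table[(row, col)] / row_total if row_total else 0.0


table = crosstab(responses, "education", "answer")
for row in sorted({r for r, _ in table}):
    print(row, f"{share(table, row, 'yes'):.2f}")
```

If one record spelled a field differently or left it out, the count would silently go wrong or fail, which is why the structure has to be consistent before the computer can help.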

Phil Meyer, Detroit Riots, 1967


Educa%on and income were not good predictors of whether a person would riot.

Week 1: Basics
What is computational journalism?
Data in journalism
Aims of the course
Course structure

Design
[Designers] are guided by the ambition to imagine a desirable state of the world, playing through alternative ways in which it might be accomplished, carefully tracing the consequences of contemplated actions.
- Horst Rittel, The Reasoning of Designers

Design is not objective


During the industrial age, the idea of planning, in common with the idea of professionalism, was dominated by the pervasive idea of efficiency. We have come to think about the planning task in very different ways in recent years. We have been learning to ask whether what we are doing is the right thing to do. That is to say, we have been learning to ask questions about the outputs of actions and to pose problem statements in valuative frameworks.
- Horst Rittel, Dilemmas in a General Theory of Planning

Design is political
No plan has ever been beneficial to everybody. Therefore, many persons with varying, often contradictory interests and ideas are or want to be involved in plan-making. The resulting plans are usually compromises resulting from negotiation and the application of power. The designer is party in these processes; he takes sides.
- Horst Rittel, The Reasoning of Designers

Different kinds of knowledge


Normative: what should be
(political philosophy, sociology, ethics, critical theory)

Instrumental: how to get there


(in our case: journalism and computer science)

This course is about both.

Week 1: Basics
What is computational journalism?
Data in journalism
Aims of the course
Course structure

Theory
We will learn important guiding principles about
Filter design
Visualization
Social network analysis
Drawing conclusions from data
Security modeling

Techniques
We will discuss a handful of techniques in great depth.
Distance functions and clustering
Vector space document model
Recommender systems
Proposition extraction
Knowledge representation as linked data
Community detection
Any requests?
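
As a preview of the first two techniques listed, here is a minimal sketch of a vector space document model with a cosine distance function. It is an illustration under simplifying assumptions (whitespace tokenization, raw term counts, no TF-IDF weighting), not necessarily the formulation the course will use. The function names and sample sentences are made up for the example.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Represent a document as a bag of lowercase word counts."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Made-up sample documents, for illustration only.
docs = [
    "city council votes on budget",
    "council budget vote delayed",
    "local team wins championship game",
]
vectors = [term_vector(d) for d in docs]
print(cosine_distance(vectors[0], vectors[1]))  # shares "council" and "budget"
print(cosine_distance(vectors[0], vectors[2]))  # no shared words
```

A clustering algorithm would then group documents whose pairwise distances are small, so the choice of distance function largely determines which stories end up together.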

Course structure
Classes: we'll review the readings (so please read them)
By next week: form groups of 2-3.
Assignments every other week, due in two weeks
Some will involve coding, all will involve critical analysis.

Your data
You are encouraged to pick a data set and stick with it.
If you want, you can do all assignments, final research report, etc. with this data.
This is a research course: let's learn something new.

What data?
SEC reports, municipal open gov data, Wikileaks, your favorite archive, social media
Two criteria:
Journalistically interesting
Requires advanced techniques

Final Report
For 3-point students
A theoretical discussion (10 pages)
For 6-point students, one of:
A theoretical discussion (25 pages)
An implementation of a technique and discussion of results
Analysis of your chosen data
A completed story, plus methodology
