Frontiers of Computational Journalism

Columbia Journalism School
Week 1: Basics
September 10, 2012

Week 1: Basics
  What is computational journalism?
  Data in journalism
  Aims of the course
  Course structure

Computational Journalism: Definitions
“Broadly defined, it can involve changing how stories are discovered, presented, aggregated, monetized, and archived. Computation can advance journalism by drawing on innovations in topic detection, video analysis, personalization, aggregation, visualization, and sensemaking.”
   - Cohen, Hamilton, Turner, Computational Journalism

Computational Journalism: Definitions
“Stories will emerge from stacks of financial disclosure forms, court records, legislative hearings, officials' calendars or meeting notes, and regulators' email messages that no one today has time or money to mine. With a suite of reporting tools, a journalist will be able to scan, transcribe, analyze, and visualize the patterns in these documents.”
   - Cohen, Hamilton, Turner, Computational Journalism

Cohen et al. model
[Diagram: Data, Reporting, User, and Computer Science]

CS for presentation / interaction
[Diagram: Data, Reporting, and User, with two CS labels]

Filter many stories for user
[Diagram: three Data/Reporting pairs, Filtering, and User, with CS labels]

Examples of filters
• What an editor puts on the front page
• Google News
• Reddit’s comment system
• Twitter
• Facebook news feed
• Techmeme
• …

Memetracker by Leskovec, Backstrom, Kleinberg

Kony 2012 early network, by Gilad Lotan / SocialFlow

Track effects
[Diagram: three Data/Reporting pairs, Filtering, User, and Effects, with CS labels]

Computational journalism process
  Reporting
  Presentation
  Filtering
  Tracking

Computational Journalism: Definitions
“the application of computer science to the problems of public information, knowledge, and belief, by practitioners who see their mission as outside of both commerce and government.”
   - Jonathan Stray, A Computational Journalism Reading List

Week 1: Basics
  What is computational journalism?
  Data in journalism
  Aims of the course
  Course structure

Definition of data
  a collection of similar pieces of information

structured  data  

unstructured  data  
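
As a rough illustration of the distinction, the sketch below contrasts a small structured collection with a piece of free text. The company names, filing records, and sentence are hypothetical values invented for the example, and the substring count is only a stand-in for real entity extraction.

```python
from collections import Counter

# Structured data: every record has the same named fields (hypothetical values).
structured = [
    {"company": "Acme", "form": "10-K", "year": 2011},
    {"company": "Acme", "form": "8-K", "year": 2012},
    {"company": "Globex", "form": "10-K", "year": 2012},
]

# Unstructured data: free text with no predefined fields (hypothetical text).
unstructured = "Acme filed its annual report late, while Globex filed on time in 2012."

# With structured data, counting by a field is a one-liner.
forms_by_type = Counter(record["form"] for record in structured)

# With unstructured text, even a simple question needs an extraction step first.
# A naive substring count stands in here for real entity extraction.
known_companies = ["Acme", "Globex"]
mentions = {name: unstructured.count(name) for name in known_companies}

print(forms_by_type)
print(mentions)
```

The structured records can be counted, sorted, or filtered directly, while the text has to be turned into something field-like before the same questions can be asked of it.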

Why use data in journalism?
1. Data is where the information is

More video on YouTube than produced by TV networks during the entire 20th century

10,000 legally required reports filed by U.S. public companies every day

400,000,000 tweets per day
AP moves ~15,000 stories per day
390,000 Wikileaks cables
500,000 Enron emails
…how many gov’t and corporate docs?

There’s a lot out there
  Human data generated in 2010 = 1,000,000,000 terabytes
  Library of Congress digital archive = 160 terabytes (only 20 TB for all books!)
  All New York Times articles ever = 0.06 terabytes (13 million stories, assuming 5k per story)
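
The size estimate for the New York Times archive can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch using the figures from the slide; reading "5k" as 5,000 bytes and using decimal terabytes (10^12 bytes) are assumptions. Under those assumptions the result is about 0.065 TB, which the slide gives as 0.06.

```python
# Figures taken from the slide; the unit conventions are assumptions.
NYT_STORIES = 13_000_000
BYTES_PER_STORY = 5_000      # "5k per story", read here as 5,000 bytes
TB = 10**12                  # decimal terabyte

nyt_tb = NYT_STORIES * BYTES_PER_STORY / TB
loc_archive_tb = 160
human_data_2010_tb = 1_000_000_000

print(f"NYT archive: {nyt_tb:.3f} TB")
print(f"Library of Congress archive vs. NYT archive: {loc_archive_tb / nyt_tb:,.0f}x")
print(f"2010 human data vs. Library of Congress archive: {human_data_2010_tb / loc_archive_tb:,.0f}x")
```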

Transparency means nothing if no one is watching.

Why use data in journalism?
1. Data is where the information is
2. Data can give a more complete picture

Phil Meyer, Detroit Riots, 1967
“A reporter, talking to people on the street corner, draws comparisons intuitively, almost unconsciously. When dealing with large numbers of people—437 were interviewed in the Detroit survey—intuition is not enough. It takes a computer to count and sort and analyze the thoughts of that many people, and the input must be consistently structured.”

Phil Meyer, Detroit Riots, 1967
“Education and income were not good predictors of whether a person would riot.”

Week 1: Basics
  What is computational journalism?
  Data in journalism
  Aims of the course
  Course structure

Design
“[Designers] are guided by the ambition to imagine a desirable state of the world, playing through alternative ways in which it might be accomplished, carefully tracing the consequences of contemplated actions.”
   - Horst Rittel, The Reasoning of Designers

Design is not objective
“During the industrial age, the idea of planning, in common with the idea of professionalism, was dominated by the pervasive idea of efficiency. We have come to think about the planning task in very different ways in recent years. We have been learning to ask whether what we are doing is the right thing to do. That is to say, we have been learning to ask questions about the outputs of actions and to pose problem statements in valuative frameworks.”
   - Horst Rittel, Dilemmas in a General Theory of Planning

Design is political
“No plan has ever been beneficial to everybody. Therefore, many persons with varying, often contradictory interests and ideas are or want to be involved in plan-making. The resulting plans are usually compromises resulting from negotiation and the application of power. The designer is party in these processes; he takes sides.”
   - Horst Rittel, The Reasoning of Designers

Different kinds of knowledge
  Normative: “what should be”
  (political philosophy, sociology, ethics, critical theory…)

  Instrumental: “how to get there”
  (in our case: journalism and computer science)

  This course is about both.

Week 1: Basics
  What is computational journalism?
  Data in journalism
  Aims of the course
  Course structure

Theory
We will learn important guiding principles about
• Filter design
• Visualization
• Social network analysis
• Drawing conclusions from data
• Security modeling

Techniques
We will discuss a handful of techniques in great depth.
• Distance functions and clustering
• Vector space document model
• Recommender systems
• Proposition extraction
• Knowledge representation as linked data
• Community detection

Any requests?
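
To give a flavor of two of the techniques listed above, the vector space document model and distance functions, here is a minimal sketch. It is not taken from the course materials: it assumes plain term-frequency vectors, whitespace tokenization, and cosine distance, which is one common choice among many. The three sample headlines are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text):
    """Turn a document into a sparse term-frequency vector (word -> count)."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two sparse vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention chosen here: an empty document is maximally distant
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical headlines, used only to exercise the functions.
docs = [
    "city council approves budget",
    "council approves new city budget",
    "team wins championship game",
]
vectors = [tf_vector(d) for d in docs]

d01 = cosine_distance(vectors[0], vectors[1])  # share most words
d02 = cosine_distance(vectors[0], vectors[2])  # share no words
print(f"doc0 vs doc1: {d01:.3f}")
print(f"doc0 vs doc2: {d02:.3f}")
```

Documents that share many words end up close together and documents with no words in common end up at distance 1, which is the property a clustering algorithm would then build on.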

Course structure
• Classes: we’ll review the readings (so please read them)
• By next week: form groups of 2-3.
• Assignments are set every other week and are due two weeks later.
• Some will involve coding, all will involve critical analysis.

Your data
• You are encouraged to pick a data set and stick with it.
• If you want, you can do all assignments, the final research report, etc. with this data.
• This is a research course… let’s learn something new.

What data?
SEC reports, municipal open gov data, Wikileaks, your favorite archive, social media…

Two criteria:
  Journalistically interesting
  Requires advanced techniques

Final Report
For 3-point students
• A theoretical discussion (10 pages)

For 6-point students, one of:
• A theoretical discussion (25 pages)
• An implementation of a technique and discussion of results
• Analysis of your chosen data
• A completed story, plus methodology
