The Growing Importance of Data Journalism - O'Reilly Radar


Published by: Luciana Moherdaui on Dec 22, 2010
The growing importance of data journalism
Parsing the progress of open government data requires new tools and reliable information sources.
by Alex Howard | @digiphile | 21 December 2010
One of the themes from News Foo that continues to resonate with me is the importance of data journalism. That skillset has received renewed attention this winter after Tim Berners-Lee called analyzing data the future of journalism.

When you look at data journalism and the big picture, as USA Today's Anthony DeBarros did at his blog in November, it's clear the recent suite of technologies is part of a continuum of technologically enhanced storytelling that traces back to computer-assisted reporting (CAR).

As DeBarros pointed out, the message of CAR "was about finding stories and using simple tools to do it: spreadsheets, databases, maps, stats," like Microsoft Access, Excel, SPSS, and SQL Server.

That's just as true today, even if data journalists now have powerful new options: scraping data from the web with ScraperWiki and Needlebase, scripting in Perl, Ruby, or Python, and building on MySQL and Django. Understanding the history of computer-assisted reporting is key to putting new tools in the proper context. "We use these tools to find and tell stories," DeBarros wrote. "We use them like we use a telephone. The story is still the thing."

The data journalism session at News Foo took place on the same day civic developers were participating in a global open data hackathon and the New York Times hosted its Times Open Hack Day. Many developers at contests like these are interested in working with open data, but the conversation at News Foo showed how much further government entities need to go to deliver on the promise open data holds for the future of journalism.

The issues that came up are significant. Government data is often "dirty," with missing metadata or incorrect fields. Journalists have to validate and clean up datasets with tools like Google Refine. ProPublica's Recovery Tracker for stimulus data and projects is one of the best examples of the practice in action.

A recent gold standard for data journalism is the Pulitzer Prize-winning Toxic Waters project from the New York Times. The scale of that project makes it a difficult act to follow, though Times developers are working hard on nifty projects like Inside Congress.

You can see a visualization of the Toxic Waters project and other examples of data journalism in this Ignite presentation from News Foo.
Making open government data sing
At ProPublica, the data journalism team is conscious of deep linking into news applications, with the perspective that the visualizations produced from such apps are themselves a form of narrative journalism. With great data visualizations, readers can find their own way and interrogate the data themselves. Moreover, distinctions between a news "story" and a news "app" are dissolving as readers increasingly consume media on mobile devices and tablets.

One approach to providing useful context is the "Ion" format at ProPublica.org, where a project like "Eye on the Stimulus" is a hybrid between a blog and an application. On one side of the web page, there's a news river. On the other, there are entry points into the data itself. The challenge to this approach is that a media outlet needs alignment between staff and story. A reporter has to be filing every day on a running story that's data-sensitive.
Upgrading Data.gov
The data journalism News Foo session featured a virtual component, bringing CityCamp founder Kevin Curry, Data.gov evangelist Jeanne Holm, and Reynolds fellow David Herzog together with News Foo participants to talk about the value propositions for open government data and data journalism.

As the recent open data report showed, developers are not finding the government data they need or want. If other entrepreneurs are to follow the lead of BrightScope, open government datasets will need to be more relevant to business. The feedback for Data.gov and other government data repositories was clear: more data, better data, and cleaner data, please.

Improving media access to data at the county or state level faces structural barriers because of growing budget crises in statehouses around the United States. As Jeanne Holm observed during the News Foo session, open government initiatives will likely be done in a zero-sum budget environment in 2011. Officials have to make them sustainable and affordable.

There are some areas where the federal government can help. Holm said Data.gov has created cloud hosting that can be shared with state, local or tribal governments. Data.gov is also rolling out a set of tools that will help with data conversion, optical character recognition, and, down the road, better tools for structured data.

Those resources could make government data more readily available and accessible to the media. Kevin Curry said that data catalogs are popping up everywhere. He pointed to CivicApps in Portland, Ore., where Max Ogden's work on coding the middleware for open government led to translating government data into more useful forms for developers.

Data journalists also run into government's cultural challenges. It can be hard to find public information officers willing or able to address substantive questions about data. Holm said Data.gov may post more contact information online and create discussions around each dataset. That kind of information is a good start for addressing data concerns at the federal level, but fostering useful connections between journalists and data will still require improvement and effort.
3 News Foo themes that continue to resonate
tags: data.gov, database, gov 2.0, government 2.0, newsfoo
