Professional Documents
Culture Documents
KUEK SER
1
When setting up your data team, it is
absolutely ok to start small. The diversity of
skills matters more than team size. However,
avoid having a single data journalist and
expecting that person to do everything.
5
Always question the quality of your data. Apply
the same journalism ethics and standards -
such as accuracy and accountability - to data-
driven stories.
2
All data team members, regardless of their
discipline, are part of the news desk. They
should be treated as journalists and sit within
the desk to encourage communication and
collaboration.
6
Before doing anything to the dataset, keep a
copy of the original in case you make mistakes
and lose access to the data source.
3
The data team should not be seen as a “service
desk” called in to help solve a technical issue
or add a feature midway through a project.
They should be involved in the project as
early as possible.
7
When dealing with a data sample, scrutinize
its size, sampling method, margin of error
and confidence level to avoid misleading your
audience.
4
When providing data journalism training for
your newsroom, include managers and editors
at least in the introduction or overview part.
You want them to empower the journalists
and be able to edit data stories.
8
Be extra careful in making cause and effect
conclusions among data points. Correlation
between two variables does not mean
causation.
10
Perform user-testing on your design, and
redesign it based on user feedback. User-
testing is not necessarily expensive or
resource-intensive.
13
Don’t hide your data. Open your data to the
public, including your competitors. Exposing
data to experts and other users allows more
findings and insights to be extracted.
11
Avoid using pie charts to present your data
as they are often less effective and widely
misused.
If you plan to set up a data journalism team, or want to make your existing data journalism desk more efficient,
Sections 1-4 are most relevant to you. Section 7, on monetization and data journalism business models, and
parts of Section 8, Building Data Journalism in Newsroom, should also be of interest.
If you’re an aspiring data journalist, or you’re already telling stories with data but looking to do this better, focus on
Sections 1-4. Also, Sections 5-6 explain the techniques and tools for data journalism. For further reading, practice
and external support, check the resources in Section 8.
Even if you don’t do data analysis or create visualization, your role in guiding and bulletproofing data-driven stories
is vital. Sections 2 and 4 will help you guide your data journalists. Parts of Section 5 - Data Integrity and Bulletproof
Your Data - will help you ask the right questions when editing stories. For editing graphics, please refer to the good
data visualization elements under Visualization in Section 6. There are also several resources specifically selected
for editors and an online course Bulletproof Data Journalism listed in Section 8.
Pajhwok Afghan News is In an award-winning series, the The use of stunning images and
Afghanistan’s largest independent Oxpeckers Center for Investigative an engaging video make this
news agency and the country’s Environmental Journalism, Africa’s story stand out, and, while data is
sole media outlet to use data to first journalistic investigation unit not the main focus, its inclusion
inform the public. It uses simple focusing on environmental issues, and the use of visualization help
data visualization with strong used data heavily in its reporting. to highlight the gravity of the
narrative to appeal to an audience The unique data it collected erratic rainfall. Nepali Times, an
with low data and digital literacy, revealed, for example, that since English weekly newspaper based
and is often cited by advocates as at least 2011 no large coal mines in Kathmandu, has no dedicated
an example of how data journalism operating in South Africa have data team. Its small editorial
can be done even under extremely been rehabilitated, and are simply team, including a self-taught data/
difficult conditions. abandoned. The center has two online producer, works together
full-time staff and works with more to produce interactive and data
than a dozen associate journalists. stories.
“Data can be the source of data journalism, or it can be the tool with
which the story is told - or it can be both,” says data journalism guru
Paul Bradshaw in Data Journalism Handbook.
Data journalism came to prominence in the Media organizations and journalism schools
2000s due to several factors: have invested in data journalism because it
has been proven to:
• The amount of digital data generated and made
public rose exponentially. • Find stories that would not have been found
• The increased use of the Internet by through traditional reporting.
governments and the public. • Find insightful or important stories hidden in
• The global open data movement that contributed data.
to the increased publication of government data. • Verify or clarify claims more authoritatively with
• New computational tools lowered the barrier for evidence
journalists to use data in their work. • Tackle bigger stories that involved a huge amount
of information or data.
• Communicate information quickly, effectively and
memorably.
To be clear, data journalism does not replace • Set your reporting apart from your competitors.
traditional journalism, but rather complements • Engage the audience in more innovative and
and enhances what journalists have been doing personalized storytelling approaches.
for centuries. • Increase audience engagement through social
media-friendly visualization.
DESIGN Adobe Illustrator, mapping softwares like QGIS, data visualization tools like
Highcharts, Tableau and Carto, infographics tools like Infogr.am.
For example, a 2-member team may consist Alternatively, you can build the team by training
of a reporter who can code and a designer; a existing staff. Look for reporters who use
programmer-cum-designer and a reporter; or spreadsheets in their work and present numbers
other combinations. There are also data teams effectively in their stories. A background or
with limited coding capacity that work with an interest in mathematics is helpful. The team’s
organization’s IT department. programmer or designer should have a passion to
deploy their skills for journalism and be prepared
If you plan to hire externally for the team - to be involved in reporting.
whether for a reporter, programmer or designer
Beside producing content, a data team can University academics and think-tank researchers
be a lead innovator and data evangelist in the are good partners for data projects as they have
newsroom, so it’s crucial to ensure it is growing in both data and expertise, and many are keen to
both intelligent and technical capacity. work with the media to increase public awareness
on a subject and boost publicity for their work.
If you operate in regions where there’s a vibrant
data journalism community - such as the United Civic hackers - programmers, designers or data
States, Europe or Latin America - there are plentiful scientists who deploy their professional skills to
external resources. Conferences such as those solve civic problems - are also potential partners
organized by the National Institute for Computer- (and a good source for future hiring). You can
Assisted Reporting (NICAR), workshops by outside find them in non-profit technology advocacy
contractors or local journalism organizations like groups like Code for Africa or media-tech groups
the Investigative Reporters and Editors (IRE) in like Hacks/Hackers, a global movement with over
the US, mentorship by other data journalists and 100 local chapters in most major cities that aims
partnership with specialist journalism groups like to marry technology with journalism. Sending
the Center for Public Integrity are great ways to your team members to participate in journalism-
upgrade your team’s firepower. See Resources themed hackathons can be an effective way to
section for more such resources, including expand their knowledge and grow their network.
fellowships and networks. The Global Editors Network’s (GEN) Editors Lab
program is a worldwide series of hackathons held
In countries where there is little data journalism, for newsrooms.
peer learning, mentorship and external training
might be less easy to come by. This is where
Newsroom
together. Proximity works.
2. Use an instant communication tool like Slack for
the whole newsroom.
There are certain underlying factors behind 3. Have 15-minute daily stand-up meetings, where
successful data journalism. those in the newsroom share briefly what they
did yesterday and what they plan to do today.
A research report from the Storybench project at This keeps everyone on the same page and finds
Northeastern University’s School of Journalism
areas for collaboration.
published in May 2017 interviewed 72 digital
journalists, including data journalists, in leading 4. Break down perceptions: i.e. developers and
digital newsrooms around the world, and found designers should not “serve” journalists. They are
three themes of best practice behind their an integral part in finding and presenting stories
successes: team-based collaboration, an open and designing how users interact with them.
source ethos within and between newsrooms, 5. Have regular skill sharing or show-and-tell
and mobile-driven story presentation. sessions to foster appreciation for each other’s
work. Attendance should be voluntary.
The first theme is fundamental for the
sustainability of data journalism in a newsroom, 6. Encourage developers and designers to write,
while the second could contribute to the and journalists to code.
development of digital journalism in a region. 7. Involve everyone early in the project to build a
shared sense of ownership.
You can employ talented people with diverse 8. Start with projects on less time-sensitive issues
skillsets but more important, according to the to experience the collaborative process without
report, is to “construct collaborative environments
rushing to meet a deadline.
in which these players can work together as a
9. Use Google documents or a project management
team”. Traditional newsroom operations tend
to segregate news production into departments tool like Trello to track progress.
like design, photo and research, and organize the
workflow in which ideas and information always multifaceted teams to self-organize, coming
move in one direction. By contrast, the successful together organically to produce an editorial
digital newsrooms interviewed allow “nimble, product”.
One advantage of using data in reporting is Such tools and templates can shorten the
the ability to connect the subject matter with production time and free up developers to
individual users. Data personalization is highly focus on more important things. Avoid investing
engaging because users can access information resources in developing something that is rarely
directly related to them. used.
Examples are news apps that allow users to view Data visualization components often used in data-
crime records in their neighborhood or websites driven stories include timelines, interactive charts
that compare users’ income with their parents’ and maps, sortable and searchable data tables,
generation. and grid image galleries. Data teams should have
access to tools that can create these components
Don’t forget to look for data stories by other media easily and quickly.
outlets on the same topic. Know the datasets
NPR
PROPUBLICA
WNYC
If you can frame your question, hypothesis or Journalists can file freedom of information (FOI)
reason in more specific terms, this helps make requests to get government data in countries
your reporting more efficient. Instead of “Has with FOI laws. In countries where government
crime risen or fallen in Gotham City?” use “Has information is closely guarded, accessing data
the intentional homicide rate risen or fallen in can be challenging, but not impossible. There
Gotham City despite the city tripling its crime- are many success stories in South America: LA
fighting spending in the last 3 years?” Nacion in Argentina, OjoPúblico and Convoca in
Peru, and Poderopedia in Chile (now expanded
to Venezuela and Columbia) are some media
organizations that produce great data journalism
without an open data environment.
DocumentCloud
Data Integrity
If you are reporting on or publishing primary FOR NON-CODERS:
source documents, this is a good tool to search,
analyze, annotate and highlight data in your DataMiner
documents. It’s free for journalism organizations,
but you have to apply for an account. Import.io
OutWit
Converting PDF
to spreadsheet FOR CODERS:
Content in a PDF document is not structured data,
and it can be a challenge to convert the content Python with the Beautiful Soup library
into structured data like a spreadsheet so it can
be analyzed and processed.
RESOURCE:
Tabula - free and open source tool designed for
journalists. Scraping for Journalists by Paul Bradshaw
Cometdocs - free until a limit
However, these only work for PDF documents with »» More tips on PDF conversion
digital material e.g., text created with software.
Your data might not be able to answer all these The following questions could help you
questions, but that doesn’t mean you can’t use evaluate the data.
it. The decision to use or not is an editorial one,
just like whether you want to use the information 1. Is the data accurate? Does it reflect the real-
provided by your sources. The circumstances world objects?
vary, but what every data journalist can and 2. Do you have enough data? Is all necessary data
should do is to be open and honest to their users.
present?
Let them know the weaknesses and limitations of
your data. Explain your decision to use that data 3. Did you get the data for the relevant time period?
if necessary. What is the most recent data? If newer data is
not available, why not?
4. Are all data values clearly defined by the data
provider? Does it come with a data dictionary/
FURTHER READING ON THIS TOPIC: codebook?
5. Is there another source that collects similar or
Evaluating data quality, Columbia Journalism parallel data? Which is more reliable? Why are
School they different?
6. Who collects, processes, publishes and maintains
the data? They could be different parties.
7. Why, when and how was the data collected? The
Data Biography motive might lead to biased data.
8. Did the collection method change over time? Are
A good practice is to build a data biography -
data points at different times comparable?
a document to record all the answers to the
questions above. Data science and information 9. Does the data come with disclaimers, limitations
design expert Heather Krause has shared a data or assumptions?
biography template online. 10. Who has used the data before? What were the
analyses made using the data?
11. If you want to compare between data points, are
they apples-to-apples comparisons? For example,
the definition of underage child varies in different
countries.
• Duplicates
• Transform the data structure in a spreadsheet to
a proper layout e.g., only one row for header, one
• Totals differ from aggregates
data point in each row.
• Names are in different orders - e.g. Chinese put
surname (family name) before first name
TUTORIAL
Visualise
Transform
FINDING
dataset
INSIGHTS
IN DATA Analyse,
Interpret
Document Insights
Source Source
Effective data visualizations - where information Knowing how they will view or interact with your
has been thoughtfully selected and presented visualization (social media, mobile/desktop, print
in a clear, simple and straightforward manner or TV broadcast) or whether they understand
100%
Usability Problems Found
75%
50%
25%
0%
0 3 6 9 12 15
Credit: Nielsen Norman Group Number of Test Users
4
In some cases a table works better than
visualization:
Your headline should tell the story: Tell your
• when there are only a few data points audience the key message. For example, for
• when it’s unnecessary to show comparison, trend a bar chart showing Olympic medal counts
or relationship by country, instead of “Top 10 countries by
• when it’s important to show precise value total medal count, Sochi Winter Olympics,
2014”, use a more specific headline like
“US ranks 2nd in medal count in the Sochi
Winter Olympics 2014”.
After creating your chart, there are other
components you need to decide, including the
Top ten countries by total medal count
headline, label, legend, typography and color. Sochi Winter OLympics, 2014
Russia
The following part covers some general guidelines.
United States 28
A good book on this topic is Designing Data Norway
Visualizations by Noah Iliinsky and Julie Steele. Canada
Netherland
The following part covers some general guidelines. Germany
Sweden
5
Total Medals Won
6
golden rules for data visualization design. We
have added one more below.
1
Make your message stand out from the
No 3D effects: These distort the data as the rest: Use color tone, labeling or other visual
part closer to the viewer appears larger. Also effects to draw attention to the element that
avoid color gradients and drop shadows. carries your message. Look at the medal
2
These are unnecessary visual noise. count chart above, the bar representing US
has a stronger yellow and label because the
Sort data from largest to smallest: This story is about the US team’s performance at
makes visual comparison much easier. the Games.
*Tableau is the only listed tool that allows a non-coder to build multiple charts on a dashboard that can interact with each other. This is
good when you don’t know coding but want to build a news app for users to explore complicated datasets.
Infogram (paid)
Google My Maps
Venngage (paid)
Google Earth Pro
TUTORIAL:
Data and Tools In an interview with the Tow Center for Digital
Journalism, La Nacion multimedia and interactive
development manager Momi Peralta made a
Sharing your data and tools can go against the
good case for sharing data especially in countries
culture of most news publication markets where
where access to information is difficult.
outlets compete for exclusive stories, audience
attention and faster news delivery. However,
more journalists have found that sharing data, “This is not only a data revolution. It is an open
tools, technical know-how and newsroom innovation revolution around knowledge. Media
workflow across news organizations has created a must help open data, especially in countries with
synergy that elevates the industry’s capabilities as difficult access to information.
a whole. The wheel doesn’t have to be reinvented
by everyone every time. “The way to go for us now is to use data for
journalism but then open that data. We are
For example, one media outlet produced a story building blocks of knowledge and, at the same
using government census data. It converted the time, putting this data closer to the people, the
data, originally in PDF format, into a spreadsheet, experts and the ones who can do better work
cleaned it and shared it online. Other outlets than ourselves to extract another story or detect
can skip those basic steps and develop more spots of corruption.
complicated tools to analyze the cleaned data for
different stories. When the second outlet shared “It makes lots of sense for us to make the effort
its tools, other outlets can modify them in turn of typing, building datasets, cleaning, converting
for their own use. This creates an ecosystem that and sharing data in open formats, even organizing
Want to start a small data journalism team in your newsroom? Here are 8 steps
Big data in small organizations: Constantaras. Strategies from setting up data teams in developing
countries
Collaborative, Open, Mobile: A Thematic Exploration of Best Practices at the Forefront of Digital
Journalism
Diving into Data Journalism: Strategies for getting started or going deeper
NICAR listserv
ICFJ
School of Data
Mistakes We Made So You Don’t Have To: Data Visualisation, Journalism and the Web
FiveThirtyEight
Katadata
IndiaSpend
ProPublica
Tempo - Investigations
Caixin - DataNews
OTHERS
MDIF Prague
Salvatorska 10
110 00 Prague 1, Czech Republic
(420) 224-312-832
www.mdif.org