
Introduction to open data journalism: finding stories in data

The Open Data Institute, London, 23 April, 2013
The SKOR Codex (2012), La Société Anonyme, ODI Commission “Data as Culture”
Slides by Lisa Evans and Kathryn Corrick

Introductions
Lisa Evans
Data Wrangler, School of Data, Open Knowledge Foundation
Former data journalist, Guardian

Kathryn Corrick
Training Business Manager, ODI
UK Chair, Online News Association


Introductions


Data journalism packs
Go to: http://tinyurl.com/odi-dj


Telling stories with data
The SKOR Codex (2012), La Société Anonyme, ODI Commission “Data as Culture”

http://understandinguncertainty.org

http://understandinguncertainty.org/files/animations/Nightingale11/Nightingale1.html

Where does my money go?

http://wheredoesmymoneygo.org/bubbletree-map.html#/~/total/health

Data digging


Big leaks

Source: http://www.icij.org/offshore/how-icijs-project-team-analyzed-offshore-files

Gunter Sachs’s offshore network
http://www.icij.org/offshore/interactive-gunter-sachs-network

Data on maps can be enough
http://www.guardian.co.uk/news/datablog/interactive/2011/aug/09/uk-riots-incident-map

Riots and poverty mapped
http://www.guardian.co.uk/news/datablog/interactive/2011/aug/16/riots-poverty-map

http://www.guardian.co.uk/news/datablog/2012/feb/29/uk-hospital-heart-surgery-mortality-rate

Funnel plots
www.ncbi.nlm.nih.gov/pubmed/15568194
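A funnel plot compares each unit's rate (for example a hospital's mortality rate) with the overall rate, using control limits that narrow as the number of cases grows, so small hospitals are not flagged just for being noisy. The sketch below uses a simple normal approximation to the binomial for the 95% and 99.8% limits; the hospital names and figures are made up, and more careful analyses use exact binomial limits and allow for over-dispersion.

```python
import math

# Two-sided z-values for the 95% and 99.8% limits often drawn on funnel plots.
Z95 = 1.96
Z998 = 3.09

def funnel_limits(p0, n, z):
    """Normal-approximation limits around the overall rate p0 for n cases."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return max(0.0, p0 - z * se), min(1.0, p0 + z * se)

def classify(units):
    """units maps name -> (deaths, cases); returns name -> position on the funnel."""
    total_deaths = sum(d for d, _ in units.values())
    total_cases = sum(n for _, n in units.values())
    p0 = total_deaths / total_cases  # overall rate used as the target
    labels = {}
    for name, (deaths, cases) in units.items():
        rate = deaths / cases
        lo95, hi95 = funnel_limits(p0, cases, Z95)
        lo998, hi998 = funnel_limits(p0, cases, Z998)
        if rate > hi998:
            labels[name] = "above 99.8% limit"
        elif rate > hi95:
            labels[name] = "above 95% limit"
        elif rate < lo998:
            labels[name] = "below 99.8% limit"
        elif rate < lo95:
            labels[name] = "below 95% limit"
        else:
            labels[name] = "within limits"
    return labels

# Hypothetical hospitals: (deaths, operations)
hospitals = {"A": (10, 500), "B": (30, 500), "C": (5, 400), "D": (12, 600)}
print(classify(hospitals))
```

Here hospital B's 6% rate sits outside the 99.8% limit for 500 operations, while the others fall inside the funnel.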

Exercise
Discuss in your groups what you have just seen.
Any surprises?

Anything to note?


What is data?

Photo: Lisa Evans

- Data is a record of information, e.g. written or digital
- Digital data is something you can keep on your computer

Defining data

- Raw data is exactly as it was collected from a source
- Structured data is organised so it's easier to use, e.g. data in a spreadsheet
- Big data is too big to be stored on one computer, so it needs parallel servers
- Personal data relates to an individual who can be identified from that information

Open Definition
“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” OpenDefinition.org


Break

Photo: Kathryn Corrick

Discussion: What makes a trusted (data) source?

Trusted data sources…
Show their methods
Are open to inquiries and timely in their replies

Have a statistician on the team
Have a good track record


Trusted data sources…
Question everything


How to stay up to date with data releases
Office for National Statistics release calendar
Parliamentary releases mailing list
Planning alerts mailing list
RSS feeds
Press releases
(see your packs for links)
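Most of these sources can be followed in a feed reader, but an RSS feed can also be checked with a short script. A minimal sketch, assuming an RSS 2.0 feed with `item`, `title` and `link` elements; the sample feed and the example.org links are placeholders, not a real release calendar.

```python
import xml.etree.ElementTree as ET

def release_titles(rss_xml, keyword=None):
    """Return (title, link) pairs from an RSS 2.0 document.

    If keyword is given, keep only items whose title contains it (case-insensitive)."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if keyword is None or keyword.lower() in title.lower():
            items.append((title, link))
    return items

# Made-up sample feed for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example statistics releases</title>
  <item><title>Labour market statistics, April</title><link>http://example.org/labour</link></item>
  <item><title>Retail sales, March</title><link>http://example.org/retail</link></item>
</channel></rss>"""

print(release_titles(SAMPLE_FEED, keyword="labour"))
```

For a live feed, the XML could be fetched with `urllib.request.urlopen(feed_url).read()` and passed in the same way, since `ET.fromstring` also accepts bytes.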

UK law & licensing*

* What follows should not be taken or used as legal advice.
Photo © Jason Morrison: http://www.sxc.hu/photo/952313

Key laws affecting data journalism
Intellectual Property - copyright and database rights

Computer Misuse
Data Protection

Freedom of Information Act


What are intellectual property rights?
Rights that give ownership of creations:
- Patents
- Trade marks
- Design rights
- Copyright
- Database rights
Many creations are protected by a bundle of rights: more than one, or all, of the above.

Copyright, Designs and Patents Act 1988
Original works, e.g. content, graphics, text, music
Gives the author exclusive rights to control the copying and exploitation of the work
Arises automatically
Fair dealing: criticism or review, reporting current events, non-commercial research, educational use
Beware the “public domain” assumption and myth


Database definition
“A collection of independent works, data or other materials which are arranged in a systematic or methodical way and are individually accessible by electronic or other means”
See:
http://www.out-law.com/page-5698
The Copyright and Rights in Databases (Amendment) Regulations 2003: http://www.legislation.gov.uk/uksi/2003/2501/contents/made
The Copyright and Rights in Databases Regulations 1997: http://www.legislation.gov.uk/uksi/1997/3032/contents/made

Databases
Copyright
- Creative effort and substantial investment in the selection and presentation
- Individual components of the database

Database rights
- Substantial investment in obtaining, verifying and presenting the database


Rule of thumb
Do you have rights or permission to publish?

Do you have rights to use the information/data?
Is the data derived from other sources?
(see licensing)


Computer Misuse Act
Offences
Unauthorised access to computer material

Unauthorised access with intent to commit or facilitate further offences
Unauthorised modification of computer material

Penalties
Up to 2 to 10 years' imprisonment, depending on the offence

Fines

Rule of thumb
Leaks… get the legal team in


Data Protection
Personal Data
UK Data Protection Act 1998

Data relating to a living identifiable person must be processed fairly and lawfully
Processing that is not immediately apparent to users, e.g. cookies (new laws and guidance)
Damages available to data subjects


Rule of thumb
Does this data contain personally identifiable data?
Could this data be combined with another dataset to identify individuals?
Anonymisation is hard
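Why anonymisation is hard: removing names is often not enough if the remaining fields can be matched against another dataset. A minimal sketch of such a linkage, using entirely made-up records; the field names and the choice of postcode district, birth year and sex as matching fields are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical "anonymised" release: names removed, other fields kept.
anonymised = [
    {"postcode": "N1", "birth_year": 1975, "sex": "F", "condition": "asthma"},
    {"postcode": "N1", "birth_year": 1982, "sex": "M", "condition": "diabetes"},
    {"postcode": "E2", "birth_year": 1975, "sex": "F", "condition": "none"},
]

# Hypothetical public list that includes names and the same fields.
public = [
    {"name": "A. Smith", "postcode": "N1", "birth_year": 1975, "sex": "F"},
    {"name": "B. Jones", "postcode": "N1", "birth_year": 1982, "sex": "M"},
    {"name": "C. Brown", "postcode": "E2", "birth_year": 1975, "sex": "F"},
    {"name": "D. Green", "postcode": "E2", "birth_year": 1975, "sex": "F"},
]

MATCH_FIELDS = ("postcode", "birth_year", "sex")

def reidentify(released, register, keys=MATCH_FIELDS):
    """Return name -> released record wherever exactly one person matches on keys."""
    index = defaultdict(list)
    for person in register:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = {}
    for record in released:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches[names[0]] = record
    return matches

for name, record in reidentify(anonymised, public).items():
    print(name, record["condition"])
```

Two of the three records are matched to a single named person; the E2 record survives only because two people in the public list share those details.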
Further reading: ODI Friday lectures on these topics
http://www.scribd.com/doc/128356210/Business-considerations-for-privacy-and-open-datahow-not-to-get-caught-out
http://www.scribd.com/doc/125638490/Getting-to-grips-withthe-National-Pupil-Database-personal-data-in-an-open-data-world


Licences: what to look for
Licences set out the scope and limits of how intellectual property can be used. Commonly used in the UK:
- All rights reserved
- Royalty-free licence
- Paid-for licence
- Open Government Licence
- Creative Commons licence

Rule of thumb
If you are uncertain about what rights you may have over a piece of content, data or dataset or how you can use it…
Contact the owner. Ask.


Exercise:
See journalist pack for trusted sources exercise:

http://tinyurl.com/odi-dj


Freedom of Information Act 2000
Provides public access to recorded information held by public authorities
The Act does not necessarily cover every organisation that receives public money
Recorded information includes printed documents, computer files, letters, emails, photographs, and sound or video recordings

FOIA tips
Sign up to ‘What Do They Know?’
https://www.whatdotheyknow.com/

Always check commercial confidentiality. See the Information Commissioner's Office advice:
http://www.ico.org.uk/~/media/documents/library/Environmental_info_reg/Practical_application/eir_confidentiality_of_commercial_or_industrial_information.ashx


Finding the story: choosing your data
Exercise: http://tinyurl.com/dj-tax-exercise

Exercise (optional)
Create a decision/story tree for local council spending or NHS reforms
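A story tree can be written down as nested questions, each leading to follow-up questions. A minimal sketch in Python; the questions about council spending are illustrative only, and listing the leaf questions is one way to check that every branch ends in something you could actually report.

```python
# Hypothetical story tree: each node is a question, children are follow-ups.
story_tree = {
    "question": "Where does the council spend most of its money?",
    "children": [
        {"question": "Which suppliers receive the largest payments?",
         "children": [
             {"question": "Were those contracts put out to tender?", "children": []},
         ]},
        {"question": "How has spending changed since last year?",
         "children": [
             {"question": "Which services saw the biggest cuts?", "children": []},
             {"question": "Who is affected by those cuts?", "children": []},
         ]},
    ],
}

def outline(node, depth=0):
    """Return the tree as indented lines, one question per line."""
    lines = ["  " * depth + "- " + node["question"]]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines

def leaves(node):
    """Return the leaf questions: the end point of each line of inquiry."""
    if not node["children"]:
        return [node["question"]]
    return [q for child in node["children"] for q in leaves(child)]

print("\n".join(outline(story_tree)))
```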


Exercise
Find your data at: http://tinyurl.com/odi-dj
See: “The Data” and “The Source”

1. What does your data tell you?
2. Add a new sheet with your names and email addresses

Exercise: data cleaning
Remove units such as “people” or £ signs, so the cells contain numbers only
Check spelling and clarity

Remove the “dummy data” column
Bold the headings and freeze the first row (and possibly the first column)
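The same cleaning step can be scripted when a sheet is too large to fix by hand. A minimal sketch, assuming cells such as "£1,234.50" or "56 people"; it strips £ signs, thousands separators and trailing unit words, and returns None for anything it cannot read as a number so those cells can be checked manually.

```python
import re

def to_number(value):
    """Turn a cell such as '£1,234.50' or '56 people' into a float.

    Returns None for empty cells or cells that do not reduce to a plain number."""
    text = str(value).strip()
    # Drop the currency symbol and thousands separators.
    text = text.replace("£", "").replace(",", "")
    # Drop trailing unit words such as "people".
    text = re.sub(r"[A-Za-z\s]+$", "", text)
    if not re.fullmatch(r"-?\d+(\.\d+)?", text):
        return None
    return float(text)

column = ["£1,234.50", "56 people", "", "n/a", "7"]
print([to_number(cell) for cell in column])
```

Cells like "1 200 people", with a space inside the number, come back as None here rather than being guessed at.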


Exercise: using your data
Does it make sense to sum your data, or is it already summed?
If it is already summed, move the totals to the bottom of the sheet; if not, add the totals yourself.
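A minimal sketch of the same check, assuming a sheet of [label, value] rows where any total row is labelled "Total"; it moves the total to the bottom, adds one if it is missing, and raises an error if a stated total does not match the sum of the other rows. The borough figures are made up.

```python
import math

def split_total(rows, total_label="Total"):
    """Separate the data rows from any row labelled as a total."""
    is_total = lambda r: str(r[0]).strip().lower() == total_label.lower()
    body = [r for r in rows if not is_total(r)]
    totals = [r for r in rows if is_total(r)]
    return body, totals

def with_total_at_bottom(rows, total_label="Total"):
    """Return the rows with the total last, adding one if it is missing.

    Raises ValueError if a stated total disagrees with the sum of the rows."""
    body, totals = split_total(rows, total_label)
    computed = sum(float(r[1]) for r in body)
    if not totals:
        return body + [[total_label, computed]]
    stated = float(totals[0][1])
    if not math.isclose(stated, computed):
        raise ValueError(f"stated total {stated} != computed sum {computed}")
    return body + totals

# Hypothetical sheet with the total in the wrong place.
sheet = [["Total", 295], ["Camden", 120], ["Hackney", 95], ["Islington", 80]]
print(with_total_at_bottom(sheet))
```

A mismatch between the stated total and the computed sum is itself worth a question to the data source.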


Exercise: using your data
Take two columns of your data and copy them to a new sheet
On the new sheet, click the chart icon
Choose a suitable chart, give it a title and label the axes

Does the chart show what you wanted it to?
What difficulties did you encounter and how did you solve them?


Break

Photo: Kathryn Corrick

Crowdsourcing data


Haiti 2010

http://blog.ushahidi.com/2012/01/12/haiti-and-the-power-of-crowdsourcing/
http://www.guardian.co.uk/technology/2010/feb/04/mapping-open-source-victor-keegan
http://wiki.openstreetmap.org/wiki/WikiProject_Haiti/Earthquake_map_resources
Images: http://irevolution.files.wordpress.com/2010/01/ex41.png

Boston bombing 2013

http://www.bbc.co.uk/news/technology-22214511
http://www.reddit.com/r/findbostonbombers
http://blog.reddit.com/2013/04/reflections-on-recent-bostoncrisis.html?m=1

Selection of tools
Ushahidi.com
Swiftriver.com

Crowdmap.com (closed beta at the moment)
Google Drive Forms and Spreadsheets

Twitter


Google Drive Demo


Ushahidi
Ushahidi was designed to make it easy to crowdsource information through multiple channels, including SMS, email, Twitter and the web.

Ushahidi.com, http://vimeo.com/7838030

Crowdmap.com

https://womenundersiegesyria.crowdmap.com/

Exercise: Crowdsourcing data
Find the crowd sourcing exercise at:

http://tinyurl.com/odi-dj
Complete the form to see the results


Crowdsourcing tips

- State in one sentence what you want to achieve with crowdsourcing
- Have a clear procedure for verifying data
- Can you identify individuals from your data presentation? What effect will this have?
- People are more likely to join in if they feel in safe hands
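One possible verification procedure, sketched under assumptions: a report counts as verified once at least two different people have reported the same category near the same place. Rounding the coordinates groups nearby reports and also avoids publishing exact locations. The field names, the threshold of two reporters and the two-decimal rounding are all hypothetical choices, and the reports are made up.

```python
from collections import defaultdict

def verify_reports(reports, min_reporters=2, decimals=2):
    """Group reports by category and rounded location; mark a group verified
    once at least min_reporters different people have reported it.

    reports: list of dicts with 'reporter', 'category', 'lat' and 'lon'."""
    groups = defaultdict(set)
    for r in reports:
        key = (r["category"], round(r["lat"], decimals), round(r["lon"], decimals))
        groups[key].add(r["reporter"])  # a set counts each reporter only once
    results = []
    for (category, lat, lon), reporters in sorted(groups.items()):
        results.append({
            "category": category,
            "lat": lat,
            "lon": lon,
            "reporters": len(reporters),
            "verified": len(reporters) >= min_reporters,
        })
    return results

reports = [
    {"reporter": "alice", "category": "road blocked", "lat": 51.5074, "lon": -0.1278},
    {"reporter": "alice", "category": "road blocked", "lat": 51.5076, "lon": -0.1279},
    {"reporter": "bob", "category": "road blocked", "lat": 51.5071, "lon": -0.1281},
    {"reporter": "carol", "category": "flooding", "lat": 51.4500, "lon": -0.0500},
]
for row in verify_reports(reports):
    print(row)
```

Rounding to a grid is crude: two reports a few metres apart can fall either side of a cell boundary, so a real project might compare distances instead, and should still check reports by hand.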


Visualising your data

https://google-developers.appspot.com/chart/interactive/docs/gallery

Exercise: Google Charts
Find the Google Charts exercise at:

http://tinyurl.com/odi-dj
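Many Google Charts examples load their data with `google.visualization.arrayToDataTable()`, which takes an array whose first row holds the column labels. A minimal sketch, assuming two spreadsheet columns (text labels and numbers), that writes that array as JSON ready to paste into a chart page; the function name is hypothetical and the borough figures are made up.

```python
import json

def to_chart_array(header, rows):
    """Build the array-of-arrays accepted by arrayToDataTable():
    a header row of column labels, then one [label, number] row per data point."""
    table = [[str(h) for h in header]]
    for label, value in rows:
        table.append([str(label), float(value)])
    return json.dumps(table)

print(to_chart_array(["Borough", "Spend (£m)"], [("Camden", 120), ("Hackney", 95)]))
```

By default `json.dumps` escapes the £ sign as `\u00a3`, which JavaScript reads back as the same character.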


https://github.com/mbostock/d3/wiki/Gallery


Time for questions

?

Thank you
Lisa Evans @objectgroup
Lisa.Evans@okfn.org

Kathryn Corrick @kcorrick
Kathryn.Corrick@theodi.org


Links mentioned on the course
- http://OpenCorporates.com
- http://Scribd.com
- http://Slideshare.net
- http://www.Prescribinganalytics.com
- www.alltrials.net
- RSS Readers: https://docs.google.com/spreadsheet/ccc?key=0ApTo6f5Yj1iJdFRfWmhUVjV0WkktTjJhUUE4dGR5WUE#gid=0
- More data: http://data.worldbank.org/
- http://Openstreetmap.org
- http://Datawrapper.de