
countculture

Open data and all that


How to help build the UK's open planning database: writing scrapers
with 21 comments
This post is by Andrew Speakman, who's coordinating OpenlyLocal's planning application work.
As Chris wrote in his last post announcing OpenlyLocal's progress in building an open database of planning applications,
while we can do the importing from the main planning systems, if we're really going to cover the whole country, we're
going to need the community's help. I'm going to be coordinating this effort, so I thought it would be useful to explain
how we're going to do it (you can contact me at planning@openlylocal.com).
First, we're going to use the excellent ScraperWiki as the main platform for writing external scrapers. It supports Python,
Ruby and PHP, and has worked well for similar schemes. It also means each scraper is openly available and we can see it in
action. We will then use the ScraperWiki API to upload the data regularly into OpenlyLocal.
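The classic ScraperWiki datastore could be queried over HTTP with plain SQL. As a hedged sketch of the kind of call involved (the endpoint path, parameter names and the example scraper name here reflect the ScraperWiki API of the time and are assumptions, not OpenlyLocal's actual import code):

```python
from urllib.parse import urlencode

# Classic ScraperWiki SQLite datastore endpoint (historical; may have changed)
API_BASE = "https://api.scraperwiki.com/api/1.0/datastore/sqlite"

def datastore_url(scraper_name, query, fmt="jsondict"):
    """Build a datastore API URL that runs `query` against a scraper's data."""
    return API_BASE + "?" + urlencode(
        {"format": fmt, "name": scraper_name, "query": query})

# Hypothetical scraper name, for illustration only
url = datastore_url("brent_planning_applications",
                    "select uid, url, address from swdata limit 50")
```

A consumer such as OpenlyLocal would then fetch that URL on a schedule and merge the returned rows into its own database.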
Second, we're going to break the job into manageable chunks by focusing on target groups of councils, and just to sweeten
things (as if building a national open database of planning applications wasn't enough) we're going to offer small
bounties (£75) for successful scrapers for these councils.
We have some particular requirements designed to make the system maintainable and do things the right way, but not
many are fixed in stone, so feel free to respond with suggestions if you want to do it in a different way.
For example, the scraper should keep itself current (running on a daily basis), but also behave nicely (not putting an
excessive load on ScraperWiki or the target website by trying to get too much data in one go). In addition, we propose that
the scrapers should operate by updating current applications on a daily basis and also make inroads into the backlog by
gathering a batch of previous applications each run.
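"Behaving nicely" mostly comes down to bounding the batch size and pausing between requests. A minimal sketch, where the batch size and delay are illustrative defaults rather than OpenlyLocal's actual settings:

```python
import time

def fetch_in_batches(items, fetch, batch_size=20, delay=1.0):
    """Fetch at most one bounded batch per run, pausing between
    requests so the target site isn't hammered."""
    results = []
    for item in items[:batch_size]:  # never more than one batch per run
        results.append(fetch(item))
        time.sleep(delay)            # polite gap between requests
    return results
```

Run daily, this works through a backlog of any size without ever presenting the council's site with a burst of traffic.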
We have set up three example scrapers that operate in the way we expect: Brent (Ruby), Nuneaton and Bedworth (Python)
and East Sussex (Python). These scrapers perform 4 operations, as follows:
1. Create new database records for any new applications that have appeared on the site since the last run and store the
identifiers (uid and url).
2. Create new database records for a batch of missing older applications and store the identifiers (uid and url). Currently the
scrapers are set up to work backwards from the earliest stored application towards a target date in the past.
3. Update the most current applications by collecting and saving the full application details. At the moment the scrapers
update the details of all applications from the past 60 days.
4. Update the full application details of a batch of older applications where the uid and url have been collected (as above)
but the application details are missing. At the moment the scrapers work backwards from the earliest empty
application towards a target date in the past.
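In outline, one run of such a scraper might look like the following. This is a sketch only: the function names, the in-memory `db` dict, and the batch limit are illustrative, and a real ScraperWiki scraper would use the datastore (e.g. `scraperwiki.sqlite.save`) plus site-specific parsing in place of the stand-ins:

```python
from datetime import datetime, timezone

db = {}  # uid -> record; stands in for the ScraperWiki datastore

def save_stub(uid, url):
    """Ops 1 & 2: store just the identifiers for a newly seen application."""
    db.setdefault(uid, {"uid": uid, "url": url, "details": None})

def save_details(uid, details):
    """Ops 3 & 4: fill in the full application details, stamping date_scraped."""
    db[uid]["details"] = details
    db[uid]["date_scraped"] = datetime.now(timezone.utc).isoformat()

def run_once(list_new, list_old_batch, fetch_details, recent_uids):
    # 1. stubs for applications that appeared since the last run
    for uid, url in list_new():
        save_stub(uid, url)
    # 2. stubs for a batch of older applications, working backwards
    for uid, url in list_old_batch():
        save_stub(uid, url)
    # 3. refresh details of recent applications (e.g. the past 60 days)
    for uid in recent_uids:
        save_details(uid, fetch_details(uid))
    # 4. backfill details for a bounded batch of stubs that still lack them
    empties = [u for u, r in db.items() if r["details"] is None]
    for uid in empties[:10]:
        save_details(uid, fetch_details(uid))
```

Steps 1 and 3 keep the scraper current; steps 2 and 4 chew through the backlog a batch at a time.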
The data fields to be gathered for each planning application are defined in this shared Google spreadsheet. Not all the fields
will be available on every site, but we want all those that are there.
Note the following:
The minimal valid set of fields for an application is: uid, description, address, start_date and date_scraped
The uid is the database primary key field
All dates (except date_scraped) should be stored in ISO 8601 format
The start_date field is set to the earliest of the date_received or date_validated fields, depending on which is
available
The date_scraped field is a date/time (RFC 3339) set to the current time when the full application details are updated. It
should be indexed.
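The field rules above can be captured in a small record-building helper. This is a sketch under the stated rules, not OpenlyLocal's actual import code; the function and constant names are my own:

```python
from datetime import datetime, timezone

REQUIRED = ("uid", "description", "address", "start_date", "date_scraped")

def make_record(uid, description, address,
                date_received=None, date_validated=None):
    """Build a minimal valid application record per the shared spreadsheet."""
    # start_date is the earliest of date_received / date_validated, using
    # whichever is available (both are ISO 8601 'YYYY-MM-DD' strings)
    candidates = [d for d in (date_received, date_validated) if d]
    if not candidates:
        raise ValueError("need date_received or date_validated")
    record = {
        "uid": uid,                     # database primary key
        "description": description,
        "address": address,
        "start_date": min(candidates),  # ISO 8601 strings sort chronologically
        # RFC 3339 timestamp, set when full details are (re)scraped
        "date_scraped": datetime.now(timezone.utc).isoformat(),
    }
    assert all(record[f] for f in REQUIRED)
    return record
```

Note that comparing ISO 8601 date strings with `min` is safe precisely because that format sorts lexicographically in date order, which is one practical reason the spec mandates it.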
So how do you get started? Here's a list of 10 non-standard authorities that you can choose from: Aberdeen, Aberdeenshire,
Ashfield, Bath, Calderdale, Carmarthenshire, Consett, Crawley, Elmbridge, Flintshire. Have a look at the sites and then let
me know if you want to reserve one and how long you think it will take to write your scraper.
Happy scraping.
Written by countculture
March 29, 2012 at 8:48 am
Posted in hyperlocal, local government, open data, openlylocal, planning
Tagged with Councils, gov2.0, hyperlocal, local data, local government, opendata, Planning Applications, PlanningAlerts
21 Responses
Subscribe to comments with RSS.
I'll have a look at converting my existing scraper for the (distinctly non-standard and annoying) site of Broxbourne
Council to this standard. The current scraper is here:
https://scraperwiki.com/scrapers/broxbourne_planning_applications/
Tom Hughes
March 29, 2012 at 9:10 am
Reply
Tom
I would be very interested to see how you get on, as there are about 8 other sites like this (built by Civica?) which I
have already had a go at and found equally annoying and impenetrable. I did manage to crack two of them
(Harrow and Wrexham) so let me know when you are ready and we can swap notes.
Andrew
Andrew
March 29, 2012 at 7:46 pm
Reply
Think I've got it all running now, and repopulated with all the data from the start of 2007 using the new schema.
The biggest pain of course is the lack of any direct URLs for applications.
Tom Hughes
March 30, 2012 at 4:12 pm
Hi All
The missing link to the spreadsheet in the main article is:
https://docs.google.com/spreadsheet/ccc?key=0AhOqra7su40fdGdVbDRWYkxGbnhsTkFMTjBBSi1oTHc
Look for the "Scraper field names" tab at the bottom
Andrew
March 29, 2012 at 7:50 pm
Reply
Quick question: if a Council were to release the data in a machine-readable format, as a starter what would be the
minimum data set you would require?
adamjennison111
May 8, 2012 at 2:55 pm
Reply
The mandatory data fields are tagged as such in the "Scraper field names" section of the shared spreadsheet.
However the list is fairly minimal and we'd hope for more detail if possible.
Andrew
May 11, 2012 at 7:52 am
Reply
Thanks Andrew, it was after posting that I RTFM and spotted the spreadsheet..!
I ask because I work in Hull City Council and am looking at Open Data.
I find it incredible that the planning systems that are open, such as ours and our neighbours' the East Riding of
Yorkshire's, are so poor when it comes to providing data that is machine readable. Not even an RSS feed!?
I am looking to wrap some of our systems in APIs that can provide data in a better format, and in this case I would
suggest even a little, with a link to the system's content, will be better than nothing.
I started to write a scraper for the IDOX system (open on ScraperWiki) as I want to create a hyperlocal blog, but found
the OpenlyLocal site and paused to see what was happening.
I would rather that the data, which is already in the public sphere, be open for all, not just eyes!
I am looking to use persistent URIs and a datastore to hold the information, but we are in the early stages of
planning our OpenData thoughts, so any comments are greedily accepted; it would be good to be led by
developer needs rather than some internal direction.
adamjennison111
May 11, 2012 at 10:15 am
I really want Canterbury to be supported, and might have a bit of time to look at writing a scraper. According to your
spreadsheet this is AcolNet style.
So is anyone else working on this, and if not, would it be useful to try and get a single scraper working against
multiple AcolNet sites?
James Berry
May 8, 2012 at 3:59 pm
Reply
There are more than 30 AcolNet systems in the UK, so we think it's going to be more efficient to implement this as
part of the main OpenlyLocal systems. But we know people are impatient for this to happen quickly, so please nag
(on the new OpenlyLocal blog at http://blog.openlylocal.com/2012/03/29/how-to-help-build-the-uks-open-
planning-database-writing-scrapers/) if it doesn't happen soon.
Andrew
May 11, 2012 at 8:04 am
Reply
I'll leave it to you guys to do it internally then. Any progress?
James Berry
May 29, 2012 at 9:36 am
Sorry James, we are still doing preparatory work on the AcolNet implementation.
In the meanwhile, if you want basic information on the most recent Canterbury applications, they are
available via this ScraperWiki interface:
https://views.scraperwiki.com/run/planning_map/?db=South+East&auth=Canterbury&max=200
Similarly Stoke on Trent applications are here:
https://views.scraperwiki.com/run/planning_map/?db=West+Midlands&auth=Stoke+on+Trent&max=200
Andrew
May 31, 2012 at 6:39 am
Sorry for the delay, but at last Canterbury planning applications are now active on OpenlyLocal.
Here's the link:
http://openlylocal.com/councils/48/planning_applications
Andrew
August 10, 2012 at 9:39 am
Spectacular!
James Berry
August 10, 2012 at 11:17 am
and is it worth using the planningalerts googlecode repository as a starting point?
http://code.google.com/p/planningalerts/source/browse/#svn%2Ftrunk%2Fpython_scrapers%253Fstate%253Dclosed
James Berry
May 8, 2012 at 4:11 pm
Reply
I would really like to help out with Stoke-on-Trent City Council's town planning alerts, so at least they'd be usable, but
everything is running off of sessions. It's so badly done, it's a joke.
http://www.planning.stoke.gov.uk/dataonlineplanning/
Matt - My Tunstall (@MyTunstall)
May 10, 2012 at 1:40 pm
Reply
Stoke on Trent is also an AcolNet system, so it's one of more than 30 similar UK systems we have to crack, and the
comments above about Canterbury apply here too.
Andrew
May 11, 2012 at 8:11 am
Reply
Hi, Stoke on Trent planning applications are (at last) being scraped onto OpenlyLocal.
Here's the link:
http://openlylocal.com/councils/96/planning_applications
Andrew
August 10, 2012 at 5:21 pm
Reply
[...] reading the article they, OpenlyLocal, might just have solved it. They are also asking for help writing scrapers; rather
than read here what it's all about, go to the site and read for yourself, but above [...]
Architectural Technologist OpenlyLocal
May 11, 2012 at 7:45 am
Reply
Are there any plans to include, for example, the New Forest National Park Authority? The New Forest site now being
scraped is the district council's.
Chunter
September 17, 2012 at 4:26 am
Reply
Not yet. We haven't ruled it out, but want to get the councils tackled first.
countculture
September 17, 2012 at 3:01 pm
Reply
In the meantime you can get access to New Forest District and National Park applications via my map interface
here https://views.scraperwiki.com/run/planning_map/?db=South+East.
Andrew
September 17, 2012 at 6:43 pm