Introduction and Background
I’ve been looking extensively at the great variety of data-oriented REST and REST-ish APIs that are appearing, especially as part of various government transparency efforts. As an example, there is the Sunlight Foundation’s API to look up information about congresspeople, or the Follow The Money API to look up information about lobbying and political contributions.

I notice the following:

- There are many, and they are appearing (and probably disappearing) constantly. More are being added.
- For a ‘consumer’ (that would be a programmer) of this information it’s pretty time-consuming and error-prone to study the documentation of each of these ‘similar but different’ APIs. Most are quite well documented, but still each has to be discovered and studied separately.
- Creating applications (either browsers, or widgets, or middleware applications) that use and combine information from more than one source is hard.

This experience made me think of the history of RSS (“Really Simple Syndication”). This is a format and protocol which is extremely widely used, and brings great benefit to people who don’t know or care what it is or does. And I wondered whether we can learn something from that experience and bring it to bear on the world of data.

This paper outlines an architecture, which I am calling ‘Data RSS’ for convenience, with the following objectives:

- Allow a single access method to access a very broad range of data, numerical, textual and so on, but focused fundamentally on tabular information (broadly speaking).
- Be easy and cheap to implement for the data/information owners/publishers.
- Specifically not require any centralization. Each data owner can independently decide what data to publish with Data RSS and when. New owners can appear and old ones can disappear with no coordination.

It is not a goal that all the data must be able to be published with Data RSS. In other words, it’s expected and understood that meeting the preceding set of goals will in some cases make certain kinds of information just not fit within the Data RSS scheme.
Brief History of RSS
(disclaimers: back in the day there was much intense debate about the accurate history of RSS, and even what the letters RSS stand for. I don’t intend to take a stand in this debate because it’s not at all relevant to the point I am trying to make. What I describe below is one of the versions of the history.)
RSS arose in the early era of the web, when blogs were first being invented and when websites with news were starting to appear. The problem was how to create a general purpose ‘news reader’ which could display articles from blogs (which were appearing all over the place), and from news sites, listservs and newsgroups, and on and on. All these sources (‘publishers’) were delivering information of roughly the same kind: articles with titles, with an author and a publication date. They weren’t all identical; for example, some academic sources might have had a standard ‘citations’ element, and so on.

The insight was this: if we could get all the publishers to offer their articles in a standard format, and if it was cheap or free for them to do so, then we could have a ‘news reader’ that would allow a person to see all the new articles published on CNN side by side with the articles published on Pito’s Blog and on the Sunlight Foundation’s news group. And any new publisher on the scene had an easy way to get their content delivered to that same news reader.

Data RSS is not a perfect analogy to this, but it serves as inspiration.

Data RSS - A modest proposal
Pito Salas -rps@salas.com- February 26 2009
Data RSS
(another disclaimer: this isn’t anything like a specification. I will try to describe this to the level of detail that I understand, hopefully sufficient to be understood. I will follow with a set of challenges, objections and questions.)

In the description below I reference two ‘participants’:

1. The publisher. This is who owns the data and wants to make it available broadly to others. Technically the publisher can be found at a url, for example: opengovernment.org/datarss. So the publisher is also represented by a piece of software on the publisher’s web site that responds to that url.
2. The accessor. This is who wants to use the data. It could be a web widget, it could be a reporting tool, it could be another web site or application. Technically the accessor is a piece of software that is accessing a url, say for example opengovernment.org/datarss.

Data RSS consists of the following major elements:

- An XML format that describes the information that is available: the names of datasets and, for each dataset, the names of fields, datatypes, and other discovery information.
- An XML format used to actually deliver data.
- A REST protocol with which to query for discovery information and for data.

That’s all there is. At least at the heart. With these three elements, publishers know what to do to publish their information as Data RSS, and accessors have what they need to know to access any publisher’s information.
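To make the three elements a little more concrete, here is a minimal sketch in Python of what an accessor might do. Everything specific in it is an assumption made for illustration, since this paper does not define a schema: the element and attribute names (datarss, dataset, field, row), the type names (string, decimal, integer), the sample dataset and its rows, and the URL layout with a dataset name in the path and filters in the query string. It only shows the general shape: read the discovery information, build a query URL, and convert delivered rows using the declared datatypes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical discovery document. The element and attribute names are
# invented for illustration; the proposal does not define a schema.
DISCOVERY_XML = """
<datarss publisher="opengovernment.org">
  <dataset name="contributions">
    <field name="recipient" type="string"/>
    <field name="amount" type="decimal"/>
    <field name="year" type="integer"/>
  </dataset>
</datarss>
"""

# Hypothetical data document for the same dataset (made-up sample rows).
DATA_XML = """
<data dataset="contributions">
  <row><recipient>Candidate A</recipient><amount>250.00</amount><year>2008</year></row>
  <row><recipient>Candidate B</recipient><amount>1000.50</amount><year>2008</year></row>
</data>
"""

# Map the assumed type names to Python converters (decimal is read as float
# here purely to keep the sketch short).
CONVERTERS = {"string": str, "decimal": float, "integer": int}


def parse_discovery(xml_text):
    """Return {dataset name: {field name: type name}}."""
    root = ET.fromstring(xml_text)
    return {
        ds.get("name"): {f.get("name"): f.get("type") for f in ds.findall("field")}
        for ds in root.findall("dataset")
    }


def data_url(base_url, dataset, **filters):
    """Build a query URL; the path layout and parameters are assumptions."""
    query = urlencode(sorted(filters.items()))
    url = f"{base_url}/{dataset}"
    return f"{url}?{query}" if query else url


def parse_rows(xml_text, fields):
    """Turn each <row> into a dict, converting values by declared type."""
    root = ET.fromstring(xml_text)
    return [
        {name: CONVERTERS[type_name](row.findtext(name))
         for name, type_name in fields.items()}
        for row in root.findall("row")
    ]


schema = parse_discovery(DISCOVERY_XML)
url = data_url("http://opengovernment.org/datarss", "contributions", year=2008)
rows = parse_rows(DATA_XML, schema["contributions"])
print(url)
print(rows)
```

Note that nothing in parse_rows is specific to the sample dataset: the field names and types come entirely from the discovery document, which is the property that would let one accessor work against publishers it has never seen.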
Advantages
An accessor has a shot at accessing information from any publisher, even those that don’t exist yet, or accessing new datasets that become available from existing publishers.