Introduction and Background
This is the third in a series of short papers in which I am trying to create the framework and justification for a new format which, for now, I am calling Data RSS. In this paper I try to give a technical overview of how it might work, without delving into why I think this is a good idea. Please see the other two papers for that:
Data RSS: A Modest Proposal (http://www.scribd.com/doc/12866121/Data-Rss)
Data Rss: A Case Study (http://www.scribd.com/doc/13583957/DataRSS-Case-Study)
Roles
Data RSS is used between two parties: the Publisher, who ‘owns’ some data, and the Accessor, who wants to use that data. Publisher and Accessor are organizations with people in them. The Publisher wants to offer a technical means to allow an application program simple and standardized access to their data. The Accessor wants to write an application program that accesses and does something useful with data coming from any Publisher. Accessor and Publisher don’t know each other.
Accessor’s Application A can as easily get data from Publisher P as from Publisher Q. Publisher P’s data can be accessed as easily by Accessor A as by Accessor B.
Protocol and Format
Data RSS is a simple protocol and a simple data format. It can be implemented in any programming language and, more importantly, the Publisher and Accessor software need not know (and cannot know) what language the counterparty’s software is written in.
All Data RSS requests return a response in one of several formats. For now those are XML, JSON and HTML. Why HTML? This way, requests from a normal browser can return some useful human-readable information.
DataRss Endpoint
In essence, Data RSS is embodied by a URL which we call the Data RSS endpoint. A Publisher makes their data available to others by the simple and single act of implementing responses to this URL. For example, hypothetically [1], the Sunlight Foundation could let the world know that their Data RSS endpoint could be found at http://services.sunlightfoundation.com/datarss.
At minimum this would mean that clicking on that link would return a response that looks something like this [2]:
    ---
    datarss:
      version: 0.1
    source:
      name: Sunlight Labs
      version: 1
    ---
Data RSS - Technical Overview
Pito Salas -rps@salas.com- April 9, 2009
[1] All examples in this paper are hypothetical.
[2] All responses will be written out in a more compact, readable form. In reality the responses will be selectable as being in XML, JSON, YAML, or HTML.
 
In what follows I will document key examples of the format as it is evolving. The material is organized by the top-level URL components that are used to control it.
REST
The overall scheme of things is that I am trying to describe a unified set of REST URL patterns. Some of the routes return information about the data sets (i.e. discovery) and some of them return actual data.
N.B. There are many ways to skin this cat, as is evidenced by the fact that each Publisher who designed a REST API for their data approached it in a slightly different way. In a way, that is the problem that I am trying to address.
Data RSS patterns
In what follows, I will use “.” (a single period) to denote the Data RSS endpoint. So when you see “.”, substitute, for example, http://www.followthemoney.org/datarss (another fictional endpoint).
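The “.” shorthand can be made concrete with a few lines of Python. This is only a sketch: the function name expand is hypothetical, and the endpoint is the fictional one mentioned above.

```python
def expand(pattern, endpoint):
    """Turn a pattern such as "./info" into a full URL for a given endpoint."""
    if not pattern.startswith("."):
        raise ValueError("a Data RSS pattern must start with '.'")
    # Drop the leading "." and append the remainder to the endpoint URL.
    return endpoint.rstrip("/") + pattern[1:]


# Fictional endpoint taken from the text; no request is actually made.
ENDPOINT = "http://www.followthemoney.org/datarss"

print(expand(".", ENDPOINT))
print(expand("./info", ENDPOINT))
print(expand("./dataset/candidates/fields", ENDPOINT))
```

The same pattern string can then be reused unchanged against any Publisher's endpoint, which is the point of the notation.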
Request url: .
The base Data RSS endpoint returns a basic “hello world” response to prove that there is, in fact, a Data RSS endpoint here. It indicates the version of Data RSS and the name of the publisher, as well as whatever version number they might set for their implementation.
Example:
    ---
    datarss:
      version: 0.1
    source:
      name: Sunlight Labs
      version: 1
    ---
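As an illustration of how an Accessor might read this response, here is a minimal sketch in Python. It assumes the compact form is laid out as two-level YAML with indented keys, and it handles only that small subset rather than full YAML. The function name parse_hello is hypothetical.

```python
def parse_hello(text):
    """Parse a two-level 'section:' / 'key: value' response into nested dicts.

    Only the small subset used by the hello response is handled; all values
    are kept as strings.
    """
    result = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "---":
            continue  # skip blank lines and document separators
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if not line.startswith(" "):
            if value:
                result[key] = value      # plain top-level "key: value"
                section = None
            else:
                section = result.setdefault(key, {})  # e.g. "datarss:"
        elif section is not None:
            section[key] = value         # indented key belongs to the section
    return result


# Hypothetical response text, copied from the example in this paper.
HELLO = """\
---
datarss:
  version: 0.1
source:
  name: Sunlight Labs
  version: 1
---
"""

hello = parse_hello(HELLO)
print(hello["datarss"]["version"], hello["source"]["name"])
```

A real Accessor would more likely request the JSON or XML form and use a standard parser; the hand-written parser here only keeps the sketch free of third-party dependencies.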
Request url: ./info
Request performance and feature information about this particular endpoint. An Accessor might call this at the very start to learn something about the particular implementation.
Example:
    Request: ./info
    Response:
    features:
      api-key-required: Yes
      formats: [JSON, XML]
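One way an Accessor might act on this response is sketched below. It assumes the response has already been parsed into a dictionary, and that formats sits under features alongside api-key-required; both the dictionary shape and the helper names are assumptions for illustration.

```python
# Assumed parsed form of the ./info example above.
INFO = {"features": {"api-key-required": "Yes", "formats": ["JSON", "XML"]}}


def needs_api_key(info):
    """Return True when the endpoint says an API key is required."""
    flag = info.get("features", {}).get("api-key-required", "")
    return str(flag).strip().lower() in ("yes", "true")


def pick_format(info, preferred):
    """Return the first preferred format the endpoint offers, or None."""
    offered = [f.upper() for f in info.get("features", {}).get("formats", [])]
    for fmt in preferred:
        if fmt.upper() in offered:
            return fmt.upper()
    return None


print(needs_api_key(INFO))
print(pick_format(INFO, ["YAML", "JSON"]))
```

Calling ./info once at start-up lets the application fail early, for instance when no acceptable format is offered, instead of discovering the mismatch on the first data request.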
Request url: ./datasets
Return a list of all the distinct data sets that this endpoint publishes. Each dataset corresponds more or less to a table or database or list of information. Datasets also may present various canned queries and default behaviors.
Example:
REQUEST: ./datasets
RESPONSE:
    ---
    name: newswire
    fullname: New York Times Newswire API
    ---
    name: campaigns
    fullname: New York Times Campaign Finance API
    ---
Notes:
The name of a dataset is used in subsequent requests as an identifier.
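To show how the dataset names could be pulled out for use in later requests, here is a small Python sketch. It assumes the compact form is a sequence of flat "key: value" records separated by "---" lines, and parses only that subset. The function name parse_records is hypothetical.

```python
def parse_records(text):
    """Split a '---'-separated response into a list of flat dicts."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            if current:              # a separator closes the current record
                records.append(current)
                current = {}
        elif line:
            key, _, value = line.partition(":")  # split at the first colon only
            current[key.strip()] = value.strip()
    if current:                      # tolerate a missing final separator
        records.append(current)
    return records


# Hypothetical response text, copied from the ./datasets example.
DATASETS = """\
---
name: newswire
fullname: New York Times Newswire API
---
name: campaigns
fullname: New York Times Campaign Finance API
---
"""

names = [record["name"] for record in parse_records(DATASETS)]
print(names)
```

Each name in the resulting list can be substituted for <name> in the dataset-level URL patterns that follow.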
Request url: ./dataset/<name>/fields
Return the list of all the distinct fields of information that may appear in responses from this dataset.
Example:
REQUEST: ./dataset/candidates/fields
RESPONSE:
    ---
    name: imsp_candidate_id
    fullname: the id number of the candidate
    url-index: yes
    ---
    name: candidate_name
    fullname: the name of the candidate
    url-index: no
    ---
Notes:
The name of a field is used in subsequent requests as an identifier.
“url-index: yes” means that this field can be used as an actual part of the URL, in exactly this way:
    ./dataset/candidates/imsp_candidate_id/9120
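The url-index rule can be sketched in Python as follows. The field list is assumed to have been parsed already into dictionaries, the endpoint is the fictional one used earlier in this paper, and the function name index_url is hypothetical.

```python
from urllib.parse import quote

# Assumed parsed form of the ./dataset/candidates/fields example.
FIELDS = [
    {"name": "imsp_candidate_id", "url-index": "yes"},
    {"name": "candidate_name", "url-index": "no"},
]


def index_url(endpoint, dataset, field, value, fields):
    """Build ./dataset/<name>/<field>/<value>, but only for url-index fields."""
    indexable = {f["name"] for f in fields if f.get("url-index") == "yes"}
    if field not in indexable:
        raise ValueError(f"field {field!r} cannot be used as part of the URL")
    # Percent-encode each path segment so unusual values stay in one segment.
    segments = [quote(str(part), safe="") for part in (dataset, field, value)]
    return "/".join([endpoint.rstrip("/"), "dataset"] + segments)


# Fictional endpoint; no request is actually made.
ENDPOINT = "http://www.followthemoney.org/datarss"
print(index_url(ENDPOINT, "candidates", "imsp_candidate_id", 9120, FIELDS))
```

Checking the field list before building the URL means an Accessor never sends a request that the Publisher has already said it cannot serve.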
Request url: ./dataset/<name>/queries
Return the list of all the standing queries that this dataset defines. A standing query is a kind of canned query that is meaningful in a particular domain.
Example:
REQUEST: ./dataset/candidates/queries
RESPONSE:
    ---
    name: businesses
    type: url-parameter
    parameter: imsp_candidate_id
    fullname: This query will summarize contributions at the business level for a specific candidate.
Notes: