
Implementing the Data Access Protocol in Python
Dr. Rob De Almeida
Table of Contents

History

Current implementation

Client

Server

Plugins & responses

WSGI & Paste

Future
History

pyDAP is a free implementation of the Data Access Protocol, written in Python from scratch

It is the product of naïveness and determination :)
Why Python?

Object-oriented, high-level programming language that emphasizes saving programmer effort (vs. computer effort)

Increasing usage in science (CDAT, MayaVi) and on the web (Google, YouTube)

Advantages: interpreter, batteries included, easy prototyping, dynamically typed, concise, fun
pyDAP 1.0

Started in 2003

“Afternoon project”: client only, downloaded data from the ASCII response and worked only with Grids and Arrays

Reverse-engineering of the protocol

Should've really been version 0.0.1
pyDAP 1.x

Binary data using Python's xdrlib

Server architecture based on a common core that could run as CGI, under Twisted, or with Python's BaseHTTPServer
pyDAP 2.0

Complete rewrite, based on the DAP 2.0 specification draft

Developed during the Google Summer of Code 2005

Own implementation of XDR

Server built on the WSGI specification*

This should've been version 1.0
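pyDAP 1.x relied on the standard library's xdrlib (deprecated in Python 3.11 and removed in 3.13), while 2.0 ships its own XDR code. The sketch below is not pyDAP's implementation; it is a minimal illustration, assuming only the XDR rules of RFC 4506 (big-endian values, 4-byte alignment, length-prefixed variable-length data), of how little code such an encoder needs on top of the struct module. Any DAP-specific framing of the binary response is deliberately left out, and the function names are hypothetical.

```python
import struct

# XDR (RFC 4506) is big-endian and pads everything to a multiple of 4 bytes.
# Only fixed-size numeric types are handled here: 'i' (int32), 'I' (uint32),
# 'f' (float32) and 'd' (float64), all of which are already 4-byte aligned.
NUMERIC_CODES = "iIfd"


def encode_array(values, code):
    """Encode a variable-length XDR array: uint32 count, then the items."""
    if code not in NUMERIC_CODES:
        raise ValueError(f"unsupported type code: {code!r}")
    values = list(values)
    return struct.pack(f">I{len(values)}{code}", len(values), *values)


def decode_array(data, code, offset=0):
    """Decode an XDR array starting at `offset`; return (values, new_offset)."""
    (count,) = struct.unpack_from(">I", data, offset)
    offset += 4
    values = list(struct.unpack_from(f">{count}{code}", data, offset))
    return values, offset + count * struct.calcsize(f">{code}")


def encode_opaque(raw):
    """Variable-length opaque data (e.g. a byte array): length, bytes, zero padding."""
    return struct.pack(">I", len(raw)) + raw + b"\x00" * (-len(raw) % 4)
```

For example, `encode_array([0, 1, 2], "i")` produces 16 bytes: the count 3 followed by three big-endian int32 values, and `decode_array` returns the values together with the offset where the next item would start.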
pyDAP 2.1

Fully buffered server, able to handle infinite datasets

Automatic discovery of plugins

Automatic installation of dependencies

Runs with Python Paste*
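One way a WSGI server can cope with datasets of unbounded size is to produce the response body incrementally, so memory use never depends on the dataset size. WSGI allows this directly, because an application may return any iterable of byte strings. The minimal sketch below assumes a hypothetical endless data source `read_records` and a hypothetical `CHUNK_SIZE`; it illustrates the technique and is not pyDAP's server code.

```python
import itertools

CHUNK_SIZE = 4  # records per chunk; hypothetical, a real server would tune this


def read_records():
    """Hypothetical data source that never ends (think of an open-ended sequence)."""
    yield from itertools.count()


def streaming_app(environ, start_response):
    """WSGI app whose body is generated chunk by chunk, never all at once."""
    start_response("200 OK", [("Content-Type", "text/plain")])

    def body():
        records = read_records()
        while True:
            chunk = list(itertools.islice(records, CHUNK_SIZE))
            if not chunk:  # only reached for finite sources
                return
            yield "".join(f"{record}\n" for record in chunk).encode("ascii")

    return body()
```

It can be served with the standard library, e.g. `make_server("localhost", 8080, streaming_app).serve_forever()` from `wsgiref.simple_server`. Because the total size is unknown, no Content-Length header is sent.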
pyDAP 2.2.5.8

Released last Friday (2007-02-16)

Approximately 3k LOC for client and server, including docstrings, comments and its own XDR implementation

Support for additional plugins (for new data formats) and responses (for new output formats) that are auto-discoverable

Stub support for DDX on the client and server
Client

Based on the httplib2 module

HTTP / HTTPS

Keep Alive

Auth: digest, basic, WSSE, HMAC digest

Caching

Compression: deflate, gzip

Intuitive interface
Sample client session
Local file, with pynetcdf:

>>> from pynetcdf import NetCDFFile
>>> dataset = NetCDFFile("coads.nc")
>>> sst = dataset.variables['SST']
>>> print sst.shape
(12, 90, 180)
>>> print sst.dimensions
('TIME', 'COADSY', 'COADSX')
>>> print sst[0,40,40]
28.0669994354

Remote dataset, with pyDAP:

>>> from dap.client import open
>>> dataset = \
... open("http://server/coads.nc")
>>> sst = dataset['SST']
>>> print sst.shape
(12, 90, 180)
>>> print sst.dimensions
('TIME', 'COADSY', 'COADSX')
>>> print sst[0,40,40]
[[[ 28.06699944]]]
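The remote `sst[0,40,40]` returns a 1×1×1 array, which is what requesting a hyperslab from the server yields. The sketch below shows how Python indices can be translated into DAP 2 constraint expressions of the form `var[start:stride:stop]`, where `stop` is inclusive. Whether pyDAP's client builds its URLs exactly this way is an assumption, and `hyperslab` and `dods_url` are hypothetical helpers written in modern Python 3.

```python
def hyperslab(index, shape):
    """Translate a NumPy-style index into a DAP 2 [start:stride:stop] constraint."""
    if not isinstance(index, tuple):
        index = (index,)
    if len(index) > len(shape):
        raise IndexError("too many indices")
    # Dimensions not mentioned in the index are requested in full.
    index = index + (slice(None),) * (len(shape) - len(index))
    parts = []
    for idx, size in zip(index, shape):
        if isinstance(idx, int):
            if idx < 0:
                idx += size
            if not 0 <= idx < size:
                raise IndexError("index out of range")
            start, stride, stop = idx, 1, idx
        elif isinstance(idx, slice):
            start, end, stride = idx.indices(size)
            if stride < 1 or end <= start:
                raise ValueError("only non-empty slices with a positive step are supported")
            # Python's end is exclusive; DAP's stop is the last index actually returned.
            stop = start + stride * ((end - 1 - start) // stride)
        else:
            raise TypeError(f"unsupported index: {idx!r}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return "".join(parts)


def dods_url(dataset_url, variable, index, shape):
    """Hypothetical helper: URL of the binary (.dods) response for one variable."""
    return f"{dataset_url}.dods?{variable}{hyperslab(index, shape)}"


print(dods_url("http://server/coads.nc", "SST", (0, 40, 40), (12, 90, 180)))
# http://server/coads.nc.dods?SST[0:1:0][40:1:40][40:1:40]
print(hyperslab((slice(0, 12, 3),), (12, 90, 180)))
# [0:3:9][0:1:89][0:1:179]
```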
Client usage

Commonly used to automate downloading data from OPeNDAP servers and storing it in a different format (scripting)

Dapper-compliance validator for testing servers
Server

“Writing a server is like writing a client backwards”

Thin layer between plugins and responses (both auto-discoverable)

Implemented as a WSGI application*

Deployed using Paste Deploy*
Plugins and responses

http://localhost:8080/file.nc.das
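In the URL above, `file.nc` is served by the netCDF plugin and the `.das` suffix selects the DAS response. A minimal WSGI sketch of that "thin layer", assuming plugins are chosen by file extension and responses by the final URL suffix; the `PLUGINS` and `RESPONSES` registries and their stub outputs are hypothetical stand-ins, not pyDAP's real plugin or response API.

```python
import posixpath

# Hypothetical registries: in pyDAP these are filled by auto-discovered packages.
# Here a "plugin" opens a file and returns a dataset (a plain dict stub), and a
# "response" renders that dataset as text.
PLUGINS = {
    ".nc": lambda path: {"name": posixpath.basename(path)},
}
RESPONSES = {
    "das": lambda dataset: "Attributes {\n}\n",
    "dds": lambda dataset: f"Dataset {{\n}} {dataset['name']};\n",
}


def split_request(path_info):
    """'/file.nc.das' -> ('/file.nc', 'das')."""
    path, suffix = posixpath.splitext(path_info)
    return path, suffix.lstrip(".")


def dap_app(environ, start_response):
    """WSGI app: pick a plugin by file extension and a response by URL suffix."""
    path, response = split_request(environ.get("PATH_INFO", ""))
    plugin = PLUGINS.get(posixpath.splitext(path)[1])
    render = RESPONSES.get(response)
    if plugin is None or render is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    body = render(plugin(path)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Adding a data format or an output format then means adding one registry entry, which is the property that makes auto-discovery of plugins and responses possible.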
Installing plugins & responses

pyDAP uses EasyInstall:

easy_install dap.plugins.netcdf

easy_install dap.responses.html

Easy to create new plugins (for small values of “easy”):

paster create -t dap_plugin myplugin

Generates a template with skeleton code

New plugins can be easily distributed
Available plugins

CSV

netCDF (reference implementation)

SQL (compatible with most databases, but generates a “flat” dataset)

Matlab 4/5

GrADS grib

HDF5 and GDAL (experimental)

grib2? (Rob Cermak)
Available responses

dds, das, dods

ASCII variant

HTML form

JSON

WMS / KML

EditGrid / Google Spreadsheets

netCDF?
JSON

Lightweight alternative to XML for data exchange

Based on a subset of JavaScript

Easy to parse in the browser

Parsers and generators for C, C++, C#, Java, Lisp, Lua, Objective C, Perl, PHP, Python, Ruby, Squeak and several other languages

Coincidentally, also almost a subset of Python

Most JSON is also valid Python code (notable exceptions: true, false and null)
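The overlap with Python can be checked directly: a JSON document built only from objects, arrays, strings and numbers evaluates to the same structure under Python's literal parser, while JSON's `true`, `false` and `null` are plain names to Python and are rejected. A small sketch; `same_under_both_parsers` is a hypothetical helper and assumes its input is valid JSON.

```python
import ast
import json

SAMPLE = '{"a": {"type": "Int32", "shape": [3], "data": [0, 1, 2]}}'


def same_under_both_parsers(text):
    """True if valid JSON `text` also parses, identically, as a Python literal."""
    try:
        as_python = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # e.g. JSON's true/false/null are plain names to Python, not literals
        return False
    return as_python == json.loads(text)


print(same_under_both_parsers(SAMPLE))               # True
print(same_under_both_parsers('{"missing": null}'))  # False
```

In practice the `json` module is the right parser; the point is only that the two grammars overlap heavily.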
A JSON response
Content-description: dods_json
XDODS-Server: dods/2.0
Content-type: application/json

{"test": {"attributes": {"NC_GLOBAL": {},
                         "author": "Roberto De Almeida"},
          "type": "Dataset",
          "a": {"type": "Int32",
                "shape": [10],
                "data": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}}}
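A client can consume this response with nothing but the standard `json` module. A minimal sketch, assuming the raw text (headers, a blank line, then the JSON body) is available as one string, as it would be if captured from the wire; in practice an HTTP library hands over headers and body separately. `parse_dods_json` is a hypothetical name, and treating every dict-valued key other than `"attributes"` as a variable is an assumption drawn from this single example.

```python
import json

RESPONSE = (
    "Content-description: dods_json\r\n"
    "XDODS-Server: dods/2.0\r\n"
    "Content-type: application/json\r\n"
    "\r\n"
    '{"test": {"attributes": {"NC_GLOBAL": {}, "author": "Roberto De Almeida"},'
    ' "type": "Dataset",'
    ' "a": {"type": "Int32", "shape": [10], "data": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}}}'
)


def parse_dods_json(raw):
    """Split headers from body and return (headers, dataset name, variables)."""
    head, _, body = raw.partition("\r\n\r\n")
    headers = dict(line.split(": ", 1) for line in head.split("\r\n"))
    doc = json.loads(body)
    (name, dataset), = doc.items()  # exactly one top-level key: the dataset name
    variables = {
        key: value
        for key, value in dataset.items()
        if key != "attributes" and isinstance(value, dict)
    }
    return headers, name, variables


headers, name, variables = parse_dods_json(RESPONSE)
print(name)                       # test
print(variables["a"]["shape"])    # [10]
print(sum(variables["a"]["data"]))  # 45
```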
WMS

Returns maps (images) of the requested variables and regions

Works with geo-referenced grids and sequences

Layers can be composed together

Data can be constrained:

/coads.nc.wms?SST // annual mean

/coads.nc.wms?SST[0] // january
WMS example request

http://localhost:8080/netcdf/coads.nc.wms?LAYERS=SST&WIDTH=512
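The request above carries only `LAYERS` and `WIDTH`. A standard WMS 1.1.1 GetMap request also carries `SERVICE`, `VERSION`, `REQUEST`, `STYLES`, `SRS`, `BBOX`, `HEIGHT` and `FORMAT`; which of these pyDAP's WMS response actually reads is not stated here, so treat this builder (`wms_getmap_url` is a hypothetical helper) as a generic sketch. Composing layers amounts to a comma-separated `LAYERS` list.

```python
from urllib.parse import urlencode


def wms_getmap_url(base, layers, bbox, width, height, image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy) in degrees."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),  # several layers are drawn on top of each other
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base}?{urlencode(params)}"


url = wms_getmap_url("http://localhost:8080/netcdf/coads.nc.wms",
                     ["SST"], (-180, -90, 180, 90), 512, 256)
print(url)
```

`urlencode` percent-encodes characters such as `:`, `/` and `,`, which servers decode transparently.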
KML

Generates an XML file in the Keyhole Markup Language (KML) that points to the WMS response

A nice and simple interface for quickly visualizing data
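A KML file that "points to the WMS response" can be as small as one ground overlay. A minimal sketch, assuming a single `GroundOverlay` whose `Icon/href` is a WMS URL and whose `LatLonBox` gives the map bounds; the function name and the KML 2.2 namespace are assumptions, not pyDAP's actual output.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def q(tag):
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def ground_overlay_kml(name, image_href, north, south, east, west):
    """Return a KML document that drapes the image at `image_href` over a lat/lon box."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(q("kml"))
    overlay = ET.SubElement(kml, q("GroundOverlay"))
    ET.SubElement(overlay, q("name")).text = name
    icon = ET.SubElement(overlay, q("Icon"))
    ET.SubElement(icon, q("href")).text = image_href
    box = ET.SubElement(overlay, q("LatLonBox"))
    for tag, value in (("north", north), ("south", south),
                       ("east", east), ("west", west)):
        ET.SubElement(box, q(tag)).text = str(value)
    return ET.tostring(kml, encoding="unicode")


doc = ground_overlay_kml(
    "SST", "http://localhost:8080/netcdf/coads.nc.wms?LAYERS=SST&WIDTH=512",
    north=90, south=-90, east=180, west=-180)
print(doc)
```

ElementTree escapes the `&` in the URL as `&amp;`, which a naive string template would easily get wrong.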
WSGI

Python Web Server Gateway Interface (PEP 333)

Simple and universal interface between web servers (like Apache) and web applications (like pyDAP)

Allows middleware (gzip, authentication, caching, etc.) to be shared between applications
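Middleware is just a WSGI application that wraps another one, which is why it can be shared between applications. A minimal sketch of the pattern, using a hypothetical `add_headers` middleware that appends fixed response headers (the `XDODS-Server` value is taken from the JSON response shown earlier) and a hypothetical `hello_app` standing in for pyDAP.

```python
def add_headers(app, extra_headers):
    """Wrap `app` so every response also carries `extra_headers`."""
    def middleware(environ, start_response):
        def start_response_with_headers(status, headers, exc_info=None):
            return start_response(status, list(headers) + list(extra_headers), exc_info)
        return app(environ, start_response_with_headers)
    return middleware


def hello_app(environ, start_response):
    """Stand-in application; any WSGI app can be wrapped the same way."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


wrapped_app = add_headers(hello_app, [("XDODS-Server", "dods/2.0")])
```

Gzip, authentication or caching middleware follow the same shape: intercept `environ`, `start_response` or the returned body, then delegate to the wrapped application.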
Before WSGI
After WSGI
Paste & Paste Deploy

Python package that facilitates the development and deployment of web applications

Allows pyDAP to be deployed using a simple INI file that specifies the server, middleware and application configuration
Running a server
[server:main]
use = egg:PasteScript#wsgiutils
host = 127.0.0.1
port = 8080

[filter-app:main]
use = egg:Paste#httpexceptions
next = pyDAP

[app:pyDAP]
use = egg:dap
name = Test DAP server
root = %(here)s/data
verbose = 0
template = %(here)s/template
x-wsgiorg.throw_errors = 1
dap.responses.kml.format = image/png
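Paste Deploy loads a file like this with `paste.deploy.loadapp("config:/path/to/server.ini")`, or serves it with `paster serve server.ini`. The sketch below uses only the standard library's configparser to show how the sections fit together and how `%(here)s` expands, assuming, as Paste Deploy does, that `here` is the directory containing the INI file. `read_deploy_config` and the directory `/srv/dap` are hypothetical.

```python
import configparser

# The configuration above; "here" is normally supplied by Paste Deploy
# as the directory that contains the INI file.
CONFIG = """\
[server:main]
use = egg:PasteScript#wsgiutils
host = 127.0.0.1
port = 8080

[filter-app:main]
use = egg:Paste#httpexceptions
next = pyDAP

[app:pyDAP]
use = egg:dap
name = Test DAP server
root = %(here)s/data
verbose = 0
template = %(here)s/template
x-wsgiorg.throw_errors = 1
dap.responses.kml.format = image/png
"""


def read_deploy_config(text, here):
    """Parse a Paste Deploy-style INI with configparser, substituting %(here)s."""
    parser = configparser.ConfigParser(defaults={"here": here})
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    return parser


config = read_deploy_config(CONFIG, "/srv/dap")
print(config["app:pyDAP"]["root"])      # /srv/dap/data
print(config["app:pyDAP"]["template"])  # /srv/dap/template
```

Reading the chain from the top: the server runs `filter-app:main`, which applies the `httpexceptions` filter and then hands requests to the application defined in `app:pyDAP`.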
Future

pyDAP 2.3 almost ready

Dapper compliance

Faster XDR encoding/decoding

Initial support for DDX response and parser

Build a rich web interface (AJAX) based on the JSON, WMS and KML responses

Not only for pyDAP, but also for other OPeNDAP servers, using pyDAP as a proxy
Acknowledgments

OPeNDAP for all the support

James Gallagher for all my questions about the spec on the mailing list

Everybody who submitted bugs (bonus points for submitting patches!)