Accessing and serving scientific datasets with Python

Dr. Rob De Almeida

The Data Access Protocol

De facto standard for distributing science data on the internet, used by oceanography, meteorology and climate communities

Simple HTTP-based protocol with XDR encoding for data transmission

Supports complex dataset structures

Model output, satellite images, in-situ data, etc.

Protocol details

A dataset has different URLs describing it

● http://server/dataset
● http://server/dataset.dds (structure)
● http://server/dataset.das (attributes)
● http://server/dataset.dods (data)

Client (usually) retrieves metadata from DDS/DAS responses and downloads data from DODS response as necessary
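
The URL scheme above can be sketched in a few lines of Python. The helper name `dap_urls` is hypothetical and not part of pyDAP; it only illustrates how the three responses are addressed by appending a suffix to the dataset URL.

```python
def dap_urls(base_url):
    """Return the DDS, DAS and DODS URLs for a dataset URL (illustrative helper)."""
    base_url = base_url.rstrip('/')
    return {
        'dds': base_url + '.dds',    # structure
        'das': base_url + '.das',    # attributes
        'dods': base_url + '.dods',  # data
    }


urls = dap_urls('http://server/dataset')
print(urls['dds'])
```

A client would typically fetch the first two once, and the third only when data is actually needed.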

A simple example

Dataset with a list “a” of integers from 0 to 9

Let's also add a few attributes: author, history

What is the representation of metadata and data?

Dataset Descriptor Structure
Dataset {
    Int32 a[a = 10];
} test;

Dataset Attribute Structure
Attributes {
    a {
        String author "Rob De Almeida";
        String history "Created for PyCon 2007";
    }
}

DODS response
Dataset {
    Int32 a[a = 10];
} test;
Data:
\x00\x00\x00\x0a\x00\x00\x00\x0a
\x00\x00\x00\x00\x00\x00\x00\x01
\x00\x00\x00\x02\x00\x00\x00\x03
\x00\x00\x00\x04\x00\x00\x00\x05
\x00\x00\x00\x06\x00\x00\x00\x07
\x00\x00\x00\x08\x00\x00\x00\x09
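
To make the binary part of this response concrete, here is a minimal sketch that decodes it with the standard `struct` module. It assumes what the bytes shown suggest: the array length (10) appears twice as big-endian 32-bit integers, followed by the ten values. The function name `decode_int32_array` is made up for illustration; pyDAP uses its own XDR code.

```python
import struct

# Binary payload of the DODS response shown above (everything after "Data:").
payload = (
    b'\x00\x00\x00\x0a\x00\x00\x00\x0a'
    b'\x00\x00\x00\x00\x00\x00\x00\x01'
    b'\x00\x00\x00\x02\x00\x00\x00\x03'
    b'\x00\x00\x00\x04\x00\x00\x00\x05'
    b'\x00\x00\x00\x06\x00\x00\x00\x07'
    b'\x00\x00\x00\x08\x00\x00\x00\x09'
)


def decode_int32_array(buf):
    """Decode an Int32 array laid out as: length, length, values (big-endian)."""
    n, n_again = struct.unpack('>ii', buf[:8])
    if n != n_again:
        raise ValueError('the two length fields disagree')
    # Each Int32 occupies four bytes, most significant byte first.
    return list(struct.unpack('>%di' % n, buf[8:8 + 4 * n]))


values = decode_int32_array(payload)
print(values)
```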

Using pyDAP as a client

The client retrieves and parses the metadata (DAS/DDS), building a dataset object with all the variables that can be introspected

Data is downloaded on the fly when required

Uses httplib2 and a custom-made xdrlib based on numpy or array

Example usage
>>> from dap.client import open
>>> dataset = open('http://test.pydap.org/coads.nc', verbose=True)
http://test.pydap.org/coads.nc.dds
http://test.pydap.org/coads.nc.das
>>> print dataset.keys()
['UWND', 'WSPD', 'SST', 'VWND', 'SLP', 'AIRT', 'SPEH', 'COADSX', 'COADSY', 'TIME']

Introspecting the dataset
>>> time = dataset['TIME']
>>> print time.type, time.shape, time.dimensions
Float64 (12,) ('TIME',)
>>> print time.units
hour since 0000-01-01 00:00:00

Retrieving data
>>> print time[:]
http://test.pydap.org/coads.nc.dods?TIME[0:1:11]
[  366.     1096.485  1826.97   2557.455  3287.94   4018.425
  4748.91   5479.395  6209.88   6940.365  7670.85   8401.335]

>>> print time[0]
http://test.pydap.org/coads.nc.dods?TIME[0:1:0]
[ 366.]
>>> print time[-2:]
http://test.pydap.org/coads.nc.dods?TIME[10:1:11]
[ 7670.85   8401.335]
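
The requests in the sessions above show how Python indexing is translated into a DAP hyperslab of the form `[start:step:stop]`, where the stop index is inclusive. The following is a minimal sketch of that translation; the function name `hyperslab` is hypothetical and this is not pyDAP's actual implementation.

```python
def hyperslab(key, length):
    """Translate an int or slice on an axis of `length` items to '[start:step:stop]'."""
    if isinstance(key, int):
        index = key + length if key < 0 else key
        if not 0 <= index < length:
            raise IndexError('index out of range')
        return '[%d:1:%d]' % (index, index)
    # slice.indices() resolves None and negative bounds against the axis length.
    start, stop, step = key.indices(length)
    if step < 1 or stop <= start:
        raise ValueError('only non-empty slices with a positive step are handled')
    # Python's stop is exclusive, the DAP stop is inclusive.
    return '[%d:%d:%d]' % (start, step, stop - 1)


print('TIME' + hyperslab(slice(None), 12))
print('TIME' + hyperslab(0, 12))
print('TIME' + hyperslab(slice(-2, None), 12))
```

The three calls reproduce the constraints requested for `time[:]`, `time[0]` and `time[-2:]` on the 12-element TIME variable.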

Working with sequential data
Dataset {
    Sequence {
        Int32 id;
        Float64 lat;
        Float64 lon;
    } test;
} test%2Ecsv;

http://test.pydap.org/test.csv.dds

Retrieving data
>>> from dap.client import open
>>> dataset = open('http://test.pydap.org/test.csv', verbose=True)
http://test.pydap.org/test.csv.dds
http://test.pydap.org/test.csv.das
>>> seq = dataset['test']
>>> print seq['lat'][:]
http://test.pydap.org/test.csv.dods?test.lat
[10.1, 10.199999999999999, 10.300000000000001, 10.4, 10.5]

Iterating over sequential data
>>> for struct in seq:
...     print struct['lat'].data, struct['lon'].data
...
http://test.pydap.org/test.csv.dods?test.id
http://test.pydap.org/test.csv.dods?test.lat
http://test.pydap.org/test.csv.dods?test.lon
10.1 103.0
10.2 93.0
10.3 83.0
10.4 73.0
10.5 63.0

Filtering sequences (sure way)
>>> fseq = seq.filter('%s<100' % seq.lon.id)
>>> for struct in fseq:
...     print struct['lat'].data, struct['lon'].data
...
http://test.pydap.org/test.csv.dods?test.id&test.lon<100
http://test.pydap.org/test.csv.dods?test.lat&test.lon<100
http://test.pydap.org/test.csv.dods?test.lon&test.lon<100
10.2 93.0
10.3 83.0
10.4 73.0
10.5 63.0

Filtering sequences (fun way!)
>>> fseq = (struct for struct in seq if struct['lon'] < 100)
>>> for struct in fseq:
...     print struct['lat'].data, struct['lon'].data
...
http://test.pydap.org/test.csv.dods?test.id&test.lon<100
http://test.pydap.org/test.csv.dods?test.lat&test.lon<100
http://test.pydap.org/test.csv.dods?test.lon&test.lon<100
10.2 93.0
10.3 83.0
10.4 73.0
10.5 63.0
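
Both filtering styles end up issuing the same requests: the filter is sent to the server as a selection clause appended to the projected variable with `&`. A rough sketch of how such a URL can be assembled follows; `constrained_url` is a hypothetical helper, and the clause is left unescaped to match the requests printed above.

```python
def constrained_url(base_url, projection, selections=()):
    """Build a .dods URL from projected variable ids and selection clauses."""
    # Projected variables are comma-separated; each selection is joined with '&'.
    query = ','.join(projection)
    for clause in selections:
        query += '&' + clause
    return '%s.dods?%s' % (base_url, query)


url = constrained_url('http://test.pydap.org/test.csv',
                      ['test.lat'], ['test.lon<100'])
print(url)
```

Because the selection is evaluated on the server, only the four matching records travel over the network.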

Server

pyDAP comes with a WSGI app that works as a DAP server

Server is just a thin layer between plugins that handle data formats (netCDF, HDF5, SQL, etc.) and responses (DAS, DDS, DODS, HTML, KML, WMS, etc.)

Can be deployed with Paster

● Script template:

paster create -t dap_server myserver
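
The "thin layer" idea can be illustrated with a toy WSGI application: a plugin is chosen from the file extension, a response from the URL suffix. Everything in this sketch (`csv_plugin`, `dds_response`, the `PLUGINS` and `RESPONSES` tables, the simplified output) is hypothetical and far simpler than the real pyDAP server.

```python
def csv_plugin(path):
    """Hypothetical plugin: pretend to open `path` and describe its variables."""
    return {'name': path, 'variables': ['id', 'lat', 'lon']}


def dds_response(dataset):
    """Hypothetical response: render a heavily simplified structure listing."""
    fields = ' '.join(name + ';' for name in dataset['variables'])
    body = 'Dataset { %s } %s;' % (fields, dataset['name'])
    return 'text/plain', body.encode('ascii')


PLUGINS = {'.csv': csv_plugin}     # file extension -> plugin
RESPONSES = {'dds': dds_response}  # URL suffix -> response


def app(environ, start_response):
    """WSGI app: split '/file.csv.dds' into a file name and a response suffix."""
    path = environ.get('PATH_INFO', '').lstrip('/')
    filename, _, suffix = path.rpartition('.')
    plugin = None
    for extension, candidate in PLUGINS.items():
        if filename.endswith(extension):
            plugin = candidate
    response = RESPONSES.get(suffix)
    if plugin is None or response is None:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'Not found']
    content_type, body = response(plugin(filename))
    start_response('200 OK', [('Content-Type', content_type)])
    return [body]


def call(path):
    """Invoke the app once, WSGI-style, and return (status, body)."""
    seen = {}

    def start_response(status, headers):
        seen['status'] = status

    body = b''.join(app({'PATH_INFO': path}, start_response))
    return seen['status'], body


status, body = call('/test.csv.dds')
print(status, body)
```

Adding a format or an output type in this design means adding one entry to a table, which is the property the plugin and response entry points provide in the real server.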

Plugins and responses

http://localhost:8080/file.nc.das

Plugins

Convert data from different formats to pyDAP types

Plugins for netCDF, CSV, Matlab 4/5, HDF5, GrADS grib, GDAL, DB API 2, grib2

EasyInstall (entry point dap.plugin):

easy_install dap.plugins.netcdf

Responses

Convert from pyDAP types to something else

“Official” responses: DAS, DDS, DODS

Generate data and metadata from the dataset created by the plugins

Extra responses can be installed using EasyInstall (entry point dap.response)

ASCII response
Dataset {
    Sequence {
        Int32 id;
        Float64 lat;
        Float64 lon;
    } test;
} test%2Ecsv;
---------------------------------------------
test.id, test.lat, test.lon
1, 10.1, 103
2, 10.2, 93
3, 10.3, 83
4, 10.4, 73
5, 10.5, 63

http://test.pydap.org/test.csv.ascii

HTML response

Generates an HTML form to download data

Redirects user to ASCII response

Useful for users without a DAP client

Example HTML response

JSON response
{"test%2Ecsv": {"attributes": {"filename": "test.csv"}, "type": "Dataset", "test": {"attributes": {}, "type": "Sequence", "id": {"attributes": {}, "type": "Int32"}, "lat": {"attributes": {}, "type": "Float64"}, "lon": {"attributes": {}, "type": "Float64"}}}}

http://test.pydap.org/test.csv.json

JSON response with data
{"test%2Ecsv": {"attributes": {"filename": "test.csv"}, "type": "Dataset", "test": {"attributes": {}, "type": "Sequence", "data": [[1, 10.1, 103.0], [2, 10.2, 93.0], [3, 10.3, 83.0], [4, 10.4, 73.0], [5, 10.5, 63.0]], "id": {"attributes": {}, "type": "Int32"}, "lat": {"attributes": {}, "type": "Float64"}, "lon": {"attributes": {}, "type": "Float64"}}}}
http://test.pydap.org/test.csv.json?output_data=1
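
Since this response is plain JSON, any client with a JSON parser can consume it. The sketch below parses the document shown above with the standard `json` module and turns the rows into records. It assumes the columns of each row are ordered id, lat, lon, as in the ASCII response; the JSON itself does not state that order.

```python
import json

# The JSON response with data, copied from above.
text = '''{"test%2Ecsv": {"attributes": {"filename": "test.csv"},
 "type": "Dataset", "test": {"attributes": {}, "type": "Sequence",
 "data": [[1, 10.1, 103.0], [2, 10.2, 93.0], [3, 10.3, 83.0],
          [4, 10.4, 73.0], [5, 10.5, 63.0]],
 "id": {"attributes": {}, "type": "Int32"},
 "lat": {"attributes": {}, "type": "Float64"},
 "lon": {"attributes": {}, "type": "Float64"}}}}'''

document = json.loads(text)
sequence = document['test%2Ecsv']['test']

# Assumed column order (taken from the ASCII response, not from the JSON).
names = ['id', 'lat', 'lon']
records = [dict(zip(names, row)) for row in sequence['data']]

for record in records:
    print(record['lat'], record['lon'])
```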

WMS response

Returns maps (images) from requested variables and regions

Works with geo-referenced grids and sequences

Layers can be composed together

Data can be constrained:

● /coads.nc.wms?SST // annual mean
● /coads.nc.wms?SST[0] // January

WMS example request

http://localhost:8080/netcdf/coads.nc.wms?LAYERS=SST&WIDTH=512
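
A request like this one can be assembled with the standard library. The helper `wms_url` is hypothetical, and only the two parameters that appear in the example request (LAYERS and WIDTH) are used; other WMS parameters are not shown in the source.

```python
from urllib.parse import urlencode


def wms_url(base_url, **params):
    """Append '.wms' and a query string built from keyword arguments."""
    # Keyword order is preserved, so the query string is deterministic.
    return '%s.wms?%s' % (base_url, urlencode(params))


url = wms_url('http://localhost:8080/netcdf/coads.nc', LAYERS='SST', WIDTH=512)
print(url)
```

Note that `urlencode` percent-encodes characters such as the square brackets in a constrained layer like `SST[0]`.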

KML response

Generates an XML file using the Keyhole Markup Language, pointing to the WMS response

Nice and simple interface for quickly visualizing data

Future

pyDAP 2.3 almost ready
● Dapper compliance
● Faster XDR encoding/decoding
● Initial support for DDX response and parser

Build a rich web interface (AJAX) based on JSON + WMS + KML responses

Not only to pyDAP, but to other OPeNDAP servers using pyDAP as a proxy

Acknowledgments
● OPeNDAP for all the support
● PSF for the financial support to be here
● Everybody who submitted bugs (bonus points for submitting patches!)
