Overview

At the heart of this system will be a database that stores a large volume of trials data. This data will almost exclusively be time-stamped, and much of it will have a spatial element. The main data type stored will be vehicle tracks, though a range of vehicle types will be catered for, and each type typically has different attributes. Around the database will be an ecosystem that transforms data ready for insertion into the database, and transforms it again on the way out. For data in the database, it is hoped that a spatial view can be presented to network users (probably an OpenLayers view of vehicle tracks). Beyond these initial thoughts I've also invested some effort in Use Case Scenarios, Schema Thoughts, tentative requirements and the Underlying problem.

Concepts
Vehicle: something that moves
Vehicle track: series of time-stamped locations for that vehicle
Sensor: something that can make recordings (light level meter, water flow recorder)
Sensor recording: series of time-stamped recordings from a sensor
Trial: specific time period under which something is trialled, in which vehicles and sensors participate
Analysis Tool: piece of software that analyses vehicle tracks or sensor recordings, typically in a proprietary data format
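
As a rough illustration of how these concepts relate, here is a minimal Python sketch. The class and field names (`Observation`, `values`, `covers` and so on) are my own assumptions for illustration, not part of any agreed schema, and the same shape would carry over to Java.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Observation:
    """One time-stamped reading; location is optional (a sensor may have none)."""
    time: datetime
    lat: Optional[float] = None
    lon: Optional[float] = None
    values: Dict[str, float] = field(default_factory=dict)  # hypothetical extras


@dataclass
class Vehicle:
    """Something that moves; its track is a series of located observations."""
    name: str
    vehicle_type: str
    track: List[Observation] = field(default_factory=list)


@dataclass
class Sensor:
    """Something that makes recordings, e.g. a light level meter."""
    name: str
    recording: List[Observation] = field(default_factory=list)


@dataclass
class Trial:
    """A named time period in which vehicles and sensors participate."""
    name: str
    start: datetime
    end: datetime
    vehicles: List[Vehicle] = field(default_factory=list)
    sensors: List[Sensor] = field(default_factory=list)

    def covers(self, obs: Observation) -> bool:
        """True when the observation falls inside the trial period."""
        return self.start <= obs.time <= self.end
```

Keeping location optional on a single observation type mirrors the idea that all data has a time but only some has a position.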

Volumes
It is expected that the system will be required to read in around a million observations per month, extracted from 100 data files.
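
To put those figures in perspective, a quick back-of-envelope calculation follows. It assumes the 100 data files are a monthly figure, which the text implies but does not state outright.

```python
OBSERVATIONS_PER_MONTH = 1_000_000  # "around a million", from the text
FILES_PER_MONTH = 100               # assumption: the 100 files arrive each month

per_file = OBSERVATIONS_PER_MONTH // FILES_PER_MONTH
per_year = OBSERVATIONS_PER_MONTH * 12

print(per_file)  # average observations per data file: 10000
print(per_year)  # observations accumulated over twelve months: 12000000
```

So a typical file would hold around ten thousand observations, and the database would grow by roughly twelve million rows a year.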

Capabilities
The system should be able to:
take data in a range of data formats
encode the data into a common schema
store this data into a spatial database
display vehicle tracks from the database in a web-browser
extract data from database
transform extracted data back to native format
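
For the web-browser display capability, one plausible hand-off to OpenLayers is GeoJSON, a format OpenLayers is able to parse. The Python sketch below is only an assumption about how that might look, not a chosen design: the function name `track_to_geojson`, the `times` property and the sample coordinates are all hypothetical.

```python
import json
from typing import Dict, List, Tuple

# (ISO time-stamp, longitude, latitude); GeoJSON orders coordinates lon, lat.
TrackPoint = Tuple[str, float, float]


def track_to_geojson(vehicle_name: str, points: List[TrackPoint]) -> Dict:
    """Build a GeoJSON Feature holding one vehicle track as a LineString."""
    # ISO time-stamps in the same format and time zone sort chronologically.
    ordered = sorted(points, key=lambda p: p[0])
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": [[lon, lat] for _, lon, lat in ordered],
        },
        "properties": {
            "vehicle": vehicle_name,
            "times": [t for t, _, _ in ordered],
        },
    }


feature = track_to_geojson(
    "car-1",
    [
        ("2012-01-01T10:05:00", -1.10, 50.81),
        ("2012-01-01T10:00:00", -1.09, 50.80),
    ],
)
print(json.dumps(feature, indent=2))
```

Since the installation cannot rely on an Internet connection, anything like this would need to be served from the local server alongside a locally hosted copy of OpenLayers.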

Data flows

Note: the data-nugget Pack/Unpack processing in the above diagram is not a requirement. It's a suggestion for how to overcome the conundrum of storing data in a range of schemas. Other solutions (such as an attached table per data-type that contains additional fields) are welcome.
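
To make the data-nugget suggestion concrete, here is a minimal Python sketch of a pack/unpack round trip. It assumes the nugget is XML (as proposed later in this document) and that the core fields are `time`, `lat` and `lon`; the element names `nugget` and `attr` and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

CORE_FIELDS = ("time", "lat", "lon")  # assumed core database columns


def pack(record: Dict[str, str], data_type: str) -> Tuple[Dict[str, str], str]:
    """Split a record into core fields and an XML nugget holding the rest."""
    core = {k: record[k] for k in CORE_FIELDS if k in record}
    nugget = ET.Element("nugget", {"type": data_type})
    for key in sorted(record):
        if key not in CORE_FIELDS:
            ET.SubElement(nugget, "attr", {"name": key}).text = record[key]
    return core, ET.tostring(nugget, encoding="unicode")


def unpack(core: Dict[str, str], nugget_xml: str) -> Dict[str, str]:
    """Rebuild the full record from core fields plus the decoded nugget."""
    record = dict(core)
    for attr in ET.fromstring(nugget_xml).findall("attr"):
        record[attr.get("name")] = attr.text or ""
    return record


aircraft = {"time": "2012-01-01T10:00:00", "lat": "50.8", "lon": "-1.09",
            "altitude": "3000", "pitch": "2.5"}
core, nugget_xml = pack(aircraft, "aircraft")
assert unpack(core, nugget_xml) == aircraft  # the round trip loses nothing
```

Values are kept as strings here so the round trip is exact; a real implementation would need to decide how types are recorded in the nugget.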

Data flow description


Input data
While it might appear that all vehicle tracks will be in the same format, a car track is typically in just 2D, whereas an aircraft track will include altitude, and may also contain pitch, roll and yaw attributes. For sensor data recordings there could be a very wide range of data formats/attributes, though they will all have a time-stamp. For example, an engine recording system will have revs/min, throttle opening and fuel usage.
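
The overlap between these formats can be shown with a small Python sketch. The attribute sets come from the examples above, but the spellings (such as `revs_per_min`) and the `lat`/`lon` naming of the 2D position are illustrative assumptions.

```python
# Attribute names per data type, taken from the examples in the text;
# the exact spellings are illustrative only.
ATTRIBUTES = {
    "car_track": {"time", "lat", "lon"},
    "aircraft_track": {"time", "lat", "lon", "altitude", "pitch", "roll", "yaw"},
    "engine_recording": {"time", "revs_per_min", "throttle_opening", "fuel_usage"},
}


def shared(*type_names: str) -> set:
    """Attributes present in every one of the named data types."""
    return set.intersection(*(ATTRIBUTES[name] for name in type_names))


print(sorted(shared("car_track", "aircraft_track")))  # ['lat', 'lon', 'time']
print(sorted(shared(*ATTRIBUTES)))                    # ['time']
```

Only the time-stamp is common to everything, which is why any common schema can demand time but has to treat location and all other attributes as optional.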

Spatial Database
It is expected that the database will have at least these four tables:
Observations: all of the data observations (all have time, some have location)
Recordings: details of original source data-files (type, name, reference)
Trials: the named time periods when trials were undertaken
Vehicles: expanded detail regarding vehicles (name, type)
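
A tentative sketch of how the four tables listed above might hang together is given below. It uses Python's built-in SQLite purely as a stand-in so that it runs anywhere: a real spatial database would hold a geometry column rather than plain `lat`/`lon`. The column names, the foreign keys linking a recording to a trial and a vehicle, and the `nugget` column are all my assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE trials     (id INTEGER PRIMARY KEY, name TEXT, start_time TEXT, end_time TEXT);
CREATE TABLE vehicles   (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE recordings (id INTEGER PRIMARY KEY, type TEXT, name TEXT, reference TEXT,
                         trial_id INTEGER REFERENCES trials(id),
                         vehicle_id INTEGER REFERENCES vehicles(id));
CREATE TABLE observations (id INTEGER PRIMARY KEY,
                           recording_id INTEGER REFERENCES recordings(id),
                           time TEXT NOT NULL,   -- every observation has a time
                           lat REAL, lon REAL,   -- only some have a location
                           nugget TEXT);         -- type-specific extras, if any
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO trials VALUES (1, 'Trial A', '2012-01-01', '2012-01-02')")
db.execute("INSERT INTO vehicles VALUES (1, 'car-1', 'car')")
db.execute("INSERT INTO recordings VALUES (1, 'car_track', 'run1.csv', 'REF-1', 1, 1)")
db.executemany(
    "INSERT INTO observations (recording_id, time, lat, lon) VALUES (1, ?, ?, ?)",
    [("2012-01-01T10:00:00", 50.80, -1.09), ("2012-01-01T10:05:00", 50.81, -1.10)],
)


def track_for_trial(conn, trial_name):
    """Located observations for one named trial, in time order."""
    return conn.execute(
        """SELECT o.time, o.lat, o.lon
           FROM observations o
           JOIN recordings r ON r.id = o.recording_id
           JOIN trials t ON t.id = r.trial_id
           WHERE t.name = ? AND o.lat IS NOT NULL
           ORDER BY o.time""",
        (trial_name,),
    ).fetchall()


print(track_for_trial(db, "Trial A"))
```

Extraction by recording type or vehicle name would follow the same pattern, filtering on `recordings.type` or joining through `vehicles` instead.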

The database will be designed such that it is able to provide spatial track views to OpenLayers. It should also provide quick extraction of data by recording-type, trial name, or vehicle name.

Transform (Pack)
This process is a pluggable framework that includes a 'reader' specification for each data type. Expect around one new (or modified) data type per month. The transformer will extract the common attributes necessary for the spatial database, then encode the remaining fields into a typed data-nugget (probably XML).

Transform (Unpack)
This pluggable framework includes a 'writer' specification for each analysis tool format. It takes the database fields (plus decoded data nuggets where necessary) and produces an output file in the specified format.

Note: the pack/unpack into XML has a shortcoming: the relational database is not able to filter according to the packed parameters; it's only able to filter data based on the core attributes.

Data output
Many analysis tools only require time and location: core database fields. Other specific tools, however, require additional columns. A schema-agnostic application like Excel will just receive a column for each attribute in the selected data, whereas Google Earth may receive color coding according to the value of a specific attribute.

Graphical view
Instead of having to learn/install analysis tools, some users just need a very quick look at the data. To meet this requirement it should be possible to quickly open a browser-based plot of vehicle tracks. Potentially it would also be valuable to quickly view a dataset in tabular form, or to view a plot of one or more variables against time.

Global system constraints
I have a preference for using Java, though this is not a formal requirement for this venture. I've installed and played with Talend and Pentaho Kettle. I haven't used them in anger, but as a Java developer I've a subtle preference for the Java-based Kettle. I've also a slight emotional preference for Kettle - Talend seemed overly keen to push me into a commercial version.

Local system constraints
I'm developing this system with a particular installation in mind. This installation has the following constraints:

Cannot rely on Internet connection, or Internet services
Windows XP clients (IE8)
Windows 2003 Server
The system won't have a dedicated db-administrator. I suspect there will just be me. I've some Access/SqlServer/Postgres skills.
Beyond myself there is a supporting data operator who has client-side .Net development skills.
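
Returning to the Transform (Unpack) and Data output steps described earlier, the pluggable 'writer' idea could look something like the Python sketch below. It is only one possible shape: the registry `WRITERS`, the `writer` decorator and the format names `core` and `all-columns` are invented for illustration, and CSV stands in for whatever each analysis tool actually expects.

```python
import csv
import io
from typing import Callable, Dict, List

Row = Dict[str, str]
Writer = Callable[[List[Row]], str]
WRITERS: Dict[str, Writer] = {}  # format name -> writer function


def writer(format_name: str) -> Callable[[Writer], Writer]:
    """Decorator that registers a writer under a format name."""
    def register(func: Writer) -> Writer:
        WRITERS[format_name] = func
        return func
    return register


def _to_csv(rows: List[Row], columns: List[str]) -> str:
    """Write rows as CSV, blanking missing columns and dropping unlisted ones."""
    out = io.StringIO()
    w = csv.DictWriter(out, fieldnames=columns, restval="",
                       extrasaction="ignore", lineterminator="\n")
    w.writeheader()
    w.writerows(rows)
    return out.getvalue()


@writer("core")
def write_core(rows: List[Row]) -> str:
    """For tools that only need time and location."""
    return _to_csv(rows, ["time", "lat", "lon"])


@writer("all-columns")
def write_all(rows: List[Row]) -> str:
    """Schema-agnostic output: one column per attribute seen in the data."""
    columns: List[str] = []
    for row in rows:
        columns.extend(k for k in row if k not in columns)
    return _to_csv(rows, columns)


rows = [
    {"time": "2012-01-01T10:00:00", "lat": "50.8", "lon": "-1.09"},
    {"time": "2012-01-01T10:00:00", "lat": "51.0", "lon": "-1.2", "altitude": "3000"},
]
print(WRITERS["all-columns"](rows))
```

Supporting a new analysis tool then means adding one more registered function, which fits the expectation of a new or modified data type turning up roughly monthly.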
