6 authors, including:
Andrea Bergamasco
Italian National Research Council
Richard Signell
United States Geological Survey
To cite this article: A. Bergamasco, A. Benetazzo, S. Carniel, F.M. Falcieri, T. Minuzzo, R.P. Signell
& M. Sclavo (2012): Knowledge discovery in large model datasets in the marine environment: the
THREDDS Data Server example, Advances in Oceanography and Limnology, 3:1, 41-50
Advances in Oceanography and Limnology
Vol. 3, No. 1, June 2012, 41–50
1. Introduction
The marine environment is characterized by a large number of complex dynamical
processes of societal importance, such as sea level rise, coastal flooding, coastal erosion [1],
harmful algal blooms and oil spills. In addition, the effects of climate change at global
or regional scales are not yet completely understood, and as a consequence there is
large uncertainty about the effects they may have on coastal zones [2].
Many national and international research bodies and institutions are actively acquiring
marine data both in situ (e.g. current meters, wave riders, tide gauges, etc.) and remote
Downloaded by [Mr Andrea Bergamasco] at 14:21 29 May 2012
(e.g. from satellite), as well as running complex, integrated numerical models with the aim
of monitoring and depicting the status of our seas.
Despite the growing number of datasets produced by observations and modelling,
ocean data is still often generated using custom formats, and distributed using a variety of
ad hoc methods, making it difficult to efficiently locate and access data from multiple
institutions. Scientists often spend considerable effort on the time-consuming task
of locating and retrieving data; even when successful, they must still spend
considerable time organizing the data before plotting or analyzing
them, because of differing formats, conventions, etc.
Luckily, it is possible to use existing tools and techniques to overcome this problem,
turning non-standard datasets held at institutions into standard web services in a way that
puts little burden on the data providers [3,4]. These approaches have been applied to the
U.S. Integrated Ocean Observing System (US-IOOS, see http://www.ioos.gov) to make
collectively held oceanographic data easy to find and utilize. IOOS is a coordinated
network of organizations that work together to acquire, organize and distribute
observational and model data in the coastal ocean, to allow for improved understanding
and prediction of the marine environment [5].
CNR-ISMAR Venice is helping to set up a national Italian IOOS framework, with the
aim of making both its data and model results efficiently available to organizations and
research bodies interested in monitoring and predicting the dynamics of the coastal marine
ecosystem [6]. The Italian IOOS network is being designed to connect naturally into
the international IOOS framework. Such an infrastructure will support the understanding
and forecasting of locally important issues such as the effects of severe storms, the
implications of climate variability at global and regional scales, and quantitative risk
assessment in coastal areas.
Examples of stakeholders that will immediately benefit from simple and efficient
access to large ocean datasets and model results include: (a) companies involved
in managing marine coastal resources, including fisheries; (b) institutions dealing
with emergency management, including search and rescue and civil protection
activities; (c) marine scientists; (d) policymakers at local, regional, national and
international levels; (e) recreational users.
The IOOS approach is to standardize not on data formats, but on web services, and
to approve certain web services for certain types of data. For gridded data, the approved
IOOS services are currently Open Geospatial Consortium (OGC) Web Coverage Service
(WCS) and the OPeNDAP service, in agreement with the Climate and Forecast (CF)
convention [7].
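To make the idea of standardizing on a web service concrete, the following minimal Python sketch builds an OPeNDAP (DAP2) request URL in which each array dimension is addressed as [start:stride:stop], with zero-based indices and an inclusive stop index. The server address, dataset path, variable name and index values are hypothetical and serve only as an illustration; they are not taken from any catalogue discussed here.

```python
from typing import Sequence, Tuple

Hyperslab = Tuple[int, int, int]  # (start, stride, stop); stop is inclusive in DAP2


def dap2_subset_url(dataset_url: str, variable: str,
                    hyperslabs: Sequence[Hyperslab],
                    response: str = "ascii") -> str:
    """Build a DAP2 request URL for one array variable.

    Each dimension is written as [start:stride:stop]; indices are
    zero-based and the stop index is inclusive.
    """
    parts = []
    for start, stride, stop in hyperslabs:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid hyperslab {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return f"{dataset_url}.{response}?{variable}{''.join(parts)}"


# Hypothetical endpoint and variable name, for illustration only:
# first time step, one vertical level, every 5th point in both horizontal directions.
url = dap2_subset_url(
    "http://example.org/thredds/dodsC/adriatic/his.nc",
    "temp",
    [(0, 1, 0), (19, 1, 19), (0, 5, 99), (0, 5, 199)],
)
print(url)
```

Because the request is expressed purely in terms of variable names and array indices, the client does not need to know how the files are stored on the server. Some HTTP clients may require the square brackets to be percent-encoded.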
This paper highlights the efforts that CNR-ISMAR Venice has recently
carried out in this direction, and discusses the basic ideas that have prompted the
oceanographic and meteorological communities to contribute to
“knowledge discovery” by increasing model data interoperability and facilitating
data-model intercomparison and validation.
translation into and out of the common data model. While the catalogue services are
metadata brokers, the THREDDS Data Server is a data broker (Figure 1b), allowing
many different formats of files, as well as OPeNDAP service datasets, to be transformed
into a common data model for actual arrays of data.
In addition to allowing non-conforming data to be virtually transformed into a
common data model, the TDS has another important characteristic that makes things
easier for data providers and users: aggregation. Many individual files on
disk can be virtually joined into a single dataset accessible through the web services. Thus
oceanic and atmospheric model output, as well as remote sensing data, which are typically
stored on a file system as numerous smaller files, can be accessed via a single OPeNDAP
or WCS URL. The TDS is simple to install, as it is a 100% Java servlet typically deployed
on Tomcat. A provider simply verifies that Sun Java is installed, downloads and unzips
Tomcat, and then deploys the thredds.war file through the Tomcat GUI, a process that
typically takes less than one hour, and sometimes as little as 10 minutes. Configuring the
TDS for local datasets, of course, takes longer, but is still straightforward.
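As an illustration of what an aggregation looks like in practice, the sketch below uses Python to generate a small NcML document of type joinExisting, which asks the server to scan a directory and join all matching files along an existing dimension. The directory path, the file suffix and the dimension name are assumptions made for this example and do not describe the actual CNR-ISMAR configuration.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def join_existing_ncml(directory: str, suffix: str = ".nc",
                       dim_name: str = "time") -> str:
    """Return NcML that joins all matching files along an existing dimension."""
    ET.register_namespace("", NCML_NS)  # serialize NcML as the default namespace
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    agg = ET.SubElement(root, f"{{{NCML_NS}}}aggregation",
                        {"dimName": dim_name, "type": "joinExisting"})
    # <scan> lets the server discover the files instead of listing them one by one.
    ET.SubElement(agg, f"{{{NCML_NS}}}scan",
                  {"location": directory, "suffix": suffix})
    return ET.tostring(root, encoding="unicode")


# Hypothetical directory holding one model output file per forecast time.
ncml = join_existing_ncml("/data/roms/adriatic/")
print(ncml)
```

In a real deployment a fragment of this kind would be referenced from the TDS configuration, so that the many files appear to users as one dataset behind a single URL.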
The CNR-ISMAR Venice catalogue (http://tds.ve.ismar.cnr.it:8080/thredds/catalog.html)
is one of the first examples in the Italian community of the IOOS
approach. Once a user has selected an archived dataset, a description of the dataset (metadata)
appears, along with the available web services and data viewers. At the moment, the
CNR-ISMAR Venice catalogue contains datasets from several different implementations of the
coupled hydrodynamic-wave-sediment model ROMS (www.myroms.org). Datasets are
regularly updated, and include cases from different geographic areas (e.g., the Adriatic
Sea and the Gulf of Lyon). For a thorough description of these test cases, see [1,10].
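A THREDDS catalogue is also available to programs as XML, so the browsing step described above can be automated. The following Python sketch parses a heavily simplified, hypothetical catalogue (embedded as a string, with invented dataset names and paths) and derives the OPeNDAP URL of each dataset from the service base and the dataset urlPath. Real catalogues usually carry more metadata and may use compound services, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical, heavily simplified catalogue; real ones carry more metadata.
SAMPLE_CATALOG = """\
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example model archive">
  <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Adriatic run" urlPath="adriatic/his.nc"/>
  <dataset name="Other regions">
    <dataset name="Gulf run" urlPath="gulf/his.nc"/>
  </dataset>
</catalog>
"""


def opendap_urls(catalog_xml: str, server: str) -> dict:
    """Map dataset names to OPeNDAP URLs for every dataset with a urlPath."""
    root = ET.fromstring(catalog_xml)
    base = None
    for service in root.iter(f"{{{NS['t']}}}service"):
        if service.get("serviceType", "").lower() == "opendap":
            base = service.get("base")
            break
    if base is None:
        raise ValueError("catalogue does not advertise an OPeNDAP service")
    # Container datasets without a urlPath are skipped by the XPath predicate.
    return {
        ds.get("name"): urljoin(server, base + ds.get("urlPath"))
        for ds in root.iterfind(".//t:dataset[@urlPath]", NS)
    }


urls = opendap_urls(SAMPLE_CATALOG, "http://example.org:8080")
for name, url in urls.items():
    print(f"{name}: {url}")
```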
Figure 2. Using the quick and intuitive GODIVA2 web map service client, users can select
the ocean model variables they wish to visualize from a TDS catalog. Once the data are arranged in terms
of latitude, longitude, depth and time, GODIVA2 allows mapping, drawing sections and
producing time animations. Shown here is the sea surface temperature on July 3, 2007 from a
high-resolution (500 m) run of the northern Adriatic Sea using the ROMS-SWAN model.
models forced with high-resolution atmospheric forcing provided by the model COSMO-
I7. Further details on the numerical implementations are given in [12].
Using GODIVA2 (which can be launched directly by clicking in the bottom area of the
metadata page), we can, for instance, visualize the potential temperature field
at the model top level. Embedded features also
export the results to Google™ Earth maps, as shown in Figure 3.
When it is necessary to combine layers or create more complex visualizations, the user can
access the data using the Unidata Integrated Data Viewer (IDV), freely available at
http://www.unidata.ucar.edu/software/idv/, as shown in Figure 4.
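Web map clients such as GODIVA2 obtain their images through OGC Web Map Service (WMS) requests. As a rough sketch of what such a request looks like, the Python function below assembles a WMS 1.3.0 GetMap URL. The endpoint, layer name, bounding box and time value are hypothetical, and the CRS:84 (longitude, latitude) axis order is chosen here only to keep the bounding box unambiguous.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, time,
                   width=512, height=512, elevation=None):
    """Assemble a WMS 1.3.0 GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value selects the server default style
        "CRS": "CRS:84",         # longitude/latitude axis order
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TIME": time,
    }
    if elevation is not None:
        params["ELEVATION"] = elevation
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, layer and region, for illustration only.
url = wms_getmap_url(
    "http://example.org/thredds/wms/adriatic/his.nc",
    layer="temp",
    bbox=(12.0, 44.5, 14.0, 46.0),
    time="2007-07-03T00:00:00Z",
)
print(url)
```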
Making things easier and more efficient for users can be particularly important during
emergency response situations. After the Fukushima incident, the US Navy rapidly spun
up a 1 km NCOM forecast model covering hundreds of km offshore of the Sendai power
plant. The model forecasts were officially made public as NetCDF3 files, one file for
each forecast time, packed into 9 GB tar.gz files delivered on an FTP site at NCEP.
This made the forecast data effectively inaccessible to researchers at sea with limited
bandwidth. To facilitate use, the 9 GB files were transferred to a THREDDS Data Server,
converted to NetCDF4 (which resulted in a total file size the same as the tar.gz file), and
Figure 3. The same example shown in Figure 2, now exported to Google™ Earth.
Figure 4. Using the open-source software IDV, more complex images can be composed, such as this
3D view of the northern Adriatic topography (in orange) with the sea surface
temperature field superimposed, from a high-resolution (500 m) run of the northern Adriatic Sea
using the ROMS-SWAN model, referring to May 28, 2007. The figure also shows averaged (2D) velocity fields (black
arrows, plotted every 10th grid point) and contours of significant wave height (m).
Figure 5. Using NCTOOLBOX to access data from the TDS via OPeNDAP, displaying surface
current vectors and speed for the Fukushima region. The bottom panel shows the MATLAB script
that acquires and subsamples the data and produces the plot in the top panel.
virtually turned into a single CF-compliant dataset available via the TDS, allowing
distribution through OPeNDAP, WCS, and WMS services. This allowed efficient
subsetting and extraction by WHOI researchers at sea, who used the forecast data to
predict the movement of radioactive material they identified in surface water samples
(see Figure 5). The metadata service also allowed others searching for Fukushima products
to locate this new dataset once it came online [13].
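The benefit of subsetting over a low-bandwidth link can be illustrated with a short Python sketch that converts a geographic bounding box into inclusive index ranges and counts how many grid points would actually be transferred. This is not the MATLAB script of Figure 5; the grid below is hypothetical, and the sketch assumes one-dimensional, ascending longitude and latitude axes, which does not hold for curvilinear model grids.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence, Tuple


def index_range(coords: Sequence[float], lo: float, hi: float) -> Tuple[int, int]:
    """Inclusive (first, last) indices of ascending coords lying within [lo, hi]."""
    first = bisect_left(coords, lo)
    last = bisect_right(coords, hi) - 1
    if first > last:
        raise ValueError("no grid points inside the requested interval")
    return first, last


def n_points(first: int, last: int, stride: int = 1) -> int:
    """Number of indices visited from first to last (inclusive) with a given stride."""
    return (last - first) // stride + 1


# Hypothetical regular grid: 0.25 degree spacing, 41 x 41 points.
lon = [140.0 + 0.25 * i for i in range(41)]
lat = [30.0 + 0.25 * j for j in range(41)]

lon_range = index_range(lon, 141.0, 143.0)
lat_range = index_range(lat, 36.0, 38.0)
stride = 2  # keep every second point in each direction

subset = n_points(*lon_range, stride) * n_points(*lat_range, stride)
full = len(lon) * len(lat)
print(lon_range, lat_range, subset, full)
```

The resulting index ranges and stride are exactly what an OPeNDAP client sends to the server, so only the requested fraction of the grid crosses the network.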
files requires learning the NcML language and understanding the Common Data Model
requirements. However, this configuration is mostly a one-time effort, so that an expert
can assist in the initial implementation, and then local instances of the TDS can be
maintained by personnel without this detailed knowledge.
To continue to build on this success, we need to address a few challenges. While this
approach works well for structured grids, standards are only now being introduced for
unstructured grids. Standard handling of staggered grid information and velocity vectors
needs to be improved. Finally, issues related to the management and dissemination of
public data (i.e. an adequate data policy) must be faced when unlocking large model
outputs.
The brokering approach, which harvests metadata from many different services and reads
data from many different formats into common data models, greatly improves model
access and interoperability, unlocking information from other fields (e.g., social and
economic studies). These are very desirable properties for an “INSPIRE
compliant web service” (see also http://inspire.jrc.ec.europa.eu), since they help to
lower the so-called “Users and Data Producers entry barriers” [8].
Acknowledgements
The authors thank Unidata for the technical support and help. This work was supported by the
Project “MARINA”, funded by Regione Veneto within the initiatives of the law n. 15/2007. The
activity was partially supported by Projects PRIN 2008YNPNT9_005 and FIRB “DECALOGO”
(code #RBFR08D825) and by the Project FIELD_AC, funded by the EC FP7/2007–2013 under
grant agreement no. 242284.
References
[1] S. Carniel, M. Sclavo, and R. Archetti, Towards validating a last generation, integrated wave-
current-sediment numerical model in coastal regions using video measurements, Oceanological and
Hydrobiol. Studies 40 (2011), pp. 11–20, DOI: 10.2478/s13545-011-0036-1.
[2] D. Bellafiore, E. Bucchignani, S. Gualdi, S. Carniel, V. Djurdjevic, and G. Umgiesser, Assessment
of meteorological climate model inputs for coastal hydrodynamics modeling, Ocean Dyn. 62 (2012),
pp. 555–568, DOI: 10.1007/s10236-011-0508-2.
[3] R.P. Signell, S. Carniel, J. Chiggiato, I. Janecovic, J. Pullen, and C. Sherwood, Collaboration
tools and techniques for large model datasets, J. Marine Sys. 65 (2008), pp. 154–161, DOI:
10.1016/j.jmarsys.2007.02.013.
[4] R.P. Signell, Model data interoperability for the United States Integrated Ocean Observing
System (IOOS), Proceedings of the 11th International Conference on Estuarine and Coastal
Modeling, Seattle, WA, USA, 2010. DOI:10.1061/41121(388)14.
[5] S. Rayner, The U.S. Integrated Ocean Observing System in a global context, Marine Tech. Soc. J.
44 (2010), pp. 26–31, DOI:10.4031/MTSJ.44.6.1.
[6] A. Bergamasco, S. Carniel, M. Sclavo, and T. Minuzzo, From interoperability to knowledge
discovery using large model datasets in the marine environment: the THREDDS Data Server
example, Data Flow from Space to Earth 2011 International Conference, Venice, 21–23 March
2011. Available at http://www.space.corila.it/Program.htm
[7] J. de La Beaujardiere, C.J. Beegle-Krause, L. Bermudez, S. Hankin, L. Hazard, E. Howlett,
S. Le, R. Proctor, R.P. Signell, D. Snowden, and J. Thomas, Ocean and coastal data
management, Proc. OceanObs’09: Sustained Ocean Observations and Information for Society