UCRL-PRES-148116

The Earth System Grid
Presented by Dean N. Williams
PIs: Ian Foster (ANL), Don Middleton (NCAR), and Dean Williams (LLNL)
http://www.earthsystemgrid.org
Presented at: The “EO GRID” Workshop, Frascati, Italy
May 6, 2002
Earth System Grid - Williams

Earth System Grid (ESG): Overview
- Funded by the Scientific Discovery through Advanced Computing (SciDAC) program, ESG seeks a new paradigm for the climate change community: a shift from centralized data sharing to distributed data sharing.
- It enables geographically distributed teams of researchers to effectively and rapidly acquire knowledge and understanding from massive climate data holdings.
- Multiple interfaces to ESG will allow researchers to focus on science rather than on issues of data receipt, format, and data set manipulation.


ESG: Why is ESG Important to the U.S. Climate Change Program?
- Climate model output and quality observations are vital to providing timely assessments of climate change and its impacts.
- Recent U.S. and IPCC assessment efforts made it clear that the lack of access to model simulations is a major problem for future assessments.
- Access to retrospective climate data (input and output) is needed to enable a feedback mechanism that ties researchers directly back to quality control and diagnostics of models.
- Researchers require access to “format independent” climate and observational data for case studies and training.
- In the U.S., climate simulation can be viewed as a systems problem that requires multiple agencies and institutions to work together.

ESG: U.S. Collaborations & Development
LBNL: Climate storage facility
LLNL: Model diagnostics & inter-comparison
ANL: Computational grids & grid-based applications
USC/ISI: Computational grids & grid-based applications
NCAR: Climate change prediction and scenarios
LANL: Next-generation coupled models & computing
ORNL: Climate storage & computational resources

ESG: Requirements & Priority Matrix
L = LOW, M = MEDIUM, H = HIGH

ESG Services                          ESG Developer  ESG Administrator  ESG User
Framework                             H              H                  H
Automatic Installation                L              L                  H
Distributed Computing
  Authorization & Authentication      H              H                  M
  Registration                        H              H                  L
  Event Services                      L              L                  M
  Task Management                     L              L                  L
  Logging Services                    L              H                  H
Data Systems
  Search and Discovery                M              H                  H
  Data movement (transport)           L              H                  H
  Metadata framework                  H              H                  M
  Collaboratories                     M              L                  H
Tools
  Analysis                            M              M                  H
  Visualization                       L              L                  H
  Collaboration                       M              M                  H

ESG: U.S. Department of Energy (DOE) Next Generation Internet (NGI) Project
- ESG-I (past):
  - Focused on developing techniques for high-speed data movement between sites and users (e.g., GridFTP, the secure, highly efficient file transfer service developed by ANL (the Globus project))
  - Developed replica catalogs for keeping track of data locations
  - Developed request managers for coordinating multiple transfers
  - Developed a grid-enabled version of LLNL’s data analysis package
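
The roles of the replica catalog and the request manager can be illustrated with a minimal sketch. Everything in it is hypothetical: the `ReplicaCatalog` and `RequestManager` classes, the logical file name, the `gsiftp://` URLs, and the bandwidth figures are assumptions made for illustration and are not the ESG-I or Globus APIs. The catalog maps a logical file name to its physical copies; the request manager picks, for each requested file, the replica whose host has the highest assumed bandwidth. No data are transferred.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Maps a logical file name to the URLs of its physical copies (hypothetical)."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_name, url):
        self.entries.setdefault(logical_name, []).append(url)

    def lookup(self, logical_name):
        return list(self.entries.get(logical_name, []))


@dataclass
class RequestManager:
    """Plans multi-file requests by choosing one replica per file (hypothetical)."""
    catalog: ReplicaCatalog
    bandwidth_mbps: dict  # assumed per-host bandwidth estimates

    def _host(self, url):
        # "scheme://host/path" -> "host"
        return url.split("/")[2]

    def plan(self, logical_names):
        chosen = {}
        for name in logical_names:
            replicas = self.catalog.lookup(name)
            if not replicas:
                raise LookupError(f"no replica registered for {name}")
            # Prefer the replica on the host with the highest estimated bandwidth.
            chosen[name] = max(
                replicas,
                key=lambda u: self.bandwidth_mbps.get(self._host(u), 0.0),
            )
        return chosen


catalog = ReplicaCatalog()
catalog.register("pcm_run1_ts.nc", "gsiftp://site-a.example.org/cache/pcm_run1_ts.nc")
catalog.register("pcm_run1_ts.nc", "gsiftp://site-b.example.org/cache/pcm_run1_ts.nc")

manager = RequestManager(catalog, {"site-a.example.org": 40.0, "site-b.example.org": 95.0})
transfer_plan = manager.plan(["pcm_run1_ts.nc"])
print(transfer_plan)
```

A real request manager would also have to handle tape staging, retries, and changing network conditions, none of which this sketch attempts.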

ESG: ESG-I Architecture
[Figure: ESG-I architecture diagram. Components shown: PCMDI client application (CORBA); Request Manager; LDAP-based Metadata Catalog and Replica Catalog; Network Weather Service; GridFTP servers with disk caches at ANL, ISI, NCAR, and LBNL (Clipper, PDSF); HRM and GSI-pftpd front ends to tape systems (including SDSC).]

ESG: ESG-I Team Presented their work at Supercomputing 2001
[Figure: ESG-I demonstration at SC ’01. LDAP server metadata catalogs at LLNL, LBNL, ANL, and the SC ’01 show floor, connected over the network; tape systems and parallel disk systems at LBNL and ANL; RAID and local disk storage.]

ESG: DOE SciDAC Project
- ESG-II (present):
  - Builds upon the substantial work of ESG-I
  - Grid-wide services supporting authentication, authorization, data discovery, and user-specified analysis
  - Metadata services supporting remote data browsing, querying, accessing, displaying, etc.
  - Filtering services performing intelligent, model-specific analysis before delivering the results to the user
  - Integration of next-generation data analysis and visualization applications (such as ongoing work at LLNL and NCAR), web-based data portals and other thin clients supporting the Distributed Oceanographic Data System (DODS), and collaborative problem-solving environments
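
The idea behind a filtering service, reducing data at the source so that only the result crosses the network, can be sketched as follows. The function name `area_mean_filter`, the grid layout, and the sample values are illustrative assumptions and are not part of ESG-II. The sketch computes a cosine-latitude-weighted mean of a small latitude-longitude field, the kind of reduction a server could run before delivering one number instead of the full grid.

```python
import math


def area_mean_filter(field, latitudes_deg):
    """Return the cos(latitude)-weighted mean of a 2-D field.

    field[i][j] is the value at latitude index i and longitude index j.
    """
    if len(field) != len(latitudes_deg):
        raise ValueError("one latitude is required per row of the field")
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, latitudes_deg):
        weight = math.cos(math.radians(lat))
        weighted_sum += weight * sum(row)
        weight_total += weight * len(row)
    return weighted_sum / weight_total


# Hypothetical 3 x 4 surface-temperature field in kelvin.
latitudes = [-60.0, 0.0, 60.0]
temperature = [
    [270.0, 271.0, 269.0, 270.0],
    [300.0, 301.0, 299.0, 300.0],
    [272.0, 273.0, 271.0, 272.0],
]

result = area_mean_filter(temperature, latitudes)
n_values = len(temperature) * len(temperature[0])
print(f"{n_values} grid values reduced to one: {result:.2f} K")
```

For petabyte-scale holdings the same principle applies at much larger ratios: the server returns the reduced product and the raw files stay where they are.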

ESG: ESG-II Architecture


ESG: Metadata Services
ESG CLIENTS, API & USER INTERFACES
- Search & Discovery
- Publishing
- Analysis & Visualization
- Browsing & Display
- Administration

HIGH LEVEL METADATA SERVICES
- Metadata Extraction
- Metadata Aggregation
- Metadata Annotation
- Metadata & Data Registration
- Metadata Validation
- Metadata Browsing
- Metadata Display
- Metadata Query
- Metadata Discovery

CORE METADATA SERVICES
- Metadata Access (update, insert, delete, query)
- Service Translation Library

METADATA HOLDINGS
- Data & Metadata Catalog
- Dublin Core Database (mirror)
- Dublin Core XML Files
- COARDS Database
- COMMENTS XML Files
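
The four operations named for the core metadata access service (update, insert, delete, query) can be sketched against an in-memory SQLite table. The `MetadataStore` class, its table layout, and the sample records are assumptions made for illustration; only the column names `identifier`, `title`, `creator`, and `subject` are borrowed from the Dublin Core element set. This is not the ESG metadata schema.

```python
import sqlite3


class MetadataStore:
    """Hypothetical metadata access layer: insert, update, delete, query."""

    FIELDS = ("identifier", "title", "creator", "subject")

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE records ("
            "identifier TEXT PRIMARY KEY, title TEXT, creator TEXT, subject TEXT)"
        )

    def insert(self, identifier, title, creator, subject):
        self.db.execute(
            "INSERT INTO records VALUES (?, ?, ?, ?)",
            (identifier, title, creator, subject),
        )

    def update(self, identifier, field, value):
        # The column name is checked against a fixed list before being
        # placed in the SQL text; values always go through placeholders.
        if field not in self.FIELDS:
            raise ValueError(f"unknown field: {field}")
        self.db.execute(
            f"UPDATE records SET {field} = ? WHERE identifier = ?",
            (value, identifier),
        )

    def delete(self, identifier):
        self.db.execute("DELETE FROM records WHERE identifier = ?", (identifier,))

    def query(self, subject):
        rows = self.db.execute(
            "SELECT identifier, title FROM records WHERE subject = ? ORDER BY identifier",
            (subject,),
        )
        return rows.fetchall()


store = MetadataStore()
store.insert("demo-001", "Monthly reanalysis", "PCMDI", "surface temperature")
store.insert("demo-002", "Control run", "NCAR", "precipitation")
store.update("demo-002", "subject", "surface temperature")
store.delete("demo-001")
matches = store.query("surface temperature")
print(matches)
```

The higher-level services in the diagram (browsing, discovery, validation) would be built on top of a small access layer of this kind.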


ESG: Collaboration Network
[Figure: ESG collaboration network. Data consumers and data producers are linked through ESG services (information, replica, metadata, and community authorization (CAS)), which sit on the grid and network infrastructure together with computational resources and online storage systems.]

ESG: Example of a Web-based Data Portal (currently serving 40+ simulations of AMIP, CMIP, and PCM data, and growing)


ESG: Example of a Client Application


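ESG: Example of Script Access
- Python, a next-generation scripting language, is used to access the Earth System Grid at LLNL:

    import cdms

    # Open the metadata database through its LDAP URL
    db = cdms.open("ldap://localhost:389/database=demo,ou=PCMDI,o=LLNL,c=US")
    # Open a dataset by its identifier
    f = db.open("ncep_reanalysis_mo")
    # Read the variable 'ts' from the dataset
    ds = f('ts')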
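
For readers unfamiliar with the URL passed to `cdms.open` above, the following sketch takes it apart with the Python standard library. It does not use `cdms` and does not contact a server; the helper name `split_ldap_url` is an assumption introduced here. The URL carries a host, a port, and a comma-separated list of `key=value` pairs that names the database entry.

```python
from urllib.parse import urlparse


def split_ldap_url(url):
    """Split an ldap:// URL into host, port, and the key=value pairs of its path."""
    parts = urlparse(url)
    if parts.scheme != "ldap":
        raise ValueError(f"not an LDAP URL: {url}")
    pairs = {}
    for item in parts.path.lstrip("/").split(","):
        key, _, value = item.partition("=")
        pairs[key] = value
    return parts.hostname, parts.port, pairs


host, port, entry = split_ldap_url(
    "ldap://localhost:389/database=demo,ou=PCMDI,o=LLNL,c=US"
)
print(host, port, entry)
```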


ESG: Concluding Statements
- ESG is a highly collaborative effort that will allow users to quickly access storage facilities holding petabytes of raw or processed data in an application-independent manner.
- Payoffs of this distributed collaborative infrastructure would include:
  - Distributed data sharing
  - Simplified discovery of climate data
  - Large-scale climate data processing and analysis
  - Increased collaboration among climate research scientists
  - Aid in climate assessments and estimates of future climate variability and trends
- For more information on ESG, visit our website at: http://www.earthsystemgrid.org
