
HIDDRA: Highly Independent Data Distribution and Retrieval Architecture for Earth Observation Missions

J.M. Tirado1, D. Higuero1, J. Carretero1, F. Félix2 and A. de la Fuente2

1Computer Science Department, Carlos III University, Spain
2Ingeniería y Servicios Aeroespaciales, S.A., Spain

Introduction
Institutions such as NASA, ESA or JAXA face the challenge of distributing data from their missions both to the scientific community and to their long-term archives. This is a complex problem: it involves vast amounts of data, several geographically distributed archives, heterogeneous architectures connected by heterogeneous networks, and users spread around the world. We propose a novel architecture that addresses this problem while focusing on the requirements of the final user. The architecture is a modular system that provides a highly efficient parallel multiprotocol download engine and uses a publisher/subscriber policy that helps the final user obtain data of interest transparently. We have evaluated a first prototype in collaboration with the SMOS operation team at the ESAC centre in Villafranca del Castillo (Spain). The prototype shows high scalability and performance, opening a wide spectrum of opportunities.

HIDDRA Architecture

Features
Generic Solution: HIDDRA offers a generic solution for data distribution scenarios in which a set of users is subscribed to different data sets.

User-Driven/User-Friendly: HIDDRA is a user-driven architecture that improves the user experience and simplifies daily tasks by offering a simple and intuitive GUI. Users can subscribe to several types of products, which are downloaded automatically when they become available.

Distributed, Autonomous and Dependable: Each component of the architecture has been designed to work independently of its location, so the system administrator can easily deploy components according to the expected workload. All HIDDRA components include autorecovery mechanisms: if a component fails, its pending tasks are resumed after an automatic restart.

Multiprotocol Parallel Downloads: Current solutions deal with a single transmission protocol. By contrast, the HIDDRA downloader can use HTTP, HTTPS, FTP and BitTorrent simultaneously. The modular design allows other transmission protocols (e.g. GridFTP) to be added.

Easy to Integrate: The HIDDRA architecture can be easily integrated into existing Earth observation mission infrastructures. It is only necessary to adapt the data event notification system and import the existing user subscription files into the HIDDRA Subscription Manager.

Scalable: Several final users can be connected to a single HIDDRA downloader. HIDDRA downloaders reduce the workload of the mirrors, because each product is transferred only once and is afterwards distributed among the users of the virtual community.
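The publisher/subscriber policy described above can be illustrated with a minimal Python sketch. It is not the HIDDRA Subscription Manager itself: the class name, its methods, the user names and the product type labels ("L1C", "L2") are all hypothetical and chosen only for illustration. The sketch shows how a notification about a new product could be turned into one download task per subscribed user.

```python
from collections import defaultdict


class SubscriptionManager:
    """Hypothetical in-memory registry; not the actual HIDDRA component."""

    def __init__(self):
        # Maps a product type to the set of users subscribed to it.
        self._subscribers = defaultdict(set)

    def subscribe(self, user, product_type):
        """Register a user's interest in one product type."""
        self._subscribers[product_type].add(user)

    def publish(self, product_type, product_name):
        """Return the (user, product) download tasks triggered by a new product."""
        users = sorted(self._subscribers.get(product_type, ()))
        return [(user, product_name) for user in users]


# Illustrative users and product types (assumptions, not taken from the poster).
manager = SubscriptionManager()
manager.subscribe("alice", "L1C")
manager.subscribe("bob", "L1C")
manager.subscribe("bob", "L2")

# A new "L1C" product becomes available: every subscriber gets a task.
tasks = manager.publish("L1C", "product_0001")
```

Users who are not subscribed to a product type receive no task, which is what makes the retrieval transparent: the user states an interest once and downloads follow from data events.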

Evaluation
Load Balance and Efficient Bandwidth Utilization: Load balancing in the HIDDRA architecture is handled by the HIDDRA clients. No load balancing is needed on the server side, which reduces administration tasks. HIDDRA clients monitor current network conditions and decide which mirrors to use.

Fault Tolerance Capabilities: The HIDDRA downloader detects mirror failures and continues the download using the mirrors that remain available. The integrity of downloaded files is also checked chunk by chunk using a hash function. If a portion of a file is corrupted, it is downloaded again. These processes are transparent to the final user.
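The chunk-wise integrity check can be sketched as follows. The poster only states that "a hash function" is used, so the choice of SHA-256, the chunk size, and the helper names `chunk_digests`, `corrupted_chunks` and `repair` are assumptions made for this example. The sketch compares per-chunk digests against expected values and fetches only the chunks that do not match.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def chunk_digests(data, chunk_size=CHUNK_SIZE):
    """Return one SHA-256 hex digest per chunk (SHA-256 is an assumption)."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def corrupted_chunks(data, expected, chunk_size=CHUNK_SIZE):
    """Return the indices of chunks whose digest differs from the expected one."""
    actual = chunk_digests(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


def repair(data, expected, fetch_chunk, chunk_size=CHUNK_SIZE):
    """Re-fetch only the corrupted chunks; fetch_chunk(index) is hypothetical."""
    buf = bytearray(data)
    for index in corrupted_chunks(data, expected, chunk_size):
        start = index * chunk_size
        buf[start:start + chunk_size] = fetch_chunk(index)
    return bytes(buf)


original = b"earth observation product"
expected = chunk_digests(original)

# Simulate corruption of a single byte during transfer.
damaged = original[:5] + b"X" + original[6:]
bad = corrupted_chunks(damaged, expected)

# Stand-in for a mirror: serves the correct bytes of the requested chunk.
fixed = repair(damaged, expected,
               lambda i: original[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE])
```

Checking by chunks means a single corrupted portion costs one chunk of extra traffic rather than a full retransmission of the file.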
[Plots for Figures 1 and 2: bandwidth (KiB/s and MiB/s) over time in seconds for Mirror A, Mirror B and the HIDDRA clients. Plot annotations: "Mirror A stops working", "Client stops connections to Mirror A and uses only Mirror B", "Mirror A works again".]

Figure 1: Testing load balancing and bandwidth utilization in a scenario composed of two HTTP mirrors and two HIDDRA Clients with independent HIDDRA downloaders running on the same physical machine. This fragment corresponds to the transfer of 14 GiB using files between 50 and 300 MiB.
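One way a client could act on monitored network conditions is to split the chunks of a file among mirrors in proportion to their measured bandwidth. The poster does not describe the actual selection policy, so the proportional rule below, the function name `assign_chunks` and the mirror labels are assumptions used only to make the idea of client-side load balancing concrete.

```python
def assign_chunks(num_chunks, bandwidth):
    """Split num_chunks among mirrors in proportion to measured bandwidth.

    bandwidth maps a mirror name to its measured throughput (any unit).
    This proportional policy is an illustrative assumption.
    """
    total = sum(bandwidth.values())
    if total <= 0:
        raise ValueError("no usable mirror")
    # Fastest mirrors first, so they receive chunks lost to rounding.
    mirrors = sorted(bandwidth, key=bandwidth.get, reverse=True)
    shares = {m: int(num_chunks * bandwidth[m] / total) for m in mirrors}
    leftover = num_chunks - sum(shares.values())
    for i in range(leftover):
        shares[mirrors[i % len(mirrors)]] += 1
    return shares


# Two mirrors, one measured at twice the throughput of the other.
balanced = assign_chunks(10, {"mirror_a": 1000.0, "mirror_b": 500.0})

# A mirror with no measured throughput receives no chunks.
degraded = assign_chunks(10, {"mirror_a": 0.0, "mirror_b": 500.0})
```

Because the decision is taken by each client from its own measurements, the mirrors need no coordination, which matches the claim that no load balancing is required on the server side.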

Figure 2: Testing fault tolerance in a scenario composed of two HTTP mirrors and one HIDDRA Client / Downloader. During the file transfer, Mirror A stops and restarts after a while. Mirror A failure is detected and the download process continues using only Mirror B. When Mirror A starts working again the download continues using both mirrors.

Future Work
Testing in a large-scale scenario: Future testing will be carried out in a larger scenario using geographically distributed components.

External service providers: Following the cloud computing trend, we plan to test the feasibility of using external computing and storage providers such as Amazon EC2/S3.

Automatic data processing workflows: Current work focuses on offering workflow execution functionality to final users. Users will be able to specify workflows that are executed automatically each time a new product becomes available. Workflow execution will be integrated with cluster and/or grid infrastructures.

Conclusions
HIDDRA satisfies user needs and provides a highly distributed, scalable and autonomous architecture that simplifies administration tasks. The architecture maximizes bandwidth utilization and reduces redundant downloads by creating virtual communities.

4th GRID & e-Collaboration Workshop: Digital Repositories
