You are on page 1of 25

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/215757945

The Anatomy of the Grid: Enabling Scalable Virtual Organizations

Conference Paper in The International Journal of High Performance Computing Applications · August 2001
DOI: 10.1177/109434200101500302 · Source: DBLP

CITATIONS READS
6,206 4,849

3 authors, including:

Ian Foster Steven John Tuecke


University of Chicago University of Chicago
1,243 PUBLICATIONS 104,272 CITATIONS 150 PUBLICATIONS 32,730 CITATIONS

SEE PROFILE SEE PROFILE

All content following this page was uploaded by Ian Foster on 29 October 2018.

The user has requested enhancement of the downloaded file.


International Journal of High Performance
Computing Applications
http://hpc.sagepub.com/

The Anatomy of the Grid: Enabling Scalable Virtual Organizations


Ian Foster, Carl Kesselman and Steven Tuecke
International Journal of High Performance Computing Applications 2001 15: 200
DOI: 10.1177/109434200101500302

The online version of this article can be found at:


http://hpc.sagepub.com/content/15/3/200

Published by:

http://www.sagepublications.com

Additional services and information for International Journal of High Performance Computing Applications can be found at:

Email Alerts: http://hpc.sagepub.com/cgi/alerts

Subscriptions: http://hpc.sagepub.com/subscriptions

Reprints: http://www.sagepub.com/journalsReprints.nav

Permissions: http://www.sagepub.com/journalsPermissions.nav

Citations: http://hpc.sagepub.com/content/15/3/200.refs.html

>> Version of Record - Aug 1, 2001

What is This?

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


COMPUTING APPLICATIONS
ANATOMY OF THE GRID

1 Introduction
The term Grid was coined in the mid-1990s to denote a
proposed distributed computing infrastructure for ad-
THE ANATOMY OF THE GRID: vanced science and engineering (Foster and Kesselman,
ENABLING SCALABLE VIRTUAL 1998a). Considerable progress has since been made on
ORGANIZATIONS the construction of such an infrastructure (e.g., Beiriger
et al., 2000; Brunett et al., 1998; Johnston, Gannon, and
Nitzberg, 1999; Stevens et al., 1997), but the term Grid
1
Ian Foster has also been conflated, at least in popular perception, to
2
Carl Kesselman embrace everything from advanced networking to artifi-
3
Steven Tuecke cial intelligence. One might wonder whether the term has
any real substance and meaning. Is there really a distinct
Grid problem and hence a need for new Grid technolo-
Summary
gies? If so, what is the nature of these technologies, and
“Grid” computing has emerged as an important new field, what is their domain of applicability? While numerous
distinguished from conventional distributed computing by
groups have interest in Grid concepts and share, to a sig-
its focus on large-scale resource sharing, innovative appli-
cations, and, in some cases, high performance orienta- nificant extent, a common vision of Grid architecture, we
tion. In this article, the authors define this new field. First, do not see consensus on the answers to these questions.
they review the “Grid problem,” which is defined as flexible, Our purpose in this article is to argue that the Grid con-
secure, coordinated resource sharing among dynamic cept is indeed motivated by a real and specific problem
collections of individuals, institutions, and resources—
and that there is an emerging, well-defined Grid technol-
what is referred to as virtual organizations. In such set-
tings, unique authentication, authorization, resource ac- ogy base that addresses significant aspects of this prob-
cess, resource discovery, and other challenges are en- lem. In the process, we develop a detailed architecture
countered. It is this class of problem that is addressed by and road map for current and future Grid technologies.
Grid technologies. Next, the authors present an extensible Furthermore, we assert that while Grid technologies are
and open Grid architecture, in which protocols, services,
currently distinct from other major technology trends,
application programming interfaces, and software devel-
opment kits are categorized according to their roles in en- such as Internet, enterprise, distributed, and peer-to-peer
abling resource sharing. The authors describe require- computing, these other trends can benefit significantly
ments that they believe any such mechanisms must satisfy from growing into the problem space addressed by Grid
and discuss the importance of defining a compact set of technologies.
intergrid protocols to enable interoperability among differ-
The real and specific problem that underlies the Grid
ent Grid systems. Finally, the authors discuss how Grid
technologies relate to other contemporary technologies, concept is coordinated resource sharing and problem
including enterprise integration, application service pro- solving in dynamic, multi-institutional virtual organiza-
vider, storage service provider, and peer-to-peer comput- tions. The sharing that we are concerned with is not pri-
ing. They maintain that Grid concepts and technologies marily file exchange but rather direct access to comput-
complement and have much to contribute to these other
ers, software, data, and other resources, as is required by a
approaches.
range of collaborative problem-solving and resource-
brokering strategies emerging in industry, science, and
engineering. This sharing is, necessarily, highly con-
trolled, with resource providers and consumers defining
clearly and carefully just what is shared, who is allowed to
1
MATHEMATICS AND COMPUTER SCIENCE DIVISION, AR-
Address reprint requests to Ian Foster, Mathematics and Com- GONNE NATIONAL LABORATORY, ARGONNE, ILLINOIS, AND
puter Science Division, Argonne National Laboratory, 9700 S. DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF
Cass Avenue, MCS/221, Argonne, IL 60439; phone: (630) 252- CHICAGO
4619; fax: (630) 252-9556; e-mail: foster@mcs.anl.gov. 2
INFORMATION SCIENCES INSTITUTE, UNIVERSITY OF
SOUTHERN CALIFORNIA
The International Journal of High Performance Computing Applications, 3
MATHEMATICS AND COMPUTER SCIENCE DIVISION, AR-
Volume 15, No. 3, Fall 2001, pp. 200-222
 2001 Sage Publications
GONNE NATIONAL LABORATORY, ARGONNE, ILLINOIS

200 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


share, and the conditions under which sharing occurs. A
set of individuals and/or institutions defined by such shar-
ing rules form what we call a virtual organization (VO).
The following are examples of VOs: the application ser-
vice providers, storage service providers, cycle providers,
and consultants engaged by a car manufacturer to perform
scenario evaluation during planning for a new factory; “The real and specific problem that
members of an industrial consortium bidding on a new air- underlies the Grid concept is coordinated
craft; a crisis management team and the databases and sim- resource sharing and problem solving in
ulation systems that they use to plan a response to an emer- dynamic, multi-institutional virtual
gency situation; and members of a large, international,
organizations.”
multiyear, high-energy physics collaboration. Each of
these examples represents an approach to computing and
problem solving based on collaboration in computation-
and data-rich environments.
As these examples show, VOs vary tremendously in
their purpose, scope, size, duration, structure, community,
and sociology. Nevertheless, careful study of underlying
technology requirements leads us to identify a broad set of
common concerns and requirements. In particular, we see
a need for highly flexible sharing relationships, ranging
from client-server to peer-to-peer; for sophisticated and
precise levels of control over how shared resources are
used, including fine-grained and multistakeholder access
control, delegation, and application of local and global pol-
icies; for sharing of varied resources, ranging from pro-
grams, files, and data to computers, sensors, and networks;
and for diverse usage modes, ranging from single user to
multiuser and from performance sensitive to cost sensitive
and hence embracing issues of quality of service, schedul-
ing, co-allocation, and accounting.
Current distributed computing technologies do not ad-
dress the concerns and requirements just listed. For exam-
ple, current Internet technologies address communication
and information exchange among computers but do not
provide integrated approaches to the coordinated use of re-
sources at multiple sites for computation. Business-to-
business exchanges (Sculley and Woods, 2000) focus on
information sharing (often via centralized servers). So do
virtual enterprise technologies, although here sharing may
eventually extend to applications and physical devices
(e.g., Barry et al., 1998). Enterprise distributed computing
technologies such as CORBA and Enterprise Java enable
resource sharing within a single organization. The Open
Group’s Distributed Computing Environment (DCE) sup-
ports secure resource sharing across sites, but most VOs
would find it too burdensome and inflexible. Storage ser-
vice providers (SSPs) and application service providers
(ASPs) allow organizations to outsource storage and com-
puting requirements to other parties, but only in con-

ANATOMY OF THE GRID 201

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


strained ways: for example, SSP resources are typically to many diverse disciplines and activities: it is not limited
linked to a customer via a virtual private network (VPN). to science, engineering, and business activities. It is be-
Emerging “distributed computing” companies seek to cause of this broad applicability of VO concepts that Grid
harness idle computers on an international scale (Foster, technology is important.
2000) but, to date, support only highly centralized access
to those resources. In summary, current technology either
2 The Emergence of Virtual
does not accommodate the range of resource types or does
Organizations
not provide the flexibility and control on sharing relation-
ships needed to establish VOs. Consider the following four scenarios:
It is here that Grid technologies enter the picture. Over
the past 5 years, research and development efforts within 1. A company needing to reach a decision on the
the Grid community have produced protocols, services, placement of a new factory invokes a sophisticated
and tools that address precisely the challenges that arise financial forecasting model from an ASP, provid-
when we seek to build scalable VOs. These technologies ing it with access to appropriate proprietary histor-
include security solutions that support management of ical data from a corporate database on storage
credentials and policies when computations span multiple systems operated by an SSP. During the decision-
institutions; resource management protocols and services making meeting, what-if scenarios are run
that support secure remote access to computing and data collaboratively and interactively, even though the
resources and the co-allocation of multiple resources; in- division heads participating in the decision are
formation query protocols and services that provide con- located in different cities. The ASP itself contracts
figuration and status information about resources, organi- with a cycle provider for additional “oomph” dur-
zations, and services; and data management services that ing particularly demanding scenarios, requiring of
locate and transport data sets between storage systems course that cycles meet desired security and
and applications. performance requirements.
Because of their focus on dynamic, cross-organiza- 2. An industrial consortium formed to develop a fea-
tional sharing, Grid technologies complement rather than sibility study for a next-generation supersonic air-
compete with existing distributed computing technolo- craft undertakes a highly accurate multi-
gies. For example, enterprise distributed computing sys- disciplinary simulation of the entire aircraft. This
tems can use Grid technologies to achieve resource shar- simulation integrates proprietary software compo-
ing across institutional boundaries; in the ASP/SSP space, nents developed by different participants, with
Grid technologies can be used to establish dynamic mar- each component operating on that participant’s
kets for computing and storage resources, hence over- computers and having access to appropriate design
coming the limitations of current static configurations. databases and other data made available to the con-
We discuss the relationship between Grids and these tech- sortium by its members.
nologies in more detail below. 3. A crisis management team responds to a chemical
In the rest of this article, we expand on each of these spill by using local weather and soil models to esti-
points in turn. Our objectives are to (1) clarify the nature mate the spread of the spill, determining the im-
of VOs and Grid computing for those unfamiliar with the pact based on population location as well as geo-
area, (2) contribute to the emergence of Grid computing graphic features such as rivers and water supplies,
as a discipline by establishing a standard vocabulary and creating a short-term mitigation plan (perhaps
defining an overall architectural framework, and (3) de- based on chemical reaction models), and tasking
fine clearly how Grid technologies relate to other technol- emergency response personnel by planning and
ogies, explaining both why emerging technologies do not coordinating evacuation, notifying hospitals, and
yet solve the Grid computing problem and how these tech- so forth.
nologies can benefit from Grid technologies. 4. Thousands of physicists at hundreds of laborato-
It is our belief that VOs have the potential to change ries and universities worldwide come together to
dramatically the way we use computers to solve prob- design, create, operate, and analyze the products of
lems, much as the Web has changed how we exchange in- a major detector at CERN, the European high-
formation. As the examples presented here illustrate, the energy physics laboratory. During the analysis
need to engage in collaborative processes is fundamental phase, they pool their computing, storage, and net-

202 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


Fig. 1 An actual organization can participate in one or more virtual organizations (VOs) by sharing some or all of its resources.
We show three actual organizations (the ovals) and two VOs: P, which links participants in an aerospace design consortium, and Q,
which links colleagues who have agreed to share spare computing cycles, for example, to run ray-tracing computations. The orga-
nization on the left participates in P, the one to the right participates in Q, and the third is a member of both P and Q. The policies
governing access to resources (summarized in “quotes”) vary according to the actual organizations, resources, and VOs in-
volved.

working resources to create a “Data Grid” capable


of analyzing petabytes of data (Chervenak et al.,
2001; Hoschek et al., 2000; Moore et al., 1998).

These four examples differ in many respects: the number


and type of participants, the types of activities, the duration
and scale of the interaction, and the resources being shared.
But they also have much in common, as discussed in the
following (see also Figure 1).
In each case, a number of mutually distrustful partici-
pants with varying degrees of prior relationship (perhaps
none at all) want to share resources to perform some task.
Furthermore, sharing is about more than simply docu-
menting exchange (as in “virtual enterprises”)
(Camarinha-Matos et al., 1998): it can involve direct ac-
cess to remote software, computers, data, sensors, and
other resources. For example, members of a consortium
may provide access to specialized software and data and/or
pool their computational resources.
Resource sharing is conditional: each resource owner
makes resources available, subject to constraints on when,
where, and what can be done. For example, a participant in
VO P of Figure 1 might allow VO partners to invoke their
simulation service only for “simple” problems. Resource
consumers may also place constraints on properties of the

ANATOMY OF THE GRID 203

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


resources they are prepared to work with. For example, a
participant in VO Q might accept only pooled computa-
tional resources certified as “secure.” The implementation
of such constraints requires mechanisms for expressing
policies, establishing the identity of a consumer or re-
source (authentication), and determining whether an oper-
ation is consistent with applicable sharing relationships
(authorization).
Sharing relationships can vary dynamically over time,
in terms of the resources involved, the nature of the access
permitted, and the participants to whom access is permit-
ted. And these relationships do not necessarily involve an
explicitly named set of individuals but rather may be de-
fined implicitly by the policies that govern access to re-
sources. For example, an organization might enable access
by anyone who can demonstrate that he or she is a “cus-
tomer” or a “student.”
The dynamic nature of sharing relationships means that
we require mechanisms for discovering and characterizing
the nature of the relationships that exist at a particular point
in time. For example, a new participant joining VO Q must
be able to determine what resources it is able to access, the
“quality” of these resources, and the policies that govern
access.
“Sharing relationships are often not
Sharing relationships are often not simply client-server
simply client-server but peer to peer: but peer to peer: providers can be consumers, and sharing
providers can be consumers, and sharing relationships can exist among any subset of participants.
relationships can exist among any subset Sharing relationships may be combined to coordinate use
of participants.” across many resources, each owned by different organiza-
tions. For example, in VO Q, a computation started on one
pooled computational resource may subsequently access
data or initiate subcomputations elsewhere. The ability to
delegate authority in controlled ways becomes important
in such situations, as do mechanisms for coordinating op-
erations across multiple resources (e.g., coscheduling).
The same resource may be used in different ways, de-
pending on the restrictions placed on the sharing and the
goal of the sharing. For example, a computer may be used
only to run a specific piece of software in one sharing ar-
rangement, while it may provide generic compute cycles in
another. Because of the lack of a priori knowledge about
how a resource may be used, performance metrics, expec-
tations, and limitations (i.e., quality of service) may be part
of the conditions placed on resource sharing or usage.
These characteristics and requirements define what we
term a virtual organization, a concept that we believe is be-
coming fundamental to much of modern computing. VOs
enable disparate groups of organizations and/or individu-
als to share resources in a controlled fashion, so that mem-
bers may collaborate to achieve a shared goal.

204 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


3 The Nature of Grid Architecture exchange, so we require standard protocols and syntaxes
for general resource sharing.
The establishment, management, and exploitation of dy-
Why are protocols critical to interoperability? A proto-
namic, cross-organizational VO sharing relationships re-
col definition specifies how distributed system elements
quire new technology. We structure our discussion of this
interact with one another to achieve a specified behavior
technology in terms of a Grid architecture that identifies
and the structure of the information exchanged during this
fundamental system components, specifies the purpose
interaction. This focus on externals (interactions) rather
and function of these components, and indicates how
than internals (software, resource characteristics) has im-
these components interact with one another.
portant pragmatic benefits. VOs tend to be fluid; hence,
In defining a Grid architecture, we start from the per-
the mechanisms used to discover resources, establish
spective that effective VO operation requires that we be
identity, determine authorization, and initiate sharing
able to establish sharing relationships among any poten-
must be flexible and lightweight, so that resource-sharing
tial participants. Interoperability is thus the central issue
arrangements can be established and changed quickly.
to be addressed. In a networked environment, inter-
Because VOs complement rather than replace existing in-
operability means common protocols. Hence, our Grid ar-
stitutions, sharing mechanisms cannot require substantial
chitecture is first and foremost a protocol architecture,
changes to local policies and must allow individual insti-
with protocols defining the basic mechanisms by which
tutions to maintain ultimate control over their own re-
VO users and resources negotiate, establish, manage, and
sources. Since protocols govern the interaction between
exploit sharing relationships. A standards-based open ar-
components and not the implementation of the compo-
chitecture facilitates extensibility, interoperability, porta-
nents, local control is preserved.
bility, and code sharing; standard protocols make it easy
Why are services important? A service (see the appen-
to define standard services that provide enhanced capabil-
dix) is defined solely by the protocol that it speaks and the
ities. We can also construct application programming in-
behaviors that it implements. The definition of standard
terfaces and software development kits (see the appendix
services—for access to computation, access to data, re-
for definitions) to provide the programming abstractions
source discovery, coscheduling, data replication, and so
required to create a usable Grid. Together, this technology
forth—allows us to enhance the services offered to VO
and architecture constitute what is often termed middle-
participants and also to abstract away resource-specific
ware (“the services needed to support a common set of ap-
details that would otherwise hinder the development of
plications in a distributed network environment”; Aiken
VO applications.
et al., 2000), although we avoid that term here due
Why do we also consider application programming in-
to its vagueness. We discuss each of these points in the
terfaces (APIs) and software development kits (SDKs)?
following.
There is, of course, more to VOs than interoperability,
Why is interoperability such a fundamental concern?
protocols, and services. Developers must be able to de-
At issue is our need to ensure that sharing relationships
velop sophisticated applications in complex and dynamic
can be initiated among arbitrary parties, accommodating
execution environments. Users must be able to operate
new participants dynamically across different platforms,
these applications. Application robustness, correctness,
languages, and programming environments. In this con-
development costs, and maintenance costs are all impor-
text, mechanisms serve little purpose if they are not de-
tant concerns. Standard abstractions, APIs, and SDKs can
fined and implemented so as to be interoperable across or-
accelerate code development, enable code sharing, and
ganizational boundaries, operational policies, and
enhance application portability. APIs and SDKs are an
resource types. Without interoperability, VO applications
adjunct to, not an alternative to, protocols. Without stan-
and participants are forced to enter into bilateral sharing
dard protocols, interoperability can be achieved at the
arrangements, as there is no assurance that the mecha-
API level only by using a single implementation every-
nisms used between any two parties will extend to any
where—infeasible in many interesting VOs—or by hav-
other parties. Without such assurance, dynamic VO for-
ing every implementation know the details of every other
mation is all but impossible, and the types of VOs that can
implementation. (The Jini approach [Arnold et al., 1999]
be formed are severely limited. Just as the Web revolu-
of downloading protocol code to a remote site does not
tionized information sharing by providing a universal
circumvent this requirement.)
protocol and syntax (HTTP and HTML) for information

ANATOMY OF THE GRID 205

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


In summary, our approach to Grid architecture empha- datagrid.org). More details will be provided in a subse-
sizes the identification and definition of protocols and ser- quent paper.
vices, first, and APIs and SDKs, second.
4.1 FABRIC: INTERFACES
TO LOCAL CONTROL
4 Grid Architecture Description
The Grid fabric layer provides the resources to which
Our goal in describing our Grid architecture is not to pro- shared access is mediated by Grid protocols: for example,
vide a complete enumeration of all required protocols computational resources, storage systems, catalogs, net-
(and services, APIs, and SDKs) but rather to identify re- work resources, and sensors. A “resource” may be a logi-
quirements for general classes of components. The result cal entity, such as a distributed file system, computer clus-
is an extensible, open architectural structure within which ter, or distributed computer pool; in such cases, a resource
can be placed solutions to key VO requirements. Our ar- implementation may involve internal protocols (e.g., the
chitecture and the subsequent discussion organize com- NSF storage access protocol or a cluster resource man-
ponents into layers, as shown in Figure 2. Components agement system’s process management protocol), but
within each layer share common characteristics but can these are not the concern of Grid architecture.
build on capabilities and behaviors provided by any lower Fabric components implement the local, resource-
layer. specific operations that occur on specific resources
In specifying the various layers of the Grid architec- (whether physical or logical) as a result of sharing opera-
ture, we follow the principles of the “hourglass model” tions at higher levels. There is thus a tight and subtle inter-
(Realizing the Information Future, 1994). The narrow dependence between the functions implemented at the
neck of the hourglass defines a small set of core abstrac- fabric level, on one hand, and the sharing operations sup-
tions and protocols (e.g., transmission control protocol ported, on the other. Richer fabric functionality enables
[TCP] and HTTP in the Internet), onto which many differ- more sophisticated sharing operations; at the same time,
ent high-level behaviors can be mapped (the top of the if we place few demands on fabric elements, then deploy-
hourglass) and which themselves can be mapped onto ment of Grid infrastructure is simplified. For example, re-
many different underlying technologies (the base of the source-level support for advance reservations makes it
hourglass). By definition, the number of protocols de- possible for higher level services to aggregate (co-
fined at the neck must be small. In our architecture, the schedule) resources in interesting ways that would other-
neck of the hourglass consists of resource and connectiv- wise be impossible to achieve. However, as in practice
ity protocols, which facilitate the sharing of individual re- few resources support advance reservation “out of the
sources. Protocols at these layers are designed so that they box,” a requirement for advance reservation increases the
can be implemented on top of a diverse range of resource cost of incorporating new resources into a Grid.
types, defined at the fabric layer, and can in turn be used to The issue/significance of building large, integrated
construct a wide range of global services and application- systems, just-in-time by aggregation (coscheduling and
specific behaviors at the collective layer—so called be- comanagement), is a significant new capability provided
cause they involve the coordinated (“collective”) use of by these Grid services.
multiple resources. Experience suggests that at a minimum, resources
Our architectural description is high level and places should implement enquiry mechanisms that permit dis-
few constraints on design and implementation. To make covery of their structure, state, and capabilities (e.g.,
this abstract discussion more concrete, we also list, for il- whether they support advance reservation), on one hand,
lustrative purposes, the protocols defined within the Glo- and resource management mechanisms that provide some
bus Toolkit (Foster and Kesselman, 1998b) and used control of delivered quality of service, on the other. The
within such Grid projects as the National Science Founda- following brief and partial list provides a resource-
tion’s (NSF’s) National Technology Grid (Stevens et al., specific characterization of capabilities.
1997), NASA’s Information Power Grid (Johnston,
Gannon, and Nitzberg, 1999), the Department of En- • Computational resources: Mechanisms are required
ergy’s (DOE’s) DISCOM (Beiriger et al., 2000), for starting programs and for monitoring and control-
G r i P hy N ( w w w. g r i p hy n . o rg ) , NEESgrid ling the execution of the resulting processes. Manage-
(www.neesgrid.org), Particle Physics Data Grid ment mechanisms that allow control over the re-
(www.ppdg.net), and the European Data Grid (www.eu- sources allocated to processes are useful, as are ad-

206 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


Application

Internet Protocol Architecture


Grid Protocol Architecture

Application
Collective

Resource

Transport
Connectivity
Internet

Fabric Link

Fig. 2 The layered Grid architecture and its relationship to the Internet protocol (IP) architecture. Because the IP architecture ex-
tends from network to application, there is a mapping from Grid layers into Internet layers.

vance reservation mechanisms. Enquiry functions are


needed for determining hardware and software charac-
teristics as well as relevant state information such as
current load and queue state in the case of scheduler-
managed resources.
• Storage resources: Mechanisms are required for put-
ting and getting files. Third-party and high performance
(e.g., striped) transfers are useful (Tierney et al., 1996).
So are mechanisms for reading and writing subsets of a
file and/or executing remote data selection or reduction
functions (Beynon et al., 2000). Management mecha-
nisms that allow control over the resources allocated to
data transfers (space, disk bandwidth, network band-
width, CPU) are useful, as are advance reservation
mechanisms. Enquiry functions are needed for deter-
mining hardware and software characteristics as well as
relevant load information such as available space and
bandwidth utilization.
• Network resources: Management mechanisms that pro-
vide control over the resources allocated to network
transfers (e.g., prioritization, reservation) can be useful.
Enquiry functions should be provided to determine net-
work characteristics and load.
• Code repositories: This specialized form of storage re-
source requires mechanisms for managing versioned
source and object code: for example, a control system
such as CVS.

ANATOMY OF THE GRID 207

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


• Catalogs: This specialized form of storage resource With respect to security aspects of the connectivity
requires mechanisms for implementing catalog query layer, we observe that the complexity of the security prob-
and update operations: for example, a relational data- lem makes it important that any solutions be based on ex-
base (Baru et al., 1998). isting standards whenever possible. As with communica-
Globus Toolkit. The Globus Toolkit has been de- tion, many of the security standards developed within the
signed to use (primarily) existing fabric components, in- context of the IP suite are applicable.
cluding vendor-supplied protocols and interfaces. How- Authentication solutions for VO environments should
ever, if a vendor does not provide the necessary fabric- have the following characteristics (Butler et al., 2000):
level behavior, the Globus Toolkit includes the missing
functionality. For example, enquiry software is provided • Single sign-on. Users must be able to “log on” (authen-
for discovering structure and state information for various ticate) just once and then have access to multiple Grid
common resource types, such as computers (e.g., OS resources defined in the fabric layer, without further
version, hardware configuration, load [Dinda and user intervention.
O’Hallaron, 1999], scheduler queue status), storage sys- • Delegation (Foster et al., 1998; Gasser and McDer-
tems (e.g., available space), and networks (e.g., current mott, 1990; Howell and Kotz, 2000). A user must be
and predicted future load) (Lowekamp et al., 1998; able to endow a program with the ability to run on that
Wolski, 1997), and for packaging this information in a user’s behalf, so that the program is able to access the
form that facilitates the implementation of higher level resources on which the user is authorized. The pro-
protocols, specifically at the resource layer. Resource gram should (optionally) also be able to conditionally
management, on the other hand, is generally assumed to delegate a subset of its rights to another program
be the domain of local resource managers. One exception (sometimes referred to as restricted delegation).
is the General-Purpose Architecture for Reservation and • Integration with various local security solutions. Each
Allocation (GARA) (Foster, Roy, and Sander, 2000), site or resource provider may employ any of a variety
which provides a “slot manager” that can be used to im- of local security solutions, including Kerberos and
plement advance reservation for resources that do not sup- Unix security. Grid security solutions must be able to
port this capability. Others have developed enhancements interoperate with these various local solutions. They
to the Portable Batch System (PBS) (Papakhian, 1998) cannot, realistically, require wholesale replacement of
and Condor (Litzkow, Livny, and Mutka, 1988; Livny, local security solutions but rather must allow mapping
1998), which support advance reservation capabilities. into the local environment.
• User-based trust relationships. For a user to use re-
4.2 CONNECTIVITY: COMMUNICATING
sources from multiple providers together, the security
EASILY AND SECURELY
system must not require each of the resource providers
The connectivity layer defines core communication and to cooperate or interact with each other in configuring
authentication protocols required for Grid-specific net- the security environment. For example, if a user has
work transactions. Communication protocols enable the the right to use sites A and B, the user should be able to
exchange of data between fabric-layer resources. Authen- use sites A and B together without requiring that A’s
tication protocols build on communication services to and B’s security administrators interact.
provide cryptographically secure mechanisms for verify-
ing the identity of users and resources. Grid security solutions should also provide flexible sup-
Communication requirements include transport, rout- port for communication protection (e.g., control over the
ing, and naming. While alternatives certainly exist, we as- degree of protection, independent data unit protection for
sume here that these protocols are drawn from the TCP/ unreliable protocols, support for reliable transport proto-
Internet Protocol (IP) stack: specifically, the Internet (IP cols other than TCP) and enable stakeholder control over
and ICMP), transport (TCP, UDP), and application (DNS, authorization decisions, including the ability to restrict
OSPF, RSVP, etc.) layers of the Internet-layered protocol the delegation of rights in various ways.
architecture (Baker, 1995). This is not to say that in the fu- Globus Toolkit. The Internet protocols listed above
ture, Grid communications will not demand new proto- are used for communication. The public key–based Grid
cols that take into account particular types of network security infrastructure (GSI) protocols (Butler et al.,
dynamics. 2000; Foster et al., 1998) are used for authentication,

208 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


communication protection, and authorization. GSI builds While many such protocols can be imagined, the resource
on and extends the transport layer security (TLS) proto- (and connectivity) protocol layers form the neck of our
cols (Dierks and Allen, 1999) to address most of the is- hourglass model and as such should be limited to a small
sues listed above—in particular, single sign-on, delega- and focused set. These protocols must be chosen to cap-
tion, integration with various local security solutions (in- ture the fundamental mechanisms of sharing across many
cluding Kerberos) (Steiner, Neuman, and Schiller, 1988), different resource types (e.g., different local resource
and user-based trust relationships. X.509-format identity management systems) while not overly constraining the
certificates are used. Stakeholder control of authorization types or performance of higher level protocols that may
is supported via an authorization toolkit that allows re- be developed.
source owners to integrate local policies via a generic au- The list of desirable fabric functionality provided in
thorization and access (GAA) control interface. Rich sup- Section 4.1 summarizes the major features required in
port for restricted delegation is not provided in the current resource-layer protocols. To this list, we add the need for
toolkit release (v1.1.4) but has been demonstrated in “exactly once” semantics for many operations, with reli-
prototypes. able error reporting indicating when operations fail.
Globus Toolkit. A small and mostly standards-based
4.3 RESOURCE: SHARING set of protocols is adopted, particularly the following:
SINGLE RESOURCES

The resource layer builds on connectivity-layer commu- • A Grid resource information protocol (GRIP, currently
nication and authentication protocols to define protocols based on the lightweight directory access protocol
(and APIs and SDKs) for the secure negotiation, initia- [LDAP]) is used to define a standard resource informa-
tion, monitoring, control, accounting, and payment of tion protocol and associated information model. An
sharing operations on individual resources. Resource- associated soft-state resource registration protocol, the
layer implementations of these protocols call fabric-layer Grid resource registration protocol (GRRP), is used to
functions to access and control local resources. Resource- register resources with Grid index information servers,
layer protocols are concerned entirely with individual re- discussed in the next section (Czajkowski et al., 2001).
sources and hence ignore issues of global state and atomic • The HTTP-based Grid resource access and manage-
actions across distributed collections; such issues are the ment (GRAM) protocol is used for allocation of com-
concern of the collective layer, discussed next. putational resources and for monitoring and control of
Two primary classes of resource-layer protocols can computation on those resources (Czajkowski et al.,
be distinguished: 1998).
• An extended version of the file transfer protocol,
• Information protocols are used to obtain information GridFTP, is a management protocol for data access;
about the structure and state of a resource, for example, extensions include use of connectivity-layer security
its configuration, current load, and usage policy (e.g., protocols, partial file access, and management of par-
cost). allelism for high-speed transfers (Allcock et al., 2001).
• Management protocols are used to negotiate access to FTP is adopted as a base data transfer protocol because
a shared resource, specifying, for example, resource of its support for third-party transfers and because its
requirements (including advanced reservation and separate control and data channels facilitate the imple-
quality of service) and the operation(s) to be per- mentation of sophisticated servers.
formed, such as process creation, or data access. Since • LDAP is also used as a catalog access protocol.
management protocols are responsible for instant-
iating sharing relationships, they must serve as a “pol- The Globus Toolkit defines client-side C and Java APIs
icy application point,” ensuring that the requested pro- and SDKs for each of these protocols. Server-side SDKs
tocol operations are consistent with the policy under and servers are also provided for each protocol to facili-
which the resource is to be shared. Issues that must be tate the integration of various resources (computational,
considered include accounting and payment. A proto- storage, network) into the Grid. For example, the Grid re-
col may also support monitoring the status of an opera- source information service (GRIS) implements server-
tion and controlling (e.g., terminating) the operation. side LDAP functionality, with callouts allowing for publi-
cation of arbitrary resource information (Czajkowski
et al., 2001). An important server-side element of the

ANATOMY OF THE GRID 209

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


overall toolkit is the “gatekeeper,” which provides what is • Grid-enabled programming systems enable familiar
in essence a GSI-authenticated “inetd” that speaks the programming models to be used in Grid environments,
GRAM protocol and can be used to dispatch various local using various Grid services to address resource discov-
operations. The generic security services (GSS) API ery, security, resource allocation, and other concerns.
(Linn, 2000) is used to acquire, forward, and verify au- Examples include Grid-enabled implementations of
thentication credentials and to provide transport-layer in- the message-passing interface (Foster and Karonis,
tegrity and privacy within these SDKs and servers, en- 1998; Gabriel et al., 1998) and manager-worker frame-
abling substitution of alternative security services at the works (Casanova et al., 2000; Goux et al., 2000).
connectivity layer. • Workload management systems and collaboration
frameworks—also known as problem-solving envi-
4.4 COLLECTIVE: COORDINATING
ronments (PSEs)—provide for the description, use,
MULTIPLE RESOURCES
and management of multistep, asynchronous, multi-
While the resource layer is focused on interactions with a component workflows.
single resource, the next layer in the architecture contains • Software discovery services discover and select the
protocols and services (and APIs and SDKs) that are not best software implementation and execution platform
associated with any one specific resource but rather are based on the parameters of the problem being solved
global in nature and capture interactions across collec- (Casanova et al., 1998). Examples include NetSolve
tions of resources. For this reason, we refer to the next (Casanova and Dongarra, 1997) and Ninf (Nakada,
layer of the architecture as the collective layer. Because Sato, and Sekiguchi, 1999).
collective components build on the narrow resource- and • Community authorization servers enforce community
connectivity-layer “neck” in the protocol hourglass, they policies governing resource access, generating capa-
can implement a wide variety of sharing behaviors with- bilities that community members can use to access
out placing new requirements on the resources being community resources. These servers provide a global
shared. For example, policy enforcement service by building on resource in-
formation and resource management protocols (in the
• Directory services allow VO participants to discover resource layer) and security protocols in the connec-
the existence and/or properties of VO resources. A di- tivity layer. Akenti (Thompson et al., 1999) addresses
rectory service may allow its users to query for re- some of these issues.
sources by name and/or by attributes such as type, • Community accounting and payment services gather
availability, or load (Czajkowski et al., 2001). Re- resource usage information for the purpose of ac-
source-level GRRP and GRIP protocols are used to counting, payment, and/or limiting of resource usage
construct directories. by community members.
• Co-allocation, scheduling, and brokering services al- • Collaboratory services support the coordinated ex-
low VO participants to request the allocation of one or change of information within potentially large user
more resources for a specific purpose and the schedul- communities, whether synchronously or asynch-
ing of tasks on the appropriate resources. Examples in- ronously. Examples are CAVERNsoft (DeFanti and
clude AppLeS (Berman, 1998; Berman et al., 1996), Stevens, 1998; Leigh, Johnson, and DeFanti, 1997),
Condor-G (Frey et al., 2001), Nimrod-G (Abramson Access Grid (Childers et al., 2000), and commodity
et al., 1995), and the DRM broker (Beiriger et al., groupware systems.
2000).
• Monitoring and diagnostics services support the moni- These examples illustrate the wide variety of collective-
toring of VO resources for failure, adversarial attack layer protocols and services that are encountered in prac-
(“intrusion detection”), overload, and so forth. tice. Notice that while resource-layer protocols must be
• Data replication services support the management of general in nature and are widely deployed, collective-
VO storage (and perhaps also network and computing) layer protocols span the spectrum from general purpose
resources to maximize data access performance with to highly application or domain specific, with the latter
respect to metrics such as response time, reliability, existing perhaps only within specific VOs.
and cost (Allcock et al., 2001; Hoschek et al., 2000).

210 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


Application
Co - reservation Service API & SDK
Co - reservation Protocol

Collective Layer
Co-reservation Service
Co-Allocation API & SDK
Resource Mgmt API & SDK
Resource Layer Resource Mgmt Protocol

Fabric Layer Network Network Compute



Resource Resource Resource

Fig. 3 Collective- and resource-layer protocols, services, application programming interfaces (APIs), and software develop-
ment kits (SDKs) can be combined in a variety of ways to deliver functionality to applications

Collective functions can be implemented as persistent


services, with associated protocols, or as SDKs (with asso-
ciated APIs) designed to be linked with applications. In
both cases, their implementation can build on resource-
layer (or other collective-layer) protocols and APIs. For
example, Figure 3 shows a collective co-allocation API
and SDK (the middle tier) that uses a resource-layer man-
agement protocol to manipulate underlying resources.
Above this, we define a coreservation service protocol and
implement a coreservation service that speaks this
protocol, calling the co-allocation API to implement co-
allocation operations and perhaps providing additional
functionality, such as authorization, fault tolerance, and
logging. An application might then use the coreservation ser-
vice protocol to request end-to-end network reservations.
Collective components may be tailored to the require-
ments of a specific user community, VO, or application
domain—for example, an SDK that implements an
application-specific coherency protocol or a coreservation
service for a specific set of network resources. Other col-
lective components can be more general purpose—for ex-
ample, a replication service that manages an international
collection of storage systems for multiple communities or
a directory service designed to enable the discovery of
VOs. In general, the larger the target user community, the
more important it is that a collective component’s proto-
col(s) and API(s) be standards based.

ANATOMY OF THE GRID 211

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


Applications
Languages & Frameworks
Key:
API/SDK Collective APIs & SDKs
Collective Service Protocols

Service Collective Services

Resource APIs & SDKs


Resource Service Protocols
Resource Services

Connectivity APIs
Connectivity Protocols

Fabric

Fig. 4 Application programming interfaces (APIs) are implemented by software development kits (SDKs), which in turn use Grid
protocols to interact with network services that provide capabilities to the end user. Higher level SDKs can provide functionality
that is not directly mapped to a specific protocol but may combine protocol operations with calls to additional APIs as well as im-
plement local functionality. Solid lines represent a direct call; dashed lines represent protocol interactions.

Globus Toolkit. In addition to the example services


listed earlier in this section, many of which build on Glo-
bus connectivity and resource protocols, we mention the
Meta Directory Service (MDS). The MDS introduces Grid
information index servers (GIISs) to support arbitrary
views of resource subsets. The LDAP information protocol
is used to access resource-specific GRISs to obtain re-
source state and GRRP used for resource registration, as
well as replica catalog and replica management services
used to support the management of data set replicas in a
Grid environment (Allcock et al., 2001). An online creden-
tial repository service (“MyProxy”) provides secure stor-
age for proxy credentials (Novotny, Tuecke, and Welch,
2001). The DUROC co-allocation library provides an
SDK and API for resource co-allocation (Czajkowski,
Foster, and Kesselman, 1999).

4.5 APPLICATIONS

The final layer in our Grid architecture comprises the user


applications that operate within a VO environment. Figure 4
illustrates an application programmer’s view of Grid ar-
chitecture. Applications are constructed in terms of, and by
calling on, services defined at any layer. At each layer, we
have well-defined protocols that provide access to some
useful service: resource management, data access, re-

212 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


Table 1
The Grid Services Used to Construct the Two Example Applications of Figure 1
Multidisciplinary Simulation Ray Tracing

Collective Solver coupler, distributed data archiver Checkpointing, job management, failover, staging
(application specific)
Collective (generic) Resource discovery, resource brokering, system monitoring, community authorization,
certificate revocation
Resource Access to computation; access to data; access to information about system structure, state,
performance
Connectivity Communication (Internet protocol), service discovery (DNS), authentication, authorization,
delegation
Fabric Storage systems, computers, networks, code repositories, catalogs

source discovery, and so forth. At each layer, APIs may


also be defined whose implementation (ideally provided
by third-party SDKs) exchange protocol messages with
the appropriate service(s) to perform desired actions.
We emphasize that what we label “applications” and
show in a single layer in Figure 4 may in practice call on so-
phisticated frameworks and libraries—for example, the
Common Component Architecture (Armstrong et al.,
1999), SciRun (Casanova et al., 1998), CORBA (Gannon
and Grimshaw, 1998; Lopez et al., 2000), Cactus (Benger
et al., 1999), and workflow systems (Bolcer and Kaiser,
1999)—and feature much internal structure that would, if
captured in our figure, expand it out to many times its cur-
rent size. These frameworks may themselves define proto-
cols, services, and/or APIs (e.g., the simple workflow ac-
cess protocol) (Bolcer and Kaiser, 1999). However, these
issues are beyond the scope of this article, which addresses
only the most fundamental protocols and services required
in a Grid.

5 Grid Architecture in Practice


We use two examples to illustrate how Grid architecture
functions in practice. Table 1 shows the services that might
be used to implement the multidisciplinary simulation and
cycle-sharing (ray-tracing) applications introduced in Fig-
ure 1. The basic fabric elements are the same in each case:
computers, storage systems, and networks. Furthermore,
each resource speaks standard connectivity protocols for
communication and security and resource protocols for en-
quiry, allocation, and management. Above this, each appli-

ANATOMY OF THE GRID 213

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


cation uses a mix of generic and more application-specific mation, these intergrid protocols (as we might call them)
collective services. enable different organizations to interoperate and ex-
In the case of the ray-tracing application, we assume change or share resources. Resources that speak these
that this is based on a high-throughput computing system protocols can be said to be “on the Grid.” Standard APIs
(Frey et al., 2001; Livny, 1998). To manage the execution are also highly useful if Grid code is to be shared. The
of large numbers of largely independent tasks in a VO en- identification of these intergrid protocols and APIs is be-
vironment, this system must keep track of the set of active yond the scope of this article, although the Globus Toolkit
and pending tasks, locate appropriate resources for each represents an approach that has had some success to date.
task, stage executables to those resources, detect and re-
spond to various types of failure, and so forth. An imple-
7 Relationships with
mentation in the context of our Grid architecture uses both
Other Technologies
domain-specific collective services (dynamic checkpoint,
task pool management, failover) and more generic collec- The concept of controlled, dynamic sharing within VOs is
tive services (brokering, data replication for executables, so fundamental that we might assume that Grid-like tech-
and common input files), as well as standard resource and nologies must surely already be widely deployed. In prac-
connectivity protocols. Condor-G represents a first step tice, however, while the need for these technologies is in-
toward this goal (Frey et al., 2001). deed widespread, in a wide variety of different areas, we
In the case of the multidisciplinary simulation applica- find only primitive and inadequate solutions to VO prob-
tion, the problems are quite different at the highest level. lems. In brief, current distributed computing approaches
Some application framework (e.g., CORBA, CCA) may do not provide a general resource-sharing framework that
be used to construct the application from its various com- addresses VO requirements. Grid technologies distin-
ponents. We also require mechanisms for discovering ap- guish themselves by providing this generic approach to
propriate computational resources, for reserving time on resource sharing. This situation points to numerous op-
those resources, staging executables (perhaps), providing portunities for the application of Grid technologies.
access to remote storage, and so forth. Again, a number of
domain-specific collective services will be used (e.g., 7.2 WORLD WIDE WEB
solver coupler, distributed data archiver), but the basic un- The ubiquity of Web technologies (i.e., IETF and W3C
derpinnings are the same as in the ray-tracing example. standard protocols—TCP/IP, HTTP, SOAP, etc.—and
languages, such as HTML and XML) makes them attrac-
6 “On the Grid”: The Need tive as a platform for constructing VO systems and appli-
for Intergrid Protocols cations. However, while these technologies do an excel-
lent job of supporting the browser-client to Web-server
Our Grid architecture establishes requirements for the interactions that are the foundation of today’s Web, they
protocols and APIs that enable sharing of resources, ser- lack features required for the richer interaction models
vices, and code. It does not otherwise constrain the tech- that occur in VOs. For example, today’s Web browsers
nologies that might be used to implement these protocols typically use transport layer security (TLS) for authenti-
and APIs. In fact, it is quite feasible to define multiple cation but do not support single sign-on or delegation.
instantiations of key Grid architecture elements. For ex- Clear steps can be taken to integrate Grid and Web
ample, we can construct both Kerberos- and PKI-based technologies. For example, the single sign-on capabilities
protocols at the connectivity layer—and access these se- provided in the GSI extensions to TLS would, if inte-
curity mechanisms via the same API, thanks to GSS-API grated into Web browsers, allow for single sign-on to
(see the appendix). However, Grids constructed with multiple Web servers. GSI delegation capabilities would
these different protocols are not interoperable and cannot permit a browser client to delegate capabilities to a Web
share essential services—at least not without gateways. server so that the server could act on the client’s behalf.
For this reason, the long-term success of Grid computing These capabilities, in turn, make it much easier to use
requires that we select and achieve widespread deploy- Web technologies to build “VO portals” that provide thin
ment of one set of protocols at the connectivity and re- client interfaces to sophisticated VO applications.
source layers—and, to a lesser extent, at the collective
layer. Much as the core Internet protocols enable different
computer networks to interoperate and exchange infor-

214 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


WebOS addresses some of these issues (Vahdat et al.,
1998).

7.2 APPLICATION AND


STORAGE SERVICE PROVIDERS

Application service providers, storage service providers,


and similar hosting companies typically offer to outsource
specific business and engineering applications (in the case
of ASPs) and storage capabilities (in the case of SSPs). A
customer negotiates a service-level agreement that defines “In brief, current distributed computing
access to a specific combination of hardware and software. approaches do not provide a general
Security tends to be handled by using VPN technology to
resource-sharing framework that
extend the customer’s intranet to encompass resources op-
erated by the ASP or SSP on the customer’s behalf. Other addresses VO requirements. Grid
SSPs offer file-sharing services, in which case access is technologies distinguish themselves by
provided via HTTP, FTP, or WebDAV with user IDs, pass- providing this generic approach to
words, and access control lists controlling access. resource sharing.”
From a VO perspective, these are low-level building-
block technologies. VPNs and static configurations make
many VO sharing modalities hard to achieve. For example,
the use of VPNs means that it is typically impossible for an
ASP application to access data located on storage managed
by a separate SSP. Similarly, dynamic reconfiguration of
resources within a single ASP or SPP is challenging and, in
fact, is rarely attempted. The load sharing across providers
that occurs on a routine basis in the electric power industry
is unheard of in the hosting industry. A basic problem is
that a VPN is not a VO: it cannot extend dynamically to en-
compass other resources and does not provide the remote
resource provider with any control of when and whether to
share its resources.
The integration of Grid technologies into ASPs and
SSPs can enable a much richer range of possibilities. For
example, standard Grid services and protocols can be used
to achieve a decoupling of the hardware and software. A
customer could negotiate an SLA for particular hardware
resources and then use Grid resource protocols to dynami-
cally provision that hardware to run customer-specific ap-
plications. Flexible delegation and access control mecha-
nisms would allow a customer to grant an application
running on an ASP computer direct, efficient, and secure
access to data on SSP storage—and/or to couple resources
from multiple ASPs and SSPs with their own resources
when required for more complex problems. A single sign-
on security infrastructure able to span multiple security do-
mains dynamically is, realistically, required to support
such scenarios. Grid resource management and accounting/
payment protocols that allow for dynamic provisioning
and reservation of capabilities (e.g., amount of storage,
transfer bandwidth, etc.) are also critical.

ANATOMY OF THE GRID 215

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


7.3 ENTERPRISE COMPUTING SYSTEMS grated (“stovepipe”) solutions, rather than seeking to
define common protocols that would allow for shared in-
Enterprise development technologies such as CORBA,
frastructure and interoperability. (This is, of course, a
Enterprise Java Beans, Java 2 Enterprise Edition, and
common characteristic of new market niches, in which
DCOM are all systems designed to enable the construc-
participants still hope for a monopoly.) Another is that the
tion of distributed applications. They provide standard re-
forms of sharing targeted by various applications are
source interfaces, remote invocation mechanisms, and
quite limited—for example, file sharing with no access
trading services for discovery and hence make it easy to
control and computational sharing with a centralized
share resources within a single organization. However,
server.
these mechanisms address none of the specific VO re-
As these applications become more sophisticated and
quirements listed above. Sharing arrangements are typi-
the need for interoperability becomes clearer, we will see
cally relatively static and restricted to occur within a sin-
a strong convergence of interests between peer-to-peer,
gle organization. The primary form of interaction is
Internet, and Grid computing (Foster, 2000). For exam-
client-server, rather than the coordinated use of multiple
ple, single sign-on, delegation, and authorization technol-
resources.
ogies become important when computational and data-
These observations suggest that there should be a role
sharing services must interoperate and the policies
for Grid technologies within enterprise computing. For
that govern access to individual resources become more
example, in the case of CORBA, we could construct an
complex.
object request broker (ORB) that uses GSI mechanisms to
address cross-organizational security issues. We could
implement a portable object adapter that speaks the Grid 8 Other Perspectives on Grids
resource management protocol to access resources spread
The perspective on Grids and VOs presented in this article
across a VO. We could construct Grid-enabled naming
is of course not the only view that can be taken. We sum-
and trading services that use Grid information service
marize here—and critique—some alternative perspec-
protocols to query information sources distributed across
tives (given in italics).
large VOs. In each case, the use of Grid protocols provides
The Grid is a next-generation Internet. “The Grid” is
enhanced capability (e.g., interdomain security) and en-
not an alternative to “the Internet”: it is rather a set of ad-
ables interoperability with other (non-CORBA) clients.
ditional protocols and services that build on Internet pro-
Similar observations can be made about Java and Jini. For
tocols and services to support the creation and use of com-
example, Jini’s protocols and implementation are geared
putation- and data-enriched environments. Any resource
toward a small collection of devices. A “Grid Jini” that
that is “on the Grid” is also, by definition, “on the Net.”
employed Grid protocols and services would allow the
The Grid is a source of free cycles. Grid computing
use of Jini abstractions in a large-scale, multi-enterprise
does not imply unrestricted access to resources. Grid
environment.
computing is about controlled sharing. Resource owners
7.4 INTERNET AND will typically want to enforce policies that constrain ac-
PEER-TO-PEER COMPUTING cess according to group membership, ability to pay, and
so forth. Hence, accounting is important, and a Grid ar-
Peer-to-peer computing (as implemented, for example, in chitecture must incorporate resource and collective proto-
the Napster, Gnutella, and Freenet [Clarke et al., 1999] cols for exchanging usage and cost information, as well as
file-sharing systems) and Internet computing (as imple- for exploiting this information when deciding whether to
mented, for example, by the SETI@home, Parabon, and enable sharing.
Entropia systems) are an example of the more general The Grid requires a distributed operating system. In
(“beyond client-server”) sharing modalities and compu- this view (e.g., see Grimshaw and Wulf, 1996), Grid soft-
tational structures that we referred to in our characteriza- ware should define the operating system services to be in-
tion of VOs. As such, they have much in common with stalled on every participating system, with these services
Grid technologies. providing for the Grid what an operating system provides
In practice, we find that the technical focus of work in for a single computer—namely, transparency with re-
these domains has not overlapped significantly to date. spect to location, naming, security, and so forth. Put an-
One reason is that peer-to-peer and Internet computing other way, this perspective views the role of Grid software
developers have so far focused entirely on vertically inte- as defining a virtual machine. However, we feel that this

216 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


perspective is inconsistent with our primary goals of broad
deployment and interoperability. We argue that the appro-
priate model is rather the Internet protocol suite, which
provides largely orthogonal services that address the
unique concerns that arise in networked environments. The “‘The Grid’ is not an alternative to ‘the
tremendous physical and administrative heterogeneities Internet’: it is rather a set of additional
encountered in Grid environments mean that the tradi-
protocols and services that build on
tional transparencies are unobtainable; on the other hand, it
does appear feasible to obtain agreement on standard pro- Internet protocols and services to support
tocols. The architecture proposed here is deliberately open the creation and use of computation- and
rather than prescriptive: it defines a compact and minimal data-enriched environments.”
set of protocols that a resource must speak to be “on the
Grid”; beyond this, it seeks only to provide a framework
within which many behaviors can be specified.
The Grid requires new programming models. Pro-
gramming in Grid environments introduces challenges
that are not encountered in sequential (or parallel) comput-
ers, such as multiple administrative domains, new failure
modes, and large variations in performance. However, we
argue that these are incidental, not central, issues and that
the basic programming problem is not fundamentally dif-
ferent. As in other contexts, abstraction and encapsulation
can reduce complexity and improve reliability. But, as in
other contexts, it is desirable to allow a wide variety of
higher level abstractions to be constructed, rather than en-
forcing a particular approach. So, for example, a developer
who believes that a universal distributed shared-memory
model can simplify Grid application development should
implement this model in terms of Grid protocols, extend-
ing or replacing those protocols only if they prove inade-
quate for this purpose. Similarly, a developer who believes
that all Grid resources should be presented to users as ob-
jects needs simply to implement an object-oriented API in
terms of Grid protocols.
The Grid makes high performance computers superflu-
ous. The hundreds, thousands, or even millions of proces-
sors that may be accessible within a VO represent a signifi-
cant source of computational power, if they can be
harnessed in a useful fashion. This does not imply, how-
ever, that traditional high performance computers are ob-
solete. Many problems require tightly coupled computers,
with low latencies and high communication bandwidths;
Grid computing may well increase, rather than reduce, de-
mand for such systems by making access easier.

9 Conclusions
We have provided in this article a concise statement of the
“Grid problem,” which we define as controlled and coordi-
nated resource sharing and resource use in dynamic, scal-

ANATOMY OF THE GRID 217

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


able virtual organizations. We have also presented both fundamental to achieving interoperability in a distributed com-
requirements and a framework for a Grid architecture, puting environment.
identifying the principal functions required to enable A protocol definition also says little about the behavior of an
sharing within VOs and defining key relationships among entity that speaks the protocol. For example, the FTP protocol
definition indicates the format of the messages used to negotiate
these different functions. Finally, we have discussed in
a file transfer but does not make clear how the receiving entity
some detail how Grid technologies relate to other impor- should manage its files.
tant technologies. As the above examples indicate, protocols may be defined in
We hope that the vocabulary and structure introduced terms of other protocols.
in this document will prove useful to the emerging Grid Service. A service is a network-enabled entity that provides
community, by improving understanding of our problem a specific capability—for example, the ability to move files, cre-
and providing a common language for describing solu- ate processes, or verify access rights. A service is defined in
tions. We also hope that our analysis will help establish terms of the protocol one uses to interact with it and the behavior
connections among Grid developers and proponents of re- expected in response to various protocol message exchanges
lated technologies. (i.e., “service = protocol + behavior”). A service definition may
The discussion in this paper also raises a number of im- permit a variety of implementations—for example, the following:
portant questions. What are appropriate choices for the
intergrid protocols that will enable interoperability • An FTP server speaks the file transfer protocol and supports
remote read-and-write access to a collection of files. One
among Grid systems? What services should be present in FTP server implementation may simply write to and read
a persistent fashion (rather than being duplicated by each from the server’s local disk, while another may write to and
application) to create usable Grids? And what are the key read from a mass storage system, automatically compressing
APIs and SDKs that must be delivered to users to acceler- and uncompressing files in the process. From a fabric-level
ate development and deployment of Grid applications? perspective, the behaviors of these two servers in response to
We have our own opinions on these questions, but the an- a store request (or retrieve request) are very different. From
swers clearly require further research. the perspective of a client of this service, however, the behav-
iors are indistinguishable; storing a file and then retrieving
the same file will yield the same results regardless of which
APPENDIX
server implementation is used.
Definitions
• An LDAP server speaks the LDAP protocol and supports re-
sponse to queries. One LDAP server implementation may re-
We define here four terms that are fundamental to the dis- spond to queries using a database of information, while an-
cussion in this article but are frequently misunderstood and other may respond to queries by dynamically making SNMP
misused—namely, protocol, service, SDK, and API. calls to generate the necessary information on the fly.
Protocol. A protocol is a set of rules that endpoints in a tele-
communication system use when exchanging information—for A service may or may not be persistent (i.e., always available),
example, the following: be able to detect and/or recover from certain errors, run with
privileges, and/or have a distributed implementation for en-
• The Internet protocol (IP) defines an unreliable packet trans- hanced scalability. If variants are possible, then discovery mech-
fer protocol. anisms that allow a client to determine the properties of a partic-
• The transmission control protocol (TCP) builds on IP to de- ular instantiation of a service are important.
fine a reliable data delivery protocol. Note also that one can define different services that speak the
same protocol. For example, in the Globus Toolkit, both the rep-
• The transport layer security (TLS) protocol (Dierks and Al-
lica catalog (Allcock et al., 2001) and information service
len, 1999) defines a protocol to provide privacy and data in-
(Czajkowski et al., 2001) use LDAP.
tegrity between two communicating applications. It is lay-
API. An application program interface (API) defines a stan-
ered on top of a reliable transport protocol such as TCP.
dard interface (e.g., set of subroutine calls or objects and method
• The lightweight directory access protocol (LDAP) builds on invocations in the case of an object-oriented API) for invoking a
TCP to define a query-response protocol for querying the
specified set of functionality—for example, the following:
state of a remote database.

An important property of protocols is that they admit to multiple • The generic security service (GSS) API (Linn, 2000) defines
implementations: two endpoints need only implement the same standard functions for verifying the identity of communicat-
protocol to be able to communicate. Standard protocols are thus ing parties, encrypting messages, and so forth.

218 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


• The message-passing interface API (Gropp, Lusk, and
Skjellum, 1994) defines standard interfaces, in several lan-
guages, to functions used to transfer data among processes in
a parallel computing system.

An API may define multiple language bindings or use an


GSS-API
interface definition language. The language may be a conven-
GSP
tional programming language such as C or Java, or it may be a Kerberos PKI
shell interface. In the latter case, the API refers to a particular def- Kerberos or PKI
inition of command-line arguments to the program, the input and
output of the program, and the exit status of the program. An API Domain A Domain B
normally will specify a standard behavior but can admit to multi-
ple implementations.
It is important to understand the relationship between APIs Fig. 5 On the left, an application programming interface
and protocols. A protocol definition says nothing about the APIs (API) is used to develop applications that can target either
that might be called from within a program to generate protocol Kerberos or PKI security mechanisms. On the right, proto-
messages. A single protocol may have many APIs; a single API cols (the Grid security protocols provided by the Globus
Toolkit) are used to enable interoperability between
may have multiple implementations that target different proto-
Kerberos and PKI domains.
cols. In brief, standard APIs enable portability; standard proto-
cols enable interoperability. For example, both public key and
Kerberos bindings have been defined for the GSS-API (Linn,
2000). Hence, a program that uses GSS-API calls for authentica-
tion operations can operate in either a public key or a Kerberos en-
vironment without change. On the other hand, if we want a pro-
gram to operate in a public key and a Kerberos environment at the
same time, then we need a standard protocol that supports
interoperability of these two environments (see Figure 5).
SDK. The term software development kit (SDK) denotes a set
of code designed to be linked with and invoked from within an ap-
plication program to provide specified functionality. An SDK
typically implements an API. If an API admits to multiple imple-
mentations, then there will be multiple SDKs for that API. Some
SDKs provide access to services via a particular protocol—for
example, the following:

• The OpenLDAP release includes an LDAP client SDK, which


contains a library of functions that can be used from a C or
C++ application to perform queries to an LDAP service.
• JNDI is a Java SDK, which contains functions that can be used
to perform queries to an LDAP service.
• Different SDKs implement GSS-API using the TLS and
Kerberos protocols, respectively.

There may be multiple SDKs, for example, from multiple ven-


dors, which implement a particular protocol. Furthermore, for
client-server-oriented protocols, there may be separate client
SDKs for use by applications that want to access a service and
server SDKs for use by service implementers that want to imple-
ment particular, customized service behaviors.
An SDK need not speak any protocol. For example, an SDK
that provides numerical functions may act entirely locally and not
need to speak to any services to perform its operations.

ANATOMY OF THE GRID 219

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


ACKNOWLEDGMENTS leadership role in many of Argonne’s research and development
projects in the area of high performance, distributed, and Grid
We are grateful to numerous colleagues for discussions on the computing and directs the efforts of both Argonne staff and col-
topics covered here—particularly Bill Allcock, Randy Butler, laborators in the implementation of the Globus Toolkit. He is
Ann Chervenak, Karl Czajkowski, Steve Fitzgerald, Bill also the cochair of the Grid Forum Security Working Group. He
Johnston, Miron Livny, Joe Mambretti, Reagan Moore, Harvey received a B.A. in mathematics and computer science from St.
Newman, Laura Pearlman, Rick Stevens, Gregor von Olaf College.
Laszewski, Rich Wellner, and Mike Wilde—and participants in
the “Clusters and Computational Grids for Scientific Com-
REFERENCES
puting” workshop (Lyon, September 2000) and the 4th Grid Fo-
rum meeting (Boston, October 2000), at which early versions of Abramson, D., Sosic, R., Giddy, J., and Hall, B. 1995. Nimrod:
these ideas were presented. A tool for performing parameterized simulations using dis-
This work was supported in part by the Mathematical, Infor- tributed workstations. In Proc. 4th IEEE Symp. on High Per-
mation, and Computational Sciences Division subprogram of formance Distributed Computing.
the Office of Advanced Scientific Computing Research, U.S. Aiken, R., Carey, M., Carpenter, B., Foster, I., Lynch, C.,
Department of Energy, under contract W-31-109-Eng-38; by the Mambretti, J., Moore, R., Strasnner, J., and Teitelbaum, B.
Defense Advanced Research Projects Agency under contract 2000. Network policy and services: A report of a workshop
N66001-96-C-8523; by the National Science Foundation; and on middleware [Online]. Available: http://www.ietf.org/rfc/
by the NASA Information Power Grid program. rfc2768.txt.
Allcock, B., Bester, J., Bresnahan, J., Chervenak, A. L., Foster,
BIOGRAPHIES I., Kesselman, C., Meder, S., Nefedova, V., Quesnel, D., and
Tuecke, S. 2001. Secure, efficient data transport and replica
Ian Foster is a senior scientist and associate director of the management for high-performance data-intensive comput-
Mathematics and Computer Science Division at Argonne Na- ing. In Mass Storage Conference.
tional Laboratory, professor of computer science at the Univer- Armstrong, R., Gannon, D., Geist, A., Keahey, K., Kohn, S.,
sity of Chicago, and senior fellow in the Argonne/University of McInnes, L., and Parker, S. 1999. Toward a common compo-
Chicago Computation Institute. He has published four books nent architecture for high performance scientific computing.
and more than 100 papers and technical reports in parallel and In Proc. 8th IEEE Symp. on High Performance Distributed
distributed processing, software engineering, and computa- Computing.
tional science. He currently coleads the Globus project with Arnold, K., O’Sullivan, B., Scheifler, R. W., Waldo, J., and
Dr. Carl Kesselman of the University of Southern California/ Wollrath, A. 1999. The Jini Specification. Boston: Addison-
Information Sciences Institute, which was awarded the 1997 Wesley.
Global Information Infrastructure “Next Generation” award and Baker, F. 1995. Requirements for IP Version 4 Routers, IETF,
provides protocols and services used by many distributed com- RFC 1812 [Online]. Available: http://www.ietf.org/rfc/
puting projects worldwide. He cofounded the influential Global rfc1812.txt.
Grid Forum and recently coedited a book on this topic, titled The Barry, J., Aparicio, M., Durniak, T., Herman, P., Karuturi, J.,
Grid: Blueprint for a New Computing Infrastructure. Woods, C., Gilman, C., Ramnath, R., and Lam, H. 1998.
NIIIP-SMART: An investigation of distributed object ap-
Carl Kesselman is a senior project leader at the Information
proaches to support MES development and deployment in a
Sciences Institute and a research associate professor of com-
virtual enterprise. In 2nd International Enterprise Distrib-
puter science, both at the University of Southern California. He
uted Computing Workshop.
is also a visiting associate in computer science at the California
Baru, C., Moore, R., Rajasekar, A., and Wan, M. 1998. The
Institute of Technology. He received a Ph.D. in computer sci-
SDSC storage resource broker. In Proc. CASCON ’98 Con-
ence from the University of California, Los Angeles in 1991. He ference.
currently coleads the Globus project with Dr. Ian Foster of Beiriger, J., Johnson, W., Bivens, H., Humphreys, S., and Rhea,
Argonne National Laboratory and the University of Chicago, R. 2000. Constructing the ASCI grid. In Proc. 9th IEEE Sym-
which was awarded the 1997 Global Information Infrastructure posium on High Performance Distributed Computing.
“Next Generation” award and provides protocols and services Benger, W., Foster, I., Novotny, J., Seidel, E., Shalf, J., Smith,
used by many distributed computing projects worldwide. He re- W., and Walker, P. 1999. Numerical relativity in a distributed
cently coedited a book on this topic, titled The Grid: Blueprint environment. In Proc. 9th SIAM Conference on Parallel Pro-
for a New Computing Infrastructure. cessing for Scientific Computing.
Steven Tuecke is a software architect in the Distributed Sys- Berman, F. 1998. High-performance schedulers. In The Grid:
tems Laboratory in the Mathematics and Computer Science Di- Blueprint for a New Computing Infrastructure, edited by I.
vision at Argonne National Laboratory and a fellow with the Foster and C. Kesselman, 279-309. San Francisco: Morgan
Computation Institute at the University of Chicago. He plays a Kaufmann.

220 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


Berman, F., Wolski, R., Figueira, S., Schopf, J., and Shao, G. shop on Job Scheduling Strategies for Parallel Processing,
1996. Application-level scheduling on distributed heteroge- pp. 62-82.
neous networks. In Proc. Supercomputing ’96. Czajkowski, K., Foster, I., and Kesselman, C. 1999. Co-allocation
Beynon, M., Ferreira, R., Kurc, T., Sussman, A., and Saltz, J. services for computational grids. In Proc. 8th IEEE Sympo-
2000. DataCutter: Middleware for filtering very large scien- sium on High Performance Distributed Computing.
tific datasets on archival storage systems. In Proc. 8th DeFanti, T., and Stevens, R. 1998. Teleimmersion. In The Grid:
Goddard Conference on Mass Storage Systems and Technol- Blueprint for a New Computing Infrastructure, edited by I.
ogies/17th IEEE Symposium on Mass Storage Systems, Foster and C. Kesselman, 131-155. San Francisco: Morgan
pp. 119-133. Kaufmann.
Bolcer, G. A., and Kaiser, G. 1999. SWAP: Leveraging the Web Dierks, T., and Allen, C. 1999. The TLS Protocol Version 1.0,
to manage workflow. IEEE Internet Computing 3 (1):85-88. IETF, RFC 2246 [Online]. Available: http://www.ietf.org/
Brunett, S., Czajkowski, K., Fitzgerald, S., Foster, I., Johnson, rfc/rfc2246.txt.
A., Kesselman, C., Leigh, J., and Tuecke, S. 1998. Applica- Dinda, P., and O’Hallaron, D. 1999. An evaluation of linear
tion experiences with the Globus Toolkit. In Proc. 7th IEEE models for host load prediction. In Proc. 8th IEEE Sympo-
Symp. on High Performance Distributed Computing, sium on High-Performance Distributed Computing.
pp. 81-89. Foster, I. 2000. Internet computing and the emerging Grid. In
Butler, R., Engert, D., Foster, I., Kesselman, C., Tuecke, S., Nature Web Matters [Online]. Available: http://www.
Volmer, J., and Welch, V. 2000. Design and deployment of a nature.com/nature/webmatters/grid/grid.html.
national-scale authentication infrastructure. IEEE Computer Foster, I., and Karonis, N. 1998. A Grid-enabled MPI: Message
33 (12): 60-66. passing in heterogeneous distributed computing systems. In
Camarinha-Matos, L. M., Afsarmanesh, H., Garita, C., and Proc. SC ’98.
Lima, C. 1998. Towards an architecture for virtual enter- Foster, I., and Kesselman, C., eds. 1998a. The Grid: Blueprint
prises. J. Intelligent Manufacturing 9 (2): 189-199. for a New Computing Infrastructure. San Francisco: Morgan
Casanova, H., and Dongarra, J. 1997. NetSolve: A network Kaufmann.
server for solving computational science problems. Interna- Foster, I., and Kesselman, C. 1998b. The Globus Project: A sta-
tional Journal of Supercomputer Applications and High Per- tus report. In Proc. Heterogeneous Computing Workshop,
formance Computing 11 (3): 212-223. pp. 4-18.
Casanova, H., Dongarra, J., Johnson, C., and Miller, M. 1998. Foster, I., Kesselman, C., Tsudik, G., and Tuecke, S. 1998. A se-
Application-specific tools. In The Grid: Blueprint for a New curity architecture for computational grids. In ACM Confer-
Computing Infrastructure, edited by I. Foster and C. ence on Computers and Security, pp. 83-91.
Kesselman, 159-80. San Francisco: Morgan Kaufmann. Foster, I., Roy, A., and Sander, V. 2000. A quality of service ar-
Casanova, H., Obertelli, G., Berman, F., and Wolski, R. 2000. chitecture that combines resource reservation and applica-
The AppLeS Parameter Sweep Template: User-level tion adaptation. In Proc. 8th International Workshop on
middleware for the Grid. In Proc. SC ’2000. Quality of Service.
Chervenak, A., Foster, I., Kesselman, C., Salisbury, C., and Frey, J., Foster, I., Livny, M., Tannenbaum, T., and Tuecke, S.
Tuecke, S. 2001. The Data Grid: Towards an architecture for 2001. Condor-G: A Computation Management Agent for
the distributed management and analysis of large scientific Multi-Institutional Grids. Madison: University of
data sets. J. Network and Computer Applications 23:187- Wisconsin–Madison.
200. Gabriel, E., Resch, M., Beisel, T., and Keller, R. 1998. Distrib-
Childers, L., Disz, T., Olson, R., Papka, M. E., Stevens, R. and uted computing in a heterogeneous computing environment.
Udeshi, T. 2000. Access Grid: Immersive group-to-group In Proc. EuroPVM/MPI ’98.
collaborative visualization. In Proc. 4th International Gannon, D., and Grimshaw, A. 1998. Object-based approaches.
Immersive Projection Technology Workshop. In The Grid: Blueprint for a New Computing Infrastructure,
Clarke, I., Sandberg, O., Wiley, B., and Hong, T. W. 1999. edited by I. Foster and C. Kesselman, 205-236. San Fran-
Freenet: A distributed anonymous information storage and cisco: Morgan Kaufmann.
retrieval system. In ICSI Workshop on Design Issues in Ano- Gasser, M., and McDermott, E. 1990. An architecture for practi-
nymity and Unobservability. cal delegation in a distributed system. In Proc. 1990 IEEE
Czajkowski, K., Fitzgerald, S., Foster, I., and Kesselman, C. Symposium on Research in Security and Privacy, pp. 20-30.
2001. Grid information services for distributed resource Goux, J. -P., Kulkarni, S., Linderoth, J., and Yoder, M. 2000. An
sharing. In Proceedings of the 10th IEEE International Sym- enabling framework for master-worker applications on the
posium on High-Performance Distributed Computing, computational grid. In Proc. 9th IEEE Symp. on High Perfor-
pp. 181-184. mance Distributed Computing.
Czajkowski, K., Foster, I., Karonis, N., Kesselman, C., Martin, Grimshaw, A., and Wulf, W. 1996. Legion: A view from 50,000
S., Smith, W., and Tuecke, S. 1998. A resource management feet. In Proc. 5th IEEE Symposium on High Performance
architecture for metacomputing systems. In The 4th Work- Distributed Computing, pp. 89-99.

ANATOMY OF THE GRID 221

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


Gropp, W., Lusk, E., and Skjellum, A. 1994. Using MPI: Porta- a New Computing Infrastructure, edited by I. Foster and
ble Parallel Programming with the Message Passing Inter- C. Kesselman, 105-129. San Francisco: Morgan Kaufmann.
face. Cambridge, MA: MIT Press. Nakada, H., Sato, M., and Sekiguchi, S. 1999. Design and im-
Hoschek, W., Jaen-Martinez, J., Samar, A., Stockinger, H., and plementations of Ninf: Towards a global computing infra-
Stockinger, K. 2000. Data management in an international structure. Future Generation Computing Systems 15 (5-6):
data Grid project. In Proc. 1st IEEE/ACM International 649-658.
Workshop on Grid Computing. Novotny, J., Tuecke, S., and Welch, V. 2001. Initial experiences
Howell, J., and Kotz, D. 2000. End-to-end authorization. In with an online certificate repository for the Grid: MyProxy.
Proc. 2000 Symposium on Operating Systems Design and In Proceedings of the 10th IEEE International Symposium
Implementation. on High-Performance Distributed Computing, pp. 104-111.
Johnston, W. E., Gannon, D., and Nitzberg, B. 1999. Grids as Papakhian, M. 1998. Comparing job-management systems: The
production computing environments: The engineering as- user’s perspective. IEEE Computational Science & Engi-
pects of NASA’s information power grid. In Proc. 8th IEEE neering, April-June [Online]. Available: http://pbs.mrj.com.
Symposium on High Performance Distributed Computing. Realizing the Information Future: The Internet and Beyond.
Leigh, J., Johnson, A., and DeFanti, T. A. 1997. CAVERN: A 1994. Washington, D.C.: National Academy Press.
distributed architecture for supporting scalable persistence Sculley, A., and Woods, W. 2000. B2B exchanges: The Killer
and interoperability in collaborative virtual environments. Application in the Business-to-Business Internet Revolution.
Virtual Reality: Research, Development and Applications 2 New York: ISI Publications.
(2): 217-237. Steiner, J., Neuman, B. C., and Schiller, J. 1988. Kerberos: An
Linn, J. 2000. Generic Security Service Application Program In- authentication system for open network systems. In Proc.
terface Version 2, Update 1, IETF, RFC 2743 [Online]. Usenix Conference, pp. 191-202.
Available: http://www.ietf.org/rfc/rfc2743. Stevens, R., Woodward, P., DeFanti, T., and Catlett, C. 1997.
Litzkow, M., Livny, M., and Mutka, M. 1988. Condor: A hunter From the I-WAY to the national technology Grid. Communi-
of idle workstations. In Proc. 8th Intl Conf. on Distributed cations of the ACM 40 (11): 50-61.
Computing Systems, pp. 104-111. Thompson, M., Johnston, W., Mudumbai, S., Hoo, G., Jackson,
Livny, M. 1998. High-throughput resource management. In The K., and Essiari, A. 1999. Certificate-based access control for
Grid: Blueprint for a New Computing Infrastructure, edited widely distributed resources. In Proc. 8th Usenix Security
by I. Foster and C. Kesselman, 311-337. San Francisco: Mor- Symposium.
gan Kaufmann. Tierney, B., Johnston, W., Lee, J., and Hoo, G. 1996. Perfor-
Lopez, I., Follen, G., Gutierrez, R., Foster, I., Ginsburg, B., mance analysis in high-speed wide area IP over ATM net-
Larsson, O., Martin, S., and Tuecke, S. 2000. NPSS on works: Top-to-bottom end-to-end monitoring. IEEE Net-
NASA’s IPG: Using CORBA and Globus to coordinate work 10 (3).
multidisciplinary aeroscience applications. In Proc. NASA Vahdat, A., Belani, E., Eastham, P., Yoshikawa, C., Anderson,
HPCC/CAS Workshop. T., Culler, D., and Dahlin, M. 1998. WebOS: Operating sys-
Lowekamp, B., Miller, N., Sutherland, D., Gross, T., Steenkiste, P., tem services for wide area applications. In 7th Symposium on
and Subhlok, J. 1998. A resource query interface for High Performance Distributed Computing.
network-aware applications. In Proc. 7th IEEE Symposium Wolski, R. 1997. Forecasting network performance to support
on High-Performance Distributed Computing. dynamic scheduling using the network weather service. In
Moore, R., Baru, C., Marciano, R., Rajasekar, A., and Wan, M. Proc. 6th IEEE Symp. on High Performance Distributed
1998. Data-intensive computing. In The Grid: Blueprint for Computing, Portland, OR.

222 COMPUTING APPLICATIONS

Downloaded from hpc.sagepub.com at UNIV OF CHICAGO LIBRARY on April 2, 2013


View publication stats

You might also like