An Introduction to Grid Computing
BEAM Workshop
December 2004

Mark Servilla
servilla@LTERnet.edu
LTER Network Office
Presentation Agenda
 Definitions
 Evolution of the Grid
 Characteristics
 Computing Model
 Protocols
 Examples
 References

SEEK-BEAM Workshop Dec 2004 2


Definitions of a Grid
 “… a network of conductors for distribution of electric
power; also : a network of radio or television stations” –
Merriam-Webster
 “… the illusion of a simple yet large and powerful
self-managing virtual computer out of a large collection of
connected heterogeneous systems sharing various
combinations of resources” – IBM Redbooks
 “Grid Computing enables virtual organizations to share
geographically distributed resources as they pursue
common goals, assuming the absence of central location,
central control, omniscience, and an existing trust
relationship.” – Globus Alliance
 “The Web provides us information — the grid allows us to
process it.” – Ahmar Abbas

The Evolution of
Grid Technology
 High-Performance Computing
 Cluster Computing
 Peer-to-Peer Computing
 Internet Computing



High-Performance
Computing
 Traditionally known as supercomputing
 Specialized for parallel processing algorithms
 Shared equally among academia, research, and commercial sectors



Cluster Computing
 Originated in 1994 with the Beowulf cluster at NASA
 High-performance
 Massively parallel (2 to 1000 nodes)
 Commodity hardware (Intel, AMD)
 Low-cost software (Linux, FreeBSD)
 Interconnected via high-speed private networks
 Shared storage (SAN/NAS)

 AMD Athlon cluster at University of Heidelberg,
Germany – 825 Gflops, 35th fastest high-performance
computer in the world


Peer-to-Peer Computing
 Primarily used for distributed storage and
file-sharing
 Early models (rcp, scp, ftp)
 Restricted to LANs, or
 Limited to known peers
 Internet-based models
 Centralized (Napster, Kazaa*)
 Decentralized (Gnutella)

*100,000,000 downloads by 2004; 2-million new downloads a week



Centralized Peer-to-Peer
[Figure: centralized peer-to-peer search for .mp3 files]



Decentralized Peer-to-Peer
[Figure: decentralized peer-to-peer search for .mp3 files]
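
The two peer-to-peer models differ mainly in how a peer locates a file. The sketch below is only an illustration under stated assumptions: the peer names, the `index` dictionary standing in for a central server, and the hop limit `ttl` are all hypothetical, and the flooding search is a generic breadth-first query, not the actual Napster or Gnutella protocol.

```python
from collections import deque


def centralized_lookup(index, filename):
    """Centralized model: one index server knows which peers hold each file."""
    return sorted(index.get(filename, set()))


def decentralized_lookup(neighbors, files, start, filename, ttl=3):
    """Decentralized model: flood the query from peer to peer, up to ttl hops."""
    found = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        peer, hops = queue.popleft()
        if filename in files.get(peer, set()):
            found.add(peer)
        if hops < ttl:
            for nxt in neighbors.get(peer, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    return sorted(found)


# Hypothetical four-peer network: a - b, a - c, c - d
files = {"a": set(), "b": {"song.mp3"}, "c": set(), "d": {"song.mp3"}}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}

# The central index is built from what every peer reports to the server.
index = {}
for peer, names in files.items():
    for name in names:
        index.setdefault(name, set()).add(peer)
```

With a hop limit of 1 the decentralized search from peer `a` reaches only its direct neighbors, which shows the trade-off: no central server is needed, but distant copies may go unseen.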



Internet Computing
 Volunteer or philanthropic computing; utilizes personal desktop computers connected to the Internet
 Desktop computers idle approximately 95% of their lifespan
 Divide-and-conquer approach
 Tasks broken into smaller subtasks
 Desktop executes subtasks during idle time
 Desktop sends data back to central server, which aggregates results
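
The split, execute, and aggregate cycle above can be sketched in a few lines. This is a minimal illustration, not how any particular volunteer-computing project works: the workload (a sum of squares), the function names, and the use of local threads in place of remote desktops are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_task(numbers, chunk_size):
    """Break the full task into smaller subtasks (here, slices of a list)."""
    return [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]


def run_subtask(chunk):
    """The work one desktop would do during idle time (hypothetical workload)."""
    return sum(x * x for x in chunk)


def aggregate(partial_results):
    """The central server combines the partial results."""
    return sum(partial_results)


def run_project(numbers, chunk_size=4, desktops=3):
    subtasks = split_task(numbers, chunk_size)
    # Threads stand in for volunteer desktops; a real system would ship
    # each subtask over the network and wait for the result to come back.
    with ThreadPoolExecutor(max_workers=desktops) as pool:
        partials = list(pool.map(run_subtask, subtasks))
    return aggregate(partials)
```

The subtasks never talk to each other, which is why this model tolerates slow or unreliable desktops: a lost subtask can simply be handed to another machine.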



Synthesis entrée Grid
 High-performance computing
 pioneered the use of “parallel” algorithms
 Cluster computing
 demonstrated the nature of shared computing and
storage
 load balancing protocols
 Peer-to-peer computing
 distributed storage resource with no central authority
 Internet computing
 geographically distributed virtual organization
 fabric of the project vanishes with completion of the
task

Grid Characteristics
 Resources that
 are connected via a network
 are geographically distributed
 may consist of heterogeneous hardware and/or
software
 are managed transparently for performance and
fault tolerance
 Creates the illusion of virtual organizations
and projects without the presence of
 a central authority, or
 a central control
 Explicit trust relationships between users and
resources
 A system that scales in space and time



Types of Resources
 Computation
 utilization of computing cycles found on processors of the
machines on the grid
 Storage
 to increase capacity, performance, sharing, and reliability of data
 Communication
 to increase capacity, performance, and reliability of data
communication
 Collaboration tools
 to facilitate collaboration through conferencing, visualization, and
data sharing
 Software and Licenses
 to share site-specific software and/or licenses
 Special equipment, capacities, architectures, and policies
 printers, imaging, sensors, or other local specialty resources



[Figure: Grid Ingredients]


Grid Topologies
 Departmental Grids
 localized to a specific group of people
 generally, same hardware and software
 designed for high throughput and high performance over a
dedicated network
 Enterprise Grids
 service to numerous groups within a single company or
campus
 resource heterogeneity increases
 company-wide local area network
 Extraprise Grids
 service to multiple companies, partners, and customers within
a particular domain
 domain-based private network
 Global Grids
 established over the public Internet



Resource-based Grids
 Compute Grids
 desktop nodes
 server nodes
 high-performance computing clusters
 Data Grids
 performance-based distributed storage
 replication for fault-tolerance
 Collaboration Grids
 support for video-conferencing, visualization and data sharing
 Utility Grids
 maintained and managed by a commercial service provider
 compute resources acquired on a per-need basis
 application resources that are purchased on a per-use or per-
minute basis



Application
Characteristics
 Perfect Parallelism – computations run autonomously (Monte Carlo simulations)
 Data Parallelism – operations performed on data simultaneously (db searches)
 Functional Parallelism – multiple operations are performed simultaneously
[Figure labels: “Optimized for parallel execution”; “Not capable of parallel computation”; Fibonacci Series (1, 1, 2, 3, 5, 8, 13, 21, …), F(k+2) = F(k+1) + F(k)]
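
A small contrast may help here. The sketch below pairs a Monte Carlo estimate of pi, where every subtask is independent (perfect parallelism), with the Fibonacci recurrence, which the slide appears to offer as a computation that does not parallelize because each term needs the two before it. The function names, seeds, and sample counts are illustrative assumptions, and local threads merely stand in for grid nodes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, samples):
    """One independent subtask: count random points inside the quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(workers=4, samples_per_worker=50_000):
    """Perfect parallelism: subtasks share nothing and can run anywhere."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(count_hits, range(workers), [samples_per_worker] * workers)
        total = sum(hits)
    return 4.0 * total / (workers * samples_per_worker)


def fibonacci(n):
    """F(k+2) = F(k+1) + F(k): each step depends on the previous two,
    so this loop has to run in order."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a
```

The Monte Carlo subtasks could be sent to any number of machines and summed afterwards; the Fibonacci loop, as written from the recurrence, gains nothing from extra machines.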

Questions to ask
when thinking Grid
 Identity and Authentication—Is this user who he says he is? Is
this program the right program?
 Authorization and Policy—What can the user do on the grid?
What can the application do on the grid? What resources are the
user and/or application allowed to access?
 Resource Discovery—Where are the resources?
 Resource Characterization—What types of resources are
available?
 Resource Allocation—What policy is applied when assigning the
resources? What is the actual process of assigning the resources?
Who gets how much?
 Resource Management—Which resource can be used at what
time and for what purpose?
 Accounting/Billing/Service Level Agreement (SLA)—How
much of the resources is being used? What is the rating schedule?
What is the SLA?
 Security—How do I make sure that this is done securely? How do
we know if we have been compromised? What steps are taken
once a security breach is detected?

A Grid Computing Model
(the Globus view)
 Software stack
consisting of
 Standards
 Protocols
 APIs and SDKs
 Loosely based
on the Internet
model



A detailed view…
 Fabric – protocols and
interfaces to resources
being shared
 Connectivity – protocols
for grid-specific network
transactions (IP, DNS,
WSDL); Security
implementation (GSI)
 Resource – protocols to
initiate and control
sharing of local resources
(GRAM, GridFTP, GRIS)
 Collective – protocols for
system-wide deployment
(versus local)
 Application – protocols
targeted at a specific
application or class of
applications



Grid Protocols
 Grid Security Infrastructure (GSI)
 Grid Resource Allocation and Management
(GRAM)
 Grid File Transfer Protocol (GridFTP)
 Grid Information Services (GIS)



Grid Security Infrastructure
 Extended from SSL/TLS and X.509 protocols
 Utilizes PKI for Certificate Authority
 Primary objective is “Authentication”
 Generates primary credential
 Generates temporary proxy credential
 Certificate Authority
 Positively identify entities requesting certificates
 Issuing, removing, and archiving certificates
 Protecting the Certificate Authority server
 Maintaining a namespace of unique names for certificate
owners
 Serve signed certificates to those needing to
authenticate entities
 Logging activity



Public Key Infrastructure
[Figure labels: Certificate Authority, public keys; “A” with B’s public key; “B” with A’s public key; public and private credentials; authentication]

User A
1. User A encrypts message with his private key
2. Obtains User B’s public key from CA
3. Encrypts message with B’s public key
4. Sends message

User B
1. User B decrypts message with his private key
2. Obtains User A’s public key from CA
3. Decrypts A’s message with public key
4. B knows message is from A
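
The exchange between User A and User B can be traced with toy numbers. The sketch below uses textbook RSA with tiny primes purely to show the order of operations: A applies his private key (what is usually called signing), then B's public key; B reverses the two steps. The primes, exponents, and function names are assumptions chosen for illustration. Real systems, GSI included, use large keys, padding, and certificates, none of which is modeled here.

```python
def make_keypair(p, q, e):
    """Build a textbook RSA key pair from two small primes (toy sizes only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e; needs Python 3.8 or later
    return (e, n), (d, n)  # (public key, private key)


def apply_key(value, key):
    """RSA encryption, decryption, signing, and checking are all value**exp mod n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


# A's modulus (3233) is smaller than B's (8633), so A's signed value
# always fits inside B's key without being reduced.
a_public, a_private = make_keypair(61, 53, 17)
b_public, b_private = make_keypair(89, 97, 5)


def send(message):
    """User A: apply own private key, then B's public key."""
    signed = apply_key(message, a_private)
    return apply_key(signed, b_public)


def receive(ciphertext):
    """User B: apply own private key, then A's public key."""
    signed = apply_key(ciphertext, b_private)
    return apply_key(signed, a_public)
```

Only B's private key can undo the outer step, and only A's private key could have produced a value that A's public key turns back into the message, which is how B knows the message is from A.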



Grid Resource Allocation
and Management
 Allows programs to be started on remote resources
 Resource Specification Language (RSL)
 Resource requirements
 machine type, number of nodes, memory, etc.
 Job configuration
 directory, executable, arguments, environment
 Communication protocols
 HTTP-based RPC (early protocol)
 Web services (WSDL, SOAP)

“create 5-10 instances of myprog, each on a machine with at least 64MB
memory that is available to me for 4 hours, or 10 instances, on a machine with
at least 32MB of memory”
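
The quoted request is a list of alternatives, each with a count range and per-machine constraints. The sketch below models that idea in plain Python; it is not RSL syntax and not the GRAM API. The `Machine` and `Alternative` classes, their field names, and the first-fit `plan` function are hypothetical, and the sketch simplifies by placing one instance per machine.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    memory_mb: int
    hours_available: float


@dataclass
class Alternative:
    min_count: int          # fewest instances that satisfy the request
    max_count: int          # most instances worth starting
    min_memory_mb: int      # per-machine memory requirement
    min_hours: float = 0.0  # how long the machine must stay available


def plan(alternatives, machines):
    """Return (alternative, machines) for the first alternative that can be met,
    or None when no alternative is satisfiable."""
    for alt in alternatives:
        eligible = [
            m for m in machines
            if m.memory_mb >= alt.min_memory_mb and m.hours_available >= alt.min_hours
        ]
        if len(eligible) >= alt.min_count:
            return alt, eligible[:alt.max_count]
    return None


# The request from the slide, expressed as two alternatives:
# 5-10 instances on 64MB machines available for 4 hours, or 10 instances on 32MB machines.
myprog_request = [Alternative(5, 10, 64, 4.0), Alternative(10, 10, 32)]
```

Trying the alternatives in order mirrors the "or" in the request: the richer option is preferred, and the fallback is used only when too few large machines are free.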



Grid File Transfer Protocol
 Provides high-speed, reliable transfer
of large-volume data (petabytes)
 Extension of standard FTP to include
 striped/parallel data channels
 partial files
 automatic and manual TCP buffer size settings
 progress monitoring
 extended restart functionality

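
The ideas of partial files and parallel data channels can be shown without any network code. The sketch below splits an in-memory byte string into ranges, copies each range on its own thread, and reassembles the result. It is an illustration only: the function names are hypothetical, threads stand in for data channels, and nothing here speaks the GridFTP protocol.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size, channels):
    """Split [0, size) into contiguous (offset, length) ranges, one per channel."""
    base, extra = divmod(size, channels)
    ranges = []
    offset = 0
    for i in range(channels):
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def fetch_range(source, offset, length):
    """Partial-file read: return one slice together with where it belongs."""
    return offset, source[offset:offset + length]


def parallel_transfer(source, channels=4):
    """Copy source over several channels and reassemble it by offset."""
    target = bytearray(len(source))
    with ThreadPoolExecutor(max_workers=channels) as pool:
        results = pool.map(lambda r: fetch_range(source, *r), byte_ranges(len(source), channels))
        for offset, chunk in results:
            target[offset:offset + len(chunk)] = chunk
    return bytes(target)
```

Because every chunk carries its own offset, chunks may arrive in any order, and a transfer that stops partway could in principle resume by requesting only the missing ranges.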


Grid Information
Services
 Grid Resource Information Service (GRIS)
 provides resource specific information
 Grid Resource Registration (GRR)
 updates GRIS about resource status
 Grid Index Information Service (GIIS)
 an aggregate directory service
 provides a collection of information that has
been gathered from multiple GRIS servers
 Grid Resource Inquiry (GRI)
 queries GRIS server for resource information
 queries GIIS server for information
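
The relationship between the per-resource service and the aggregate directory can be sketched with two small classes. The class names follow the slide's acronyms, but the methods, attribute names, and exact-match query are assumptions made for illustration; the sketch does not model the actual wire protocol or schema of the Globus information services.

```python
class GRIS:
    """Holds the information for one resource."""

    def __init__(self, name):
        self.name = name
        self.status = {}

    def register(self, **attributes):
        """Registration step: the resource reports its current status."""
        self.status.update(attributes)

    def inquire(self):
        """Inquiry against a single resource."""
        return dict(self.status, name=self.name)


class GIIS:
    """Aggregate directory built from several GRIS servers."""

    def __init__(self, gris_servers):
        self.gris_servers = list(gris_servers)

    def inquire(self, **required):
        """Return the names of resources whose attributes equal the requested values."""
        matches = []
        for gris in self.gris_servers:
            info = gris.inquire()
            if all(info.get(key) == value for key, value in required.items()):
                matches.append(info["name"])
        return matches
```

A client that already knows a resource asks its GRIS directly; a client that is still looking for a suitable resource asks the GIIS, which answers from the information gathered across all registered servers.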
Open Grid Services
Architecture
 Marriage of grid protocols with web service
protocols
 Specifications for
 How Grid Services are created and discovered
 How Grid Service instances are named and
referenced
 Interfaces that define any Grid Service
 Initial release with GT 3.0 mid-2003; GT
4.0 Jan 2005



Grid Examples
 Network for Earthquake Engineering and
Simulation (NEESGrid)
 Biomedical Informatics Research Network
(BIRN)
 EcoGrid



NEESGrid
(Network for Earthquake Engineering and Simulation)
 Linking scientists and facilities
 observation of an experiment in progress
 observation before and after an experiment
 remote operation of an experiment
 Linking facilities and data
 hybrid operation of physical simulations with other simulations,
both physical and numerical
 automatic archiving of raw data, calibration data, and
processed data
 Linking scientists and data
 collaborative views (static) of time synchronized data
visualizations
 collaborative views of time synchronized data visualizations
with video and audio recordings
 Linking scientists and other scientists
 synchronous communication, such as with colleagues during an
experiment
 asynchronous communication, such as with colleagues over the
course of preparing a publication resulting from an experiment

[Figure: NEESGrid network architecture diagram]


BIRN
(Biomedical Informatics Research Network)
 Testbed for a biomedical knowledge infrastructure
 Federated database of neuro-imaging data
 Fusion of diverse data sources (location; level of
aggregation)
 Grid access to computational resources
 Datamining software
 Scalable and extensible
 Driven by research needs, not technology-pull or
technology-push



EcoGrid
 Metadata Standardization
 Ecological Metadata Language – “EML”
 Integrate diverse data networks from ecology, biodiversity,
and environmental sciences
 Standardized interfaces to data resources
 Metacat
 SRB
 DiGIR
 Xanthoria
 Metadata-mediated data access (application-based)
 Supports multiple metadata standards
 EML, Darwin Core as foci
 Computational services
 Pre-defined analytical services
 On-the-fly analytical services

[Figure: EcoGrid]
*EML facilitates semi-automatic data binding


Grid Organizations
 Globus Alliance
 Globus Toolkit™ – Reference implementation
of the grid architecture and grid protocols
 http://www.globus.org
 NSF Middleware Initiative (NMI)
 Supports the design, development, testing,
and deployment of middleware for HPC
 http://www.nsf-middleware.org
 GRIDS Center
 Grid Research Integration Deployment and
Support Center – part of NMI
 http://www.grids-center.org
 Global Grid Forum
 Main standards body governing the world-
wide grid community
 http://www.globalgridforum.org



Recommended Texts
 Grid Computing: A Practical Guide to Technology and
Applications
 Ahmar Abbas
 Charles River Media © 2004
 Introduction to Grid Computing with Globus
 Luis Ferreira et al.
 IBM Redbooks © 2004
 Enabling Applications for Grid Computing with Globus
 Bart Jacob et al.
 IBM Redbooks © 2003
 Grid Services Programming and Application Enablement
 Luis Ferreira et al.
 IBM Redbooks © 2004
