Table of Contents

ClusterGOP: A High-Level Programming Environment for Clusters
1.1 INTRODUCTION
1.2 GOP MODEL AND ClusterGOP ARCHITECTURE
1.2.1 The ClusterGOP Architecture
1.3 VISUALGOP
1.4 THE ClusterGOP LIBRARY
1.5 MPMD PROGRAMMING SUPPORT
1.6 PROGRAMMING USING ClusterGOP
1.6.1 Support for Program Development
1.6.2 Performance of ClusterGOP
1.7 SUMMARY
2.1 INTRODUCTION
2.2 HPC ARCHITECTURES
2.2.1 Early Parallel Processing Platforms
2.2.2 Current HPC Systems
2.3 HPC PROGRAMMING MODELS: THE FIRST GENERATION
2.3.1 The Message Passing Interface (MPI)
2.3.2 High-Performance Fortran (HPF)
2.4 THE SECOND GENERATION OF HPC PROGRAMMING MODELS
2.4.1 OpenMP
2.4.2 Other Shared-Memory APIs
2.4.3 Is a Standard High-Level API for HPC in Sight?
2.5 OPENMP FOR DMPS
2.5.1 A Basic Translation to GA
2.5.2 Implementing Sequential Regions
2.5.3 Data and Work Distribution in GA
2.5.4 Irregular Computation Example
2.6 EXPERIMENTS WITH OpenMP ON DMPS
2.7 CONCLUSIONS
SAT: Toward Structured Parallelism Using Skeletons
3.1 INTRODUCTION
3.2 SAT: A METHODOLOGY OUTLINE
3.2.1 Motivation and Methodology
3.2.2 Abstraction View: Basic Skeletons and Compositions
3.2.3 Performance View: Collective Operations
3.2.4 SAT: Combining Abstraction with Performance
3.3 SKELETONS AND COLLECTIVE OPERATIONS
3.3.1 The H Skeleton and Its Standard Implementation
3.3.2 Transformations for Performance View
3.4 CASE STUDY: MAXIMUM SEGMENT SUM (MSS)
3.5 PERFORMANCE ASPECTS IN SAT
3.5.1 Performance Predictability
3.5.2 Absolute Performance
3.6 CONCLUSIONS AND RELATED WORK
4.1 THE BSP MODEL
4.1.2 BSP Versus Traditional Parallelism
4.1.3 Memory Efficiency
4.1.4 Memory Management
4.1.5 Heterogeneity
4.1.6 Subset Synchronization
4.1.7 Other Variants of BSP
4.2 BSP PROGRAMMING
4.2.1 The BSPlib Standard
4.2.2 Beyond BSPlib
5.1.1 Message-Passing Run-Time Systems
5.1.2 Cilk’s Dataflow Model
5.1.3 Terminology
5.2.1 Programs
5.3.1 Fibonacci
5.3.2 Traveling Salesman Problem
5.3.3 N-Queens Problem
5.3.4 Matrix Multiplication
5.3.5 Finite Differencing
5.3.6 Program Complexity
5.4 CONCLUSION
Nested Parallelism and Pipelining in OpenMP
6.1 INTRODUCTION
6.2 OpenMP EXTENSIONS FOR NESTED PARALLELISM
6.2.1 Parallelism Definition
6.2.2 Thread Groups
6.2.3 Evaluation of the Proposal
6.3 OPENMP EXTENSIONS FOR THREAD SYNCHRONIZATIONS
6.3.1 Precedence Relations
6.3.2 Evaluation of the Proposal
6.4 SUMMARY
OpenMP for Chip Multiprocessors
7.1 INTRODUCTION
7.2 3SoC ARCHITECTURE OVERVIEW
7.2.1 Quads
7.2.2 Communication and Synchronization
7.2.3 Software Architecture and Tools
7.3 THE OpenMP COMPILER/TRANSLATOR
7.3.1 Data Distribution
7.3.2 Computation Division
7.3.3 Communication Generation
7.4 EXTENSIONS TO OpenMP FOR DSEs
7.4.1 Controlling the DSEs
7.4.2 Extensions for DSEs
7.5 OPTIMIZATION FOR OpenMP
7.5.1 Using the MTE Transfer Engine
7.5.2 Double Buffer and Data Prefetching
7.5.3 Data Privatization and Other Functions
7.6 IMPLEMENTATION
7.7 PERFORMANCE EVALUATION
7.8 CONCLUSIONS
8.1 INTRODUCTION
8.2 BACKGROUND MATERIAL
8.2.1 Data Distribution and Data Alignment
8.2.2 Data Temporal Dependency and Spatial Locality
8.2.3 Computation Decomposition and Scheduling
8.3 COMPILING REGULAR PROGRAMS ON DMPCS
8.4 COMPILER AND RUN-TIME SUPPORT FOR IRREGULAR PROGRAMS
8.4.1 The Inspectors and Executors
8.4.2 The ARF Compiler
8.4.3 Language Interfaces for Data Partitioners
8.5 LIBRARY SUPPORT FOR IRREGULAR APPLICATIONS
8.5.1 Data Partitioning and Reordering
8.5.2 Solving Sparse Linear Systems
8.6 RELATED WORKS
8.7 CONCLUDING REMARKS
Enabling Partial-Cache Line Prefetching through Data Compression
9.1 INTRODUCTION
9.2.1 Dynamic Value Representation
9.2.2 Partial Cache Line Prefetching
9.3 CACHE DESIGN DETAILS
9.3.1 Cache Organization
9.3.2 Dynamic Value Conversion
9.3.3 Cache Operation
9.4 EXPERIMENTAL RESULTS
9.4.1 Experimental Setup
9.4.2 Memory Traffic
9.4.3 Execution Time
9.4.4 Cache Miss Comparison
9.5 RELATED WORK
9.6 CONCLUSION
MPI Atomicity and Concurrent Overlapping I/O
10.1 INTRODUCTION
10.2 CONCURRENT OVERLAPPING I/O
10.2.1 POSIX Atomicity Semantics
10.2.2 MPI Atomicity Semantics
10.3 IMPLEMENTATION STRATEGIES
10.3.2 Byte-Range File Locking
10.3.3 Processor Handshaking
10.3.4 Scalability Analysis
10.4 EXPERIMENTAL RESULTS
10.5 SUMMARY
11.1 INTRODUCTION
11.2 CACHE MODEL
11.3 CODE TILING
11.4 DATA TILING
11.4.1 A Sufficient Condition
11.4.3 Constructing a Data Tiling for Equation (11.1)
11.5 FINDING OPTIMAL TILE SIZES
11.6 EXPERIMENTAL RESULTS
11.7 RELATED WORK
11.8 CONCLUSION
Data Conversion for Heterogeneous Migration/Checkpointing
12.1 INTRODUCTION
12.2 MIGRATION AND CHECKPOINTING
12.2.1 MigThread
12.2.2 Migration and Checkpointing Safety
12.3 DATA CONVERSION
12.3.1 Data-Conversion Issues
12.3.2 Data-Conversion Schemes
12.4 COARSE-GRAIN TAGGED RMR IN MigThread
12.4.1 Tagging and Padding Detection
12.4.2 Data Restoration
12.4.3 Data Resizing
12.4.4 Address Resizing
12.4.5 Plug-and-Play
12.5 MICROBENCHMARKS AND EXPERIMENTS
12.6 RELATED WORK
12.7 CONCLUSIONS AND FUTURE WORK
Receiving-Message Prediction and Its Speculative Execution
13.1 BACKGROUND
13.2 RECEIVING-MESSAGE PREDICTION METHOD
13.2.1 Prediction Method
13.2.2 Flow of the Prediction Process
13.2.3 Static Algorithm Selection by Profiling
13.2.4 Dynamic Algorithm Switching
13.3 IMPLEMENTATION OF THE METHODS IN THE MPI LIBRARIES
13.4 EXPERIMENTAL RESULTS
13.4.1 Evaluation Environments
13.4.2 Basic Characteristics of the Receiving-Message-Prediction Method
13.4.3 Effects of Profiling
13.4.4 Dynamic Algorithm Changing
13.5 CONCLUDING REMARKS
14.1 INTRODUCTION
14.2 HIGH-PERFORMANCE COMPUTING WITH CLUSTER COMPUTING
14.3 RECONFIGURABLE COMPUTING WITH FPGAs
14.4 DRMC: A DISTRIBUTED RECONFIGURABLE METACOMPUTER
14.4.1 Application Development
14.4.2 Metacomputer Overview
14.4.3 Hardware Setup
14.4.4 Operation
14.4.5 Programming the RC1000 Board
14.5 ALGORITHMS SUITED TO IMPLEMENTATION ON FPGAs/DRMC
14.6 ALGORITHMS NOT SUITED TO IMPLEMENTATION ON FPGAs/DRMC
14.7 SUMMARY
15.1 INTRODUCTION
15.2 RELATED WORK
15.3 SYSTEM MODEL AND PROBLEM STATEMENT
15.4 RESOURCE ALLOCATION TO MAXIMIZE SYSTEM THROUGHPUT
15.4.1 A Linear Programming Formulation
15.4.2 An Extended Network Flow Representation
15.5 EXPERIMENTAL RESULTS
15.6 CONCLUSION
16.1 INTRODUCTION
16.2 RELATED WORK
16.4 SCHEDULING POLICIES FOR PRESERVING BUS BANDWIDTH
16.5 EXPERIMENTAL EVALUATION
16.6 CONCLUSIONS
17.1 INTRODUCTION
17.2 A GRID SCHEDULING MODEL
17.2.1 A Performance Model
17.2.2 Unevenness of the Lengths of a Set of Tasks
17.2.3 A Schedule
17.2.4 Criteria of a Schedule
17.2.5 A Grid Scheduling Problem
17.3 RELATED WORKS
17.3.1 Problem Description
17.3.2 The Case of Invariable Processor Speed
17.3.3 The Case of Variable Processor Speed
17.4 THE PROPOSED ALGORITHM RR
17.5 THE PERFORMANCE GUARANTEE OF THE PROPOSED ALGORITHM
17.6 CONCLUSION
18.1 INTRODUCTION
18.2 GA FOR TASK ALLOCATION
18.2.1 The Fitness Function
18.2.2 Reliability Expression
18.3 THE ALGORITHM
18.4 ILLUSTRATIVE EXAMPLES
18.5 DISCUSSIONS AND CONCLUSION
19.1 INTRODUCTION
19.2 PROBLEM DEFINITION
19.3 THE SUGGESTED ALGORITHM
19.3.1 Listing Mechanism
19.3.2 Duplication Mechanism
19.3.3 Algorithm Complexity Analysis
19.4 HETEROGENEOUS SYSTEMS SCHEDULING HEURISTICS
19.4.1 Fast Load Balancing (FLB-f) Algorithm
19.4.2 Heterogeneous Earliest Finish Time (HEFT) Algorithm
19.4.3 Critical Path on a Processor (CPOP) Algorithm
19.5 EXPERIMENTAL RESULTS AND DISCUSSION
19.5.1 Comparison Metrics
19.5.2 Random Graph Generator
19.5.3 Performance Results
19.5.4 Applications
19.5.5 Performance of Parents-Selection Methods
19.5.6 Performance of the Machine Assignment Mechanism
19.5.7 Summary and Discussion of Experimental Results
19.6 CONCLUSION
20.1 INTRODUCTION
20.2 RELATED WORK
20.3 INFORMATION ACQUISITION
20.4 LINUX PROCESS CLASSIFICATION MODEL
20.4.1 Training Algorithm
20.4.2 Labeling Algorithm
20.4.3 Classification Model Implementation
20.5 RESULTS
20.6 EVALUATION OF THE MODEL INTRUSION ON THE SYSTEM PERFORMANCE
20.7 CONCLUSIONS
21.5.3 Security
21.6 ALCHEMI DESIGN AND IMPLEMENTATION
21.6.1 Overview
21.6.2 Grid Application Lifecycle
21.7 ALCHEMI PERFORMANCE EVALUATION
21.7.1 Stand-Alone Alchemi Desktop Grid
21.7.2 Alchemi as Node of a Cross-Platform Global Grid
21.8 SUMMARY AND FUTURE WORK
22.1 INTRODUCTION
22.2 OVERVIEW OF GRID MIDDLEWARE SYSTEMS
22.3 UNICORE
22.4 GLOBUS
22.4.1 GSI Security Layer
22.4.2 Resource Management
22.4.3 Information Services
22.4.4 Data Management
22.5 LEGION
22.6 GRIDBUS
22.6.1 Alchemi
22.6.2 Libra
22.6.3 Market Mechanisms for Computational Economy
22.6.4 Accounting and Trading Services
22.6.5 Resource Broker
22.6.6 Web Portals
22.6.7 Simulation and Modeling
22.7 IMPLEMENTATION OF UNICORE ADAPTOR FOR GRIDBUS BROKER
22.7.1 UnicoreComputeServer
22.7.2 UnicoreJobWrapper
22.7.3 UnicoreJobMonitor
22.7.4 UnicoreJobOutput
22.8 COMPARISON OF MIDDLEWARE SYSTEMS
22.9 SUMMARY
High-Performance Computing on Clusters: The Distributed JVM Approach
23.1 BACKGROUND
23.1.1 Java
23.1.2 Java Virtual Machine
23.1.3 Programming Paradigms for Parallel Java Computing
23.2 DISTRIBUTED JVM
23.2.1 Design Issues
23.2.2 Solutions
23.3 JESSICA2 DISTRIBUTED JVM
23.3.1 Overview
23.3.2 Global Object Space
23.3.3 Transparent Java Thread Migration
23.4 PERFORMANCE ANALYSIS
23.4.1 Effects of Optimizations in GOS
23.4.2 Thread Migration Overheads
23.4.3 Application Benchmark
23.5 RELATED WORK
23.5.1 Software Distributed Shared Memory
23.5.2 Computation Migration
23.5.3 Distributed JVM
23.6 SUMMARY
24.1 INTRODUCTION
24.2 DATA GRID SERVICES
24.2.1 Metadata Services for Data Grid Systems
24.2.2 Data Access Services for Data Grid Systems
24.2.3 Performance Measurement in a Data Grid
24.3 HIGH-PERFORMANCE DATA GRID
24.3.1 Data Replication
24.3.2 Scheduling in Data Grid Systems
24.3.3 Data Movement
24.4 SECURITY ISSUES
24.5 OPEN ISSUES
24.5.1 Application Replication
24.5.2 Consistency Maintenance
24.5.3 Asynchronized Data Movement
24.5.4 Prefetching Synchronized Data
24.5.5 Data Replication
24.6 CONCLUSIONS
25.6 DISCUSSION AND CONCLUSION
26.1 INTRODUCTION
26.2 HARDWARE AND SOFTWARE SETUP
26.3 SYSTEM TUNING AND BENCHMARK RESULTS
26.3.1 Performance Model of HPL
26.3.2 BLAS Library
26.3.3 Results Using Myrinet
26.3.4 Results Using Gigabit Ethernet
26.4 PERFORMANCE COSTS AND BENEFITS
A Grid-Based Distributed Simulation of Plasma Turbulence
27.1 INTRODUCTION
27.2 MPI IMPLEMENTATION OF THE INTERNODE DOMAIN DECOMPOSITION
27.4 THE MPICH-G2 IMPLEMENTATION
27.5 CONCLUSIONS
Evidence-Aware Trust Model for Dynamic Services
28.1 MOTIVATION FOR EVALUATING TRUST
28.2 SERVICE TRUST—WHAT IS IT?
28.2.1 The Communication Model
28.2.2 Trust Definition
28.2.3 Service Provider Trust
28.2.4 Service Consumer Trust
28.2.5 Limitations of Current Approaches
28.3 EVIDENCE-AWARE TRUST MODEL
28.4 THE SYSTEM LIFE CYCLE
28.4.1 The Reputation Interrogation Phase (RIP)
28.4.2 The SLA Negotiation Phase
28.4.3 The Trust Verification Phase (TVP)
28.5 CONCLUSION
Resource Discovery in Peer-to-Peer Infrastructure
29.1 INTRODUCTION
29.2 DESIGN REQUIREMENTS
29.3 UNSTRUCTURED P2P SYSTEMS
29.3.1 Gnutella
29.3.2 Freenet
29.3.3 Optimizations
29.3.4 Discussion
29.4 STRUCTURED P2P SYSTEMS
29.4.1 Example Systems
29.4.2 Routing and Joining Process
29.4.3 Discussion
29.4.4 Revisiting Design Requirements
29.5 ADVANCED RESOURCE DISCOVERY FOR STRUCTURED P2P SYSTEMS
29.5.1 Keyword Search
29.5.2 Search by Attribute-Value Pairs
29.5.3 Range Search
29.6 SUMMARY
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
30.1 INTRODUCTION
30.2 SEARCH MECHANISMS
30.2.1 Uniformed Selection of Relay Neighbors
30.2.2 Weighted Selection of Relay Neighbors
30.2.3 Other Approaches
30.2.4 Partial Coverage Problem
30.3 HYBRID PERIODICAL FLOODING
30.3.1 Periodical Flooding (PF)
30.3.2 Hybrid Periodical Flooding
30.4 SIMULATION METHODOLOGY
30.4.1 Topology Generation
30.4.2 Simulation Setup
30.5 PERFORMANCE EVALUATION
30.5.1 Partial Coverage Problem
30.5.2 Performance of Random PF
30.5.3 Effectiveness of HPF
30.5.4 Alleviating the Partial Coverage Problem
30.6 CONCLUSION
31.1 INTRODUCTION
31.2 HIERARCHICAL P2P ARCHITECTURE
31.2.1 Hierarchical P2P Layers
31.2.2 Distributed Binning Scheme
31.2.3 Landmark Nodes
31.2.4 Hierarchy Depth
31.3 SYSTEM DESIGN
31.3.1 Data Structures
31.3.2 Routing Algorithm
31.3.3 Node Operations
31.3.4 Cost Analysis
31.4 PERFORMANCE EVALUATION
31.4.1 Simulation Environment
31.4.2 Routing Costs
31.4.3 Routing Cost Distribution
31.4.4 Landmark Nodes Effects
31.4.5 Hierarchy Depth Effect
31.5 RELATED WORKS
31.6 SUMMARY
32.1 INTRODUCTION
32.2 GROUP OF AGENTS
32.2.1 Autonomic Group Agent
32.2.2 Views
32.3 FUNCTIONS OF GROUP PROTOCOL
32.4 AUTONOMIC GROUP PROTOCOL
32.4.1 Local Protocol Instance
32.4.2 Global Protocol Instance
32.5 RETRANSMISSION
32.5.1 Cost Model
32.5.2 Change of Retransmission Class
32.5.3 Evaluation
32.6 CONCLUDING REMARKS
33.1 INTRODUCTION
33.2 LOCATION MANAGEMENT WITH AND WITHOUT CACHE
33.2.1 Movement-based Location Management in 3G Cellular Networks
33.2.2 Per-User Caching
33.3 THE CACHE-ENHANCED LOCATION MANAGEMENT SCHEME
33.3.1 Caching Schemes
33.3.2 The Location Update Scheme in Detail
33.3.3 Paging Scheme
33.4 SIMULATION RESULTS AND ANALYSIS
33.4.1 Simulation Setup
33.5 CONCLUSION
Maximizing Multicast Lifetime in Wireless Ad Hoc Networks
34.1 INTRODUCTION
34.2 ENERGY CONSUMPTION MODEL IN WANETs
34.3 DEFINITIONS OF MAXIMUM MULTICAST LIFETIME
35.2 PERFECT SCHEDULING PROBLEM FOR BIPARTITE SCATTERNETS
35.3 PERFECT ASSIGNMENT SCHEDULING ALGORITHM FOR BIPARTITE SCATTERNETS
36.1 INTRODUCTION
36.2 PRELIMINARIES
36.2.1 Partition Graph
36.2.2 Configuration Graph
36.2.3 Partitioning Graph Metrics
36.2.4 System Load Metrics
36.2.5 Partitioning Metrics
36.3 THE MinEX PARTITIONER
36.3.1 MinEX Data Structures
36.3.2 Contraction
36.3.3 Partitioning
36.3.4 Reassignment Filter
36.3.5 Refinement
36.3.6 Latency Tolerance
36.4 N-BODY APPLICATION
36.4.1 Tree Creation
36.4.2 Partition Graph Construction
36.4.3 Graph Modifications for METIS
36.5 EXPERIMENTAL STUDY
36.5.1 Multiple Time Step Test
36.5.2 Scalability Test
36.6 CONCLUSIONS
Building a User-Level Grid for Bag-of-Tasks Applications
37.1 INTRODUCTION
37.2 DESIGN GOALS
37.3 ARCHITECTURE
37.4 WORKING ENVIRONMENT
37.5 SCHEDULING
37.6 IMPLEMENTATION
37.7 PERFORMANCE EVALUATION
37.7.1 Simulation of Supercomputer Jobs
37.7.2 Fighting AIDS
37.7.3 Gauging the Home Machine Bottleneck
37.8 CONCLUSIONS AND FUTURE WORK
38.1 INTRODUCTION
38.1.1 An Efficient Sequential Algorithm
38.2 COMPUTING IN PARALLEL
38.3 EXPERIMENTAL RESULT
38.4 CONCLUSION
39.1 INTRODUCTION
39.2.1 Hardware Configuration
39.2.2 Software Configuration
39.2.3 Database Files Refreshing Scheme
39.2.4 Job Parsing, Scheduling and Processing
39.2.5 Integration with SmedDb
39.4 CONCLUSIONS
High-Performance Computing: Paradigm and Infrastructure

Published by Wiley
The state of the art of high-performance computing

Prominent researchers from around the world have gathered to present the state-of-the-art techniques and innovations in high-performance computing (HPC), including:
* Programming models for parallel computing: graph-oriented programming (GOP), OpenMP, the stages and transformation (SAT) approach, the bulk-synchronous parallel (BSP) model, Message Passing Interface (MPI), and Cilk
* Architectural and system support, featuring the code tiling compiler technique, the MigThread application-level migration and checkpointing package, the new prefetching scheme of atomicity, a new "receiver makes right" data conversion method, and lessons learned from applying reconfigurable computing to HPC
* Scheduling and resource management issues with heterogeneous systems, bus saturation effects on SMPs, genetic algorithms for distributed computing, and novel task-scheduling algorithms
* Clusters and grid computing: design requirements, grid middleware, distributed virtual machines, data grid services and performance-boosting techniques, security issues, and open issues
* Peer-to-peer computing (P2P) including the proposed search mechanism of hybrid periodical flooding (HPF) and routing protocols for improved routing performance
* Wireless and mobile computing, featuring discussions of implementing the Gateway Location Register (GLR) concept in 3G cellular networks, maximizing network longevity, and comparisons of QoS-aware scatternet scheduling algorithms
* High-performance applications including partitioners, running Bag-of-Tasks applications on grids, using low-cost clusters to meet high-demand applications, and advanced convergent architectures and protocols

High-Performance Computing: Paradigm and Infrastructure is an invaluable compendium for engineers, IT professionals, and researchers and students of computer science and applied mathematics.
Publish date: Nov 18, 2005
Copyright: All rights reserved
ISBN: 9780471732709