A Key to Performance
Dheeraj Bhardwaj
Department of Computer Science & Engineering
Indian Institute of Technology, Delhi – 110 016, India
http://www.cse.iitd.ac.in/~dheerajb
Introduction
• Traditional Science
• Observation
• Theory
• Experiment – the most expensive
Introduction
Performance
Parallel Computer
Definition:
A parallel computer is a “Collection of processing elements
that communicate and co-operate to solve large problems
fast”.
Geographic Information Systems
Need for More Computing Power: Grand Challenge Applications
• Weather Forecasting
• Astrophysical Calculations
• Computational Chemistry
• Molecular Modelling
• Molecular Dynamics
• Structural Mechanics
Dheeraj Bhardwaj <dheerajb@cse.iitd.ac.in> August, 2002
Grand Challenge Applications
Business/Industry Applications
• Electronic Governance
• Medical Imaging
Internet Applications
• Web Servers
• Digital libraries
Application Trends
Need for numerical and non-numerical algorithms
Numerical Algorithms
• Dense Matrix Algorithms
• Solving linear systems of equations
• Solving sparse systems of equations
• Fast Fourier Transforms
Non-Numerical Algorithms
• Graph Algorithms
• Sorting algorithms
• Search algorithms for discrete optimization
• Dynamic Programming
Applications – Commercial computing
Sources of Parallelism in Query Processing
Parallelism among transactions (online transaction processing)
Parallelism within a single complex transaction
Transactions on a commercial database require processing large, complex queries.
Parallelizing relational database operations
Requirements for Commercial Applications
[Figure: two architectures: a distributed system in which processors (P) and memory modules (M) attach to an interconnection network, and a shared-memory system in which processors reach shared memory through an interconnection network]
Serial and Parallel Computing
    Serial               Parallel
    Fetch/Store          Fetch/Store
    Compute              Compute/communicate
                         Cooperative game
    Serial algorithm     Parallel system
Issues in Parallel Computing
• SIMD
• MIMD
SIMD Features
MIMD Features
MIMD Classification
MIMD
• Non-shared memory: MPP, Clusters
• Shared memory
  – Uniform memory access (UMA): PVP, SMP
  – Non-uniform memory access: NUMA, CC-NUMA, COMA
Symmetric Multiprocessors (SMPs)
[Figure: SMP architecture showing processors, a directory (DIR), a memory controller, shared memory, and an I/O bridge on the I/O bus]
Symmetric Multiprocessors (SMPs)
Issues
Better Performance for Clusters
[Figure: design space plotted as node complexity versus single-system image, locating distributed computer systems, clusters, SMPs, and MPPs]
Clusters
A cluster is a type of parallel or distributed processing system that consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource.
Cluster Architecture
Layered view (top to bottom):
• Programming environment (Java, C, Fortran, MPI, PVM), web user interface, and other subsystems (database, OLTP)
• Single System Image infrastructure
• Availability infrastructure
• Operating system on each node
• Commodity interconnect
Cluster Features
• Cost/performance
• Commodity parts
• Incremental scalability
• Independent failure
Cluster Challenges
Parallel I/O
PARAM 10000 - A 100 GF Parallel Supercomputer
Developed by the Centre for Development of Advanced Computing (C-DAC), India
• Productivity
• Reliability
• Availability
• Usability
• Scalability
• Available Utilization
• Performance/cost ratio
Requirements for Applications
Parallel I/O
Optimized libraries
Partitioning of data
Mapping of data onto the processors
Reproducibility of results
Synchronization
Scalability and Predictability of performance
Success depends on the combination of
Principles of Parallel Algorithms and Design
Questions to be answered
Two key steps
Parallel Algorithms - Characteristics
Types of Parallelism
Data parallelism
Task parallelism
Stream parallelism
Types of Parallelism - Data Parallelism
Responsibility of programmer
• Specifying the distribution of data structures
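Data parallelism applies the same operation to every partition of a distributed data structure. A minimal sketch in Python (threads standing in for processors; a real data-parallel program would use a language or library built for this), where the programmer explicitly chooses a block distribution:

```python
from concurrent.futures import ThreadPoolExecutor

def scale_block(block):
    # The same operation is applied to every element of the local block
    return [2 * x for x in block]

data = list(range(8))
nprocs = 4

# Programmer's responsibility: specify the distribution of the data
# structure, here a block distribution over nprocs workers
size = len(data) // nprocs
blocks = [data[i * size:(i + 1) * size] for i in range(nprocs)]

with ThreadPoolExecutor(max_workers=nprocs) as pool:
    scaled = [x for block in pool.map(scale_block, blocks) for x in block]

print(scaled)  # -> [0, 2, 4, 6, 8, 10, 12, 14]
```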
Types of Parallelism - Task Parallelism
Programmer’s responsibility
• The programmer must deal explicitly with process creation, communication, and synchronization.

Example
A vehicle relational database processes the following query:
(MODEL = “-------” AND YEAR = “-------”)
AND (COLOR = “Green” OR COLOR = “Black”)
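The two sub-predicates can be evaluated as independent tasks and the results combined. A sketch of this idea; the records and the MODEL/YEAR values are hypothetical, since the slide leaves them unspecified:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical vehicle table (MODEL and YEAR values are made up)
vehicles = [
    {"model": "M1", "year": 2001, "color": "Green"},
    {"model": "M1", "year": 2001, "color": "Red"},
    {"model": "M1", "year": 2000, "color": "Black"},
    {"model": "M2", "year": 2001, "color": "Black"},
]

def model_year_task(rows):
    # Task 1: MODEL = "M1" AND YEAR = 2001
    return {i for i, r in enumerate(rows)
            if r["model"] == "M1" and r["year"] == 2001}

def color_task(rows):
    # Task 2: COLOR = "Green" OR COLOR = "Black"
    return {i for i, r in enumerate(rows)
            if r["color"] in ("Green", "Black")}

with ThreadPoolExecutor() as pool:
    t1 = pool.submit(model_year_task, vehicles)
    t2 = pool.submit(color_task, vehicles)
    # AND of the two independently computed predicates
    matches = sorted(t1.result() & t2.result())

print(matches)  # -> [0]
```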
Types of Parallelism - Data and Task Parallelism
Two Approaches
• Add task-parallel constructs to data-parallel constructs.
• Add data-parallel constructs to task-parallel constructs.
Approach to Integration
• Language based approaches.
• Library based approaches.
Example
Types of Parallelism - Data and Task Parallelism
Advantages
Generality
Ability to increase scalability by exploiting both forms of parallelism in an application
Ability to coordinate multidisciplinary applications
Problems
Differences in parallel program structure
Address space organization
Language implementation
Types of Parallelism - Stream Parallelism
Decomposition Techniques
Phase parallel
Pipeline
Process farm
Work pool
Remark:
The parallel program consists of a number of supersteps, and each superstep has two phases: a computation phase, in which each process works on its local data, and an interaction phase, in which the processes communicate or synchronize.
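A minimal sketch of one such superstep, with threads standing in for processes: each worker computes on its own block, all synchronize at a barrier, and then the interaction phase combines the partial results.

```python
import threading

p = 4
data = list(range(16))
partial = [0] * p
total = [0]
barrier = threading.Barrier(p)
lock = threading.Lock()

def worker(rank):
    # Computation phase: each process works on its own block of data
    partial[rank] = sum(data[rank::p])
    # All processes synchronize at the end of the computation phase
    barrier.wait()
    # Interaction phase: combine the partial results
    with lock:
        total[0] += partial[rank]

threads = [threading.Thread(target=worker, args=(r,)) for r in range(p)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total[0])  # -> 120, i.e. sum(range(16))
```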
Phase Parallel Model
Pipeline
Process Farm
Work Pool
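In the work-pool model a shared pool holds independent tasks and idle workers grab the next one, so load balances dynamically. A sketch with threads standing in for processes and a thread-safe queue as the pool:

```python
import queue
import threading

pool = queue.Queue()
for n in range(10):
    pool.put(n)  # the shared pool of independent work units

results = []
lock = threading.Lock()

def worker():
    # Each idle worker repeatedly grabs the next available task
    while True:
        try:
            n = pool.get_nowait()
        except queue.Empty:
            return  # pool exhausted
        value = n * n  # the "work" for this task
        with lock:
            results.append(value)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # -> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```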
Implicit parallelism
Explicit Parallelism
Implicit Parallel Programming Models
Question:
• Are parallelizing compilers effective in generating efficient code from sequential programs?
  – Some performance studies indicate that they may not be effective
  – User direction and run-time parallelization techniques are needed
Implicit Parallel Programming Models
Implicit Parallelism
Bernstein’s Theorem
• It is difficult to decide whether two operations in an imperative sequential program can be executed in parallel
• An implication of this theorem is that no automatic technique, at compile time or run time, can exploit all the parallelism in a sequential program
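When the input and output sets of two statements are known, Bernstein's conditions can at least be checked mechanically: S1 and S2 may run in parallel iff I1∩O2, I2∩O1, and O1∩O2 are all empty. A sketch:

```python
def can_run_in_parallel(in1, out1, in2, out2):
    # Bernstein's conditions: no flow, anti, or output dependence
    return (not (set(in1) & set(out2))       # I1 ∩ O2 = empty
            and not (set(in2) & set(out1))   # I2 ∩ O1 = empty
            and not (set(out1) & set(out2))) # O1 ∩ O2 = empty

# a = b + c  and  d = e * f : independent, may run in parallel
print(can_run_in_parallel({"b", "c"}, {"a"}, {"e", "f"}, {"d"}))  # -> True

# a = b + c  and  b = a * 2 : dependent in both directions
print(can_run_in_parallel({"b", "c"}, {"a"}, {"a"}, {"b"}))  # -> False
```

The hard part, as the theorem suggests, is computing these sets precisely for arbitrary sequential code, not checking them.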
Explicit Parallel Programming Models
Data-parallel model
Message-passing model
Shared-variable model
Main Features              Data-Parallel    Message-Passing    Shared-Variable
Control flow (threading)   Single           Multiple           Multiple
Explicit Parallel Programming Models
Message – Passing
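In the message-passing model each process has a private address space, and data moves only through explicit send/receive operations. A sketch with threads standing in for processes and queues as mailboxes (a real program would use MPI or PVM, as listed earlier):

```python
import queue
import threading

# One private mailbox per "process"; no shared variables are used
mailbox = [queue.Queue(), queue.Queue()]

def send(dest, msg):
    mailbox[dest].put(msg)

def recv(rank):
    return mailbox[rank].get()  # blocking receive

reply = []

def rank0():
    send(1, [1, 2, 3])     # explicit send of the work to process 1
    reply.append(recv(0))  # wait for the answer

def rank1():
    data = recv(1)         # receive the work
    send(0, sum(data))     # send the local result back

threads = [threading.Thread(target=rank0), threading.Thread(target=rank1)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(reply)  # -> [6]
```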
Other Parallel Programming Models
Functional programming
Logic programming
Computing by learning
Basic Communication Operations
• One-to-All Broadcast
• All-to-All Broadcast
• Circular Shift
• Reduction
• Prefix Sum
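Prefix sum (scan) is one of these basic operations: element i of the result is the sum of elements 0..i. The Hillis–Steele scheme computes it in about log2(n) parallel steps; a sequential sketch of that step structure:

```python
def prefix_sum(values):
    # Hillis-Steele inclusive scan: in step k, every element adds the
    # element 2**k positions to its left; all additions within one
    # step are independent and could execute in parallel.
    x = list(values)
    d = 1
    while d < len(x):
        x = [x[i] + (x[i - d] if i >= d else 0) for i in range(len(x))]
        d *= 2
    return x

print(prefix_sum([1, 2, 3, 4]))  # -> [1, 3, 6, 10]
```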
Performance Requirements
Remarks
Performance Metrics of Parallel Systems
Performance Metrics of Parallel Systems
Speedup metrics

Amdahl's law (fixed workload): if α is the serial fraction of the computation and T1 the single-processor time, the speedup on p processors is

    S(p) = T1 / (α·T1 + (1 − α)·T1/p) = p / (1 + (p − 1)·α)

and S(p) → 1/α as p → ∞.
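The formula is easy to evaluate numerically; even a 10% serial fraction caps the speedup at 10 no matter how many processors are used:

```python
def amdahl_speedup(p, alpha):
    # S(p) = T1 / (alpha*T1 + (1 - alpha)*T1/p) = p / (1 + (p - 1)*alpha)
    return p / (1 + (p - 1) * alpha)

print(round(amdahl_speedup(4, 0.10), 2))  # -> 3.08
print(amdahl_speedup(10**6, 0.10))        # close to the 1/alpha = 10 limit
```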
Performance Metrics of Parallel Systems
Including a fixed parallel overhead T0 (communication, synchronization), the speedup becomes

    S(p) = T1 / (α·T1 + (1 − α)·T1/p + T0)

which approaches T1 / (α·T1 + T0) as p → ∞.
Performance Metrics of Parallel Systems
Performance Metrics of Parallel Systems
Conclusions
References
Final Words
Acknowledgements
• Centre for Development of Advanced Computing (C-DAC)
http://www.cse.iitd.ac.in/~dheerajb/links.htm