An Introduction
Chung-Ta King (金仲達)
Department of Computer Science, National Tsing Hua University
king@cs.nthu.edu.tw
Clusters Have Arrived
What is a Cluster?
Outline
Cluster Computing
How to Run Applications Faster?
Computer analogy
Use faster hardware: e.g. reduce the time per instruction
(clock cycle)
Optimized algorithms and techniques
Multiple computers to solve the problem
=> techniques of parallel processing are mature and can
be exploited commercially
Motivation for Using Clusters
Why Cluster Now?
Hardware and Software Trends
Important advances have taken place in the last five years
Network performance increased with reduced cost
Workstation performance improved
Average number of transistors on a chip grows 40% per year
Clock frequency growth rate is about 30% per year
Expect 700-MHz processors with 100M transistors in early 2000
Availability of powerful and stable operating systems
(Linux, FreeBSD) with source code access
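To put the quoted growth rates in perspective, the short sketch below compounds them over the five-year window the slide mentions. The function name `compound_growth` and the choice of a five-year horizon are illustrative assumptions; only the 40% and 30% annual rates come from the slide.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Return the total growth factor after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


YEARS = 5  # assumption: the "last five years" window from the slide

transistor_factor = compound_growth(0.40, YEARS)  # 40% per year (from the slide)
clock_factor = compound_growth(0.30, YEARS)       # 30% per year (from the slide)

print(f"Transistors per chip: x{transistor_factor:.2f} over {YEARS} years")
print(f"Clock frequency:      x{clock_factor:.2f} over {YEARS} years")
```

At these rates, transistor counts grow by a factor of about 5.4 and clock frequency by a factor of about 3.7 in five years.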
Why Clusters NOW?
Cluster Computers
Clusters Classification 1
HA Clusters
Clusters Classification 2
Clusters Classification 3
Clusters Classification 4
Clusters Classification 5
Commodity Components
Cluster Computer Architecture
Cluster Components...1a
Nodes
Cluster Components…4
Network Interfaces
Dedicated processing power and storage embedded in the
network interface
An I/O card today
Tomorrow on chip?
[Figure: node with processor (P), memory (M), and a Myricom NIC attached to the Myricom net at 160 MB/s]
Cluster Components…5
Communication Software
Traditional OS-supported facilities (heavyweight due to
protocol processing)
Sockets (TCP/IP), pipes, etc.
Lightweight protocols (user-level): minimal
interface into the OS
User must transmit directly into and receive from the
network without OS intervention
Communication protection domains established by
interface card and OS
Treat message loss as an infrequent case
Active Messages (Berkeley), Fast Messages (UI), ...
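As an illustration of the traditional, OS-supported path listed above, here is a minimal sketch of a TCP/IP socket round trip over the loopback interface. The helper names `echo_once` and `round_trip` are hypothetical. Every send and receive in it goes through the kernel's protocol stack, which is the overhead that user-level lightweight protocols try to avoid.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection and send back whatever arrives first."""
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def round_trip(message: bytes) -> bytes:
    """Send `message` to a local echo server over TCP and return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]

        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()

        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)     # system call into the kernel TCP/IP stack
            reply = client.recv(1024)   # another system call on the way back

        worker.join()
    return reply


reply = round_trip(b"ping")
print(reply)
```

The sketch sends one short message, so a single `recv` is enough here; a real application would loop until the whole message has arrived.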
Cluster Components…6a
Cluster Middleware
Hardware
DEC Memory Channel, DSM (Alewife, DASH), SMP
techniques
OS/gluing layers
Solaris MC, Unixware, Glunix
Applications and Subsystems
System management and electronic forms
Runtime systems (software DSM, PFS etc.)
Resource management and scheduling (RMS):
CODINE, LSF, PBS, NQS, etc.
Cluster Components…7a
Programming Environments
MPI
Available on Linux, NT, and many supercomputers
PVM
Software DSMs (Shmem)
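MPI and PVM both expose an explicit message-passing model: work is sent to processes, and results are sent back. The sketch below imitates that scatter/compute/gather pattern inside one Python process, using threads as stand-ins for nodes and queues as stand-ins for message channels. It does not use the MPI or PVM APIs; `NUM_WORKERS`, `worker`, and `parallel_sum` are hypothetical names chosen for illustration.

```python
import queue
import threading

NUM_WORKERS = 4  # assumption: number of simulated nodes


def worker(rank: int, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Receive one chunk of work, compute a partial sum, send it to the master."""
    chunk = inbox.get()             # blocking receive
    outbox.put((rank, sum(chunk)))  # send (rank, partial result)


def parallel_sum(values: list) -> int:
    """Scatter `values` across the workers and gather their partial sums."""
    inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()

    # Scatter: worker `rank` gets every NUM_WORKERS-th element starting at `rank`.
    for rank in range(NUM_WORKERS):
        inboxes[rank].put(values[rank::NUM_WORKERS])

    # Gather: collect one partial sum per worker, in arrival order.
    partials = [results.get() for _ in range(NUM_WORKERS)]
    for t in threads:
        t.join()
    return sum(partial for _, partial in partials)


total = parallel_sum(list(range(1, 101)))
print(total)
```

In a real cluster program, the queues would be replaced by the send and receive calls of the chosen library, and each worker would run on its own node.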
Cluster Components…7b
Development Tools?
Compilers
C/C++/Java
RAD (rapid application development tools):
GUI based tools for parallel processing modeling
Debuggers
Performance monitoring and analysis tools
Visualization tools
Cluster Components…8
Applications
Sequential
Parallel/distributed (cluster-aware applications)
Grand challenge applications
Weather Forecasting
Quantum Chemistry
Molecular Biology Modeling
Engineering Analysis (CAD/CAM)
……………….
Middleware Design Goals
Complete transparency
Let users see a single cluster system
Single entry point, ftp, telnet, software loading...
Scalable performance
Easy growth of cluster
No change of API; automatic load distribution
Enhanced availability
Automatic recovery from failures
Employ checkpointing and fault-tolerant technologies
Handle consistency of replicated data
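The "scalable performance" goal above (same API, automatic load distribution) can be sketched with a simple least-loaded placement policy. This is one possible policy, not the method of any middleware named in these slides; the function `distribute`, the job costs, and the node names are hypothetical.

```python
import heapq


def distribute(jobs, node_names):
    """Assign each (job, cost) pair to the node with the lowest accumulated load."""
    heap = [(0.0, name) for name in node_names]  # (current load, node name)
    heapq.heapify(heap)
    placement = {name: [] for name in node_names}
    for job, cost in jobs:
        load, name = heapq.heappop(heap)         # least-loaded node so far
        placement[name].append(job)
        heapq.heappush(heap, (load + cost, name))
    return placement


# Hypothetical workload: (job name, estimated cost).
jobs = [("a", 4), ("b", 2), ("c", 1), ("d", 3)]

two_nodes = distribute(jobs, ["node1", "node2"])
three_nodes = distribute(jobs, ["node1", "node2", "node3"])
print(two_nodes)
print(three_nodes)
```

Growing the cluster only changes the list of nodes passed in; the caller's interface stays the same, which is the point of the design goal.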
Single System Image (SSI)
Hardware Level
Availability Support Functions
Single I/O space (SIO):
Any node can access any peripheral or disk device
without knowing its physical location.
Single process space (SPS)
Any process can create processes on any node, and they
can communicate through signals, pipes, etc., as if they
were on a single node
Checkpointing and process migration
Save the process state and intermediate results in memory
or on disk; migrate processes for load balancing
Reduction in the risk of operator errors
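The checkpointing item above can be illustrated with a minimal application-level sketch: the computation saves its state to disk after every step, and a restarted run resumes from the last saved state instead of starting over. The function `run`, the state layout, and the `stop_after` parameter (used only to simulate a failure) are assumptions made for this example; system-level checkpointing in cluster middleware works below the application and is considerably more involved.

```python
import os
import pickle
import tempfile


def run(total_steps, checkpoint_path, stop_after=None):
    """Sum 0..total_steps-1, writing a checkpoint to disk after every step.

    Returns the final sum, or None if the run is interrupted at `stop_after`.
    """
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)        # resume from the saved state
    else:
        state = {"step": 0, "acc": 0}     # fresh start

    while state["step"] < total_steps:
        if stop_after is not None and state["step"] >= stop_after:
            return None                   # simulated failure
        state["acc"] += state["step"]
        state["step"] += 1
        with open(checkpoint_path, "wb") as f:
            pickle.dump(state, f)         # save the intermediate result
    return state["acc"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "job.ckpt")
    first = run(10, path, stop_after=4)   # interrupted after four steps
    second = run(10, path)                # resumes at step 4 and finishes
    print(first, second)
```

If the checkpoint file sits on storage that every node can reach, the second call could in principle run on a different node, which is the basic idea behind combining checkpointing with process migration.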
Relationship among Middleware Modules
Strategies for SSI
Research Projects of Clusters
Comparison of 4 Cluster Systems
Task Forces on Cluster Computing
IEEE Task Force on Cluster Computing (TFCC)
http://www.dgs.monash.edu.au/~rajkumar/tfcc/
http://www.dcs.port.ac.uk/~mab/tfcc/
TFCC Activities
Mailing list, workshops, conferences, tutorials,
web-resources etc.
Resources for introducing the subject at the senior
undergraduate and graduate levels
Tutorials/workshops at IEEE Chapters
….. and so on.
NCHC PC Cluster
System Hardware
Conclusions
The Future
Cluster systems that use idle cycles from computers
will continue
Individual nodes will have multiple processors
Fast and Gigabit Ethernet will be widely used and
will become the de facto networks for clusters
Cluster software will bypass the OS as much as possible
Unix-based OSes are likely to be the most popular, but
NT, with its steady improvement and acceptance, will
not be far behind
The Challenges
Programming
enable applications, reduce programming effort,
distributed object/component models?
Reliability (RAS)
programming effort, reliability with scalability to 1000’s
Heterogeneity
performance, configuration, architecture and interconnect
Resource Management (scheduling, perf. pred.)
System Administration/Management
Input/Output (both network and storage)
Pointers to Literature on Cluster Computing
Reading Resources..1a
Internet & WWW
Computer architecture:
http://www.cs.wisc.edu/~arch/www/
PFS and parallel I/O:
http://www.cs.dartmouth.edu/pario/
Linux parallel processing:
http://yara.ecn.purdue.edu/~pplinux/Sites/
Distributed shared memory:
http://www.cs.umd.edu/~keleher/dsm.html
Reading Resources..1b
Internet & WWW
Solaris-MC:
http://www.sunlabs.com/research/solaris-mc
Microprocessors: recent advances
http://www.microprocessor.sscc.ru
Beowulf:
http://www.beowulf.org
Metacomputing
http://www.sis.port.ac.uk/~mab/Metacomputing/
Reading Resources..2
Books
In Search of Clusters
by G. Pfister, Prentice Hall (2nd ed.), 1998
High Performance Cluster Computing
Volume 1: Architectures and Systems
Volume 2: Programming and Applications
Edited by Rajkumar Buyya, Prentice Hall, NJ,
USA.
Scalable Parallel Computing
by K. Hwang & Z. Xu, McGraw-Hill, 1998
Reading Resources..3
Journals