Abstract:
This tutorial emphasizes the design of multiprocessor systems, parallel architectures, design considerations, and applications. The topics covered include: classes of computer systems; SIMD and MIMD computers; interconnection networks and parallel memories; parallel algorithms; performance evaluation of parallel systems; pipelined computers; multiprocessing by tight and loose coupling; distributed systems; and software considerations.
1. Introduction to Parallel Processing (1 class): evolution of computer systems; parallelism in uniprocessors; parallel computer structures (pipelined, array, multiprocessor); architectural classification (SISD, SIMD, MIMD, MISD); parallel processing applications
2. Parallel Computer Models: multiprocessors and multicomputers; shared-memory and distributed-memory systems; multivector and SIMD computers; PRAM and VLSI models
3. Processors and Memory Hierarchy: CISC, RISC, and superscalar processors; virtual memory; cache; shared memory
4. Multiprocessors and Multicomputers: interconnects (bus, crossbar, multiport memory); cache coherence and snooping; Intel Paragon and future multicomputers; message passing and multicast routing algorithms
5. Applications: design of multiprocessor systems with the latest existing technology; architectural trends in high-performance computing; challenges; scalability; ILP; VLIW; predictions
MICROPROCESSOR OVERVIEW
[Table: microprocessor evolution — number of transistors (2,300 vs. 70,000); performance around 0.5 MIPS; number of instructions (45 vs. 80); 14 addressing modes; operand sizes B, W, L]
Computer Evolution
Generation I (1946-1954): vacuum tubes / acoustic memory
Generation II (1955-1964): transistors / ferrite cores
Generation III (1965-1974): integrated circuits
Generation V (1985-present): non-von Neumann architectures, parallel processing
Parallel processing is an efficient form of information processing that emphasizes the exploitation of concurrent events. Concurrency implies parallelism, simultaneity, and pipelining. Multiple jobs or programs are handled through multiprogramming, time sharing, and multiprocessing.
From the operating-system point of view, computers have improved in four phases: batch processing, multiprogramming, time sharing, and multiprocessing.
Parallel processing in a uniprocessor: multiple functional units; pipelining (parallel adders, carry-lookahead adders); overlapped CPU and I/O operation; use of a hierarchical memory; balancing subsystem bandwidth; multiprogramming and time sharing.
[Figure: vehicle network architecture — NAV, phone, DVD player, and distributed amplifiers on a MOST bus; doors (window motors, lock motors, switches for window/lock/mirror/memory/disarm/ajar, mirrors, courtesy lamps) and the instrument panel (instrument cluster, junction block module, HVAC unit/controller, radio/ICU, speakers, switches, lamps, cigar lighter/power outlets, frontal passive restraints, SKIM) on CAN B; the instrument cluster acts as the CAN B / CAN C gateway]
CAN B: low-speed data bus
CAN C: high-speed data bus
Hardware Implementations
Using an external CAN controller: processor + separate CAN protocol controller + transceiver
Using an internal CAN controller: processor with on-chip CAN controller + transceiver
BUS Termination
Termination resistors at the ends of the main backbone: each 120 ohms, 5%, 1/4 W.
Absolutely required as bus length and speed increase; not used by many medium- and low-speed CAN implementations.
Unmanned Vehicle
Product: NASA's Mars Sojourner Rover. Microprocessor: 8-bit Intel 80C85.
A self-guided vehicle with four independent wheel controllers and a central controller with GPS.
[Figure: each of the four wheels/motors is driven by a TI 240LF DSP with PWM outputs and a CAN port; the wheel controllers and a central 32-bit microcontroller with GPS communicate over the CAN network]
[Figure (a): SISD uniprocessor architecture — the IS flows from the CU to the PU; the DS flows between the PU and the MU; I/O attaches to the MU]
Captions: CU - Control Unit; PU - Processing Unit; MU - Memory Unit; IS - Instruction Stream; DS - Data Stream
Flynn's Classification of Computer Architectures (derived from Michael Flynn, 1972) (cont'd)
[Figure (b): SIMD architecture with distributed memory — the program is loaded from the host; the CU broadcasts the IS to PE1..PEn, each with its own local memory LM1..LMn; data are loaded from the host]
Captions: CU - Control Unit; PE - Processing Element; LM - Local Memory; IS - Instruction Stream; DS - Data Stream
Flynn's Classification of Computer Architectures (derived from Michael Flynn, 1972) (cont'd)
[Figure (c): MIMD architecture with shared memory — CU1..CUn each issue an IS to their own PU1..PUn; all DS go to the shared memory; I/O attaches to the shared memory]
Captions: CU - Control Unit; PU - Processing Unit; IS - Instruction Stream; DS - Data Stream
Flynn's Classification of Computer Architectures (derived from Michael Flynn, 1972) (cont'd)
[Figure (d): MISD architecture (the systolic array) — a memory holds program and data; CU1..CUn each issue an IS to PU1..PUn, and the DS passes through the PUs in sequence; I/O attaches to the memory]
Captions: CU - Control Unit; PU - Processing Unit; IS - Instruction Stream; DS - Data Stream
b) Explicit Parallelism
Source code written in concurrent dialects of C, Fortran, Lisp, or Pascal
LANs for distributed processing: workstations, PCs; mesh-connected: Intel; butterfly/fat tree: CM-5; hypercubes: NCUBE
Cross-point or multistage networks: Cray, Fujitsu, Hitachi, IBM, NEC, Tera; simple ring and multi-bus multis: DEC, Encore, NCR, Sequent, SGI, Sun
Fast LANs for high-availability and high-capacity clusters: DEC, Tandem
SHARED MEMORY MULTIPROCESSOR MODELS: a. Uniform Memory Access (UMA) b. Non-Uniform Memory Access (NUMA) c. Cache-Only Memory Architecture (COMA)
[Figure: the UMA model — processors and I/O share memories SM1..SMm through an interconnection network (bus, crossbar, or multistage network)]
NUMA Models for Multiprocessor Systems
[Figure (a): shared local memories (e.g., the BBN Butterfly) — each processor P1..Pn holds a local memory LM1..LMn, and all local memories are reachable by all processors through the interconnection network]
[Figure (b): a hierarchical cluster model (e.g., the Cedar system at the University of Illinois) — each cluster contains processors (P) with caches (C), an intra-cluster interconnection network (IN), and cluster shared memories (CSM)]
NUMA Models for Multiprocessor Systems
[Figure: a generic message-passing multicomputer — processor/memory (P-M) nodes joined by a message-passing interconnection network (mesh, ring, torus, hypercube, cube-connected cycles, etc.)]
e.g., Intel Paragon, nCUBE/2. Important issues: message routing scheme, network flow-control strategies, deadlock avoidance, virtual channels, message-passing primitives, program decomposition techniques.
PRAM
[Figure: architecture of a vector supercomputer — a scalar processor with scalar functional pipelines and a scalar control unit; vector registers fed by vector instructions; main memory (program and data) connected to a host computer with mass storage and user I/O]
Examples of SIMD machines: MasPar MP-1 (1,024 to 16K RISC processors); CM-2 from Thinking Machines (bit-slice, 65K PEs); DAP 600 from Active Memory Technology (bit-slice).
Linear Array
Ring
Binary Tree
The channel width of a fat tree increases as we ascend from the leaves to the root. This concept is used in the CM-5 Connection Machine.
Mesh
Torus
3-cube
The binary hypercube has been a popular architecture; binary trees, meshes, etc. can be embedded in the hypercube. But it suffers from poor scalability and implementation difficulty for higher-dimensional hypercubes. The CM-2 implements a hypercube, the CM-5 a fat tree; the Intel iPSC/1 and iPSC/2 are hypercubes, while the Intel Paragon is a 2-D mesh. The bottom line for an architecture to survive in future systems is packaging efficiency and scalability to allow modular growth.
Pipelined floating-point unit; 8 KB data cache and 8 KB instruction cache plus external cache; 2 instructions per CPU cycle; CMOS-4 VLSI, 0.75-micron, 1.68 million transistors; 32 floating-point registers and 32 integer registers; 32-bit fixed-length instruction set; 300 MIPS and 150 MFLOPS.
MIMD BUS
[Figure: the Nanobus — NS32032 processor cards, a bus arbiter, memory cards, and I/O cards share a bus with 64+8 data lines, 32+4 address lines, and 14 interrupt + control lines; an ULTRA interface connects the Nanobus upward]
Standards: Intel Multibus II; Motorola VME; Texas Instruments NuBus; IEEE 896 Futurebus.
BUS LATENCY
The time for the bus and memory to complete a memory access: the time to acquire the bus plus the memory read or write time, including parity checking, error correction, etc.
Hierarchical Caching
[Figure: the Encore Ultramax system — multiple processors with private caches on each Nanobus, a shared cluster cache behind the ULTRA interface, and main memory at the top of the hierarchy]
Multiprocessor Systems
Three types of interconnection between processors: time-shared common bus (fig. a); crossbar switch network (fig. b); multiport memory (fig. c).
[Figure a: time-shared common bus — processors P1..Pn, memories M1..Mk, and I/O units I/O1, I/O2 share a single bus with bus controller BC]
[Figure b: crossbar switch network — processors P1..P3 reach memories and I/O through a grid of crosspoint switches]
[Figure c: multiport memory — each memory module M1..M3 has a dedicated port for each processor P1..P3 and for I/O1, I/O2]