Centre for Computer Technology

ICT123 Computer Architecture
Week 10

Multiprocessor Architecture

Content at a Glance
- Review of week 9
- Introduction
- Flynn’s Taxonomy
- Multiprocessors
- Multicomputers
- Multiprocessor Architecture

March 20, 2012

Richard Salomon, Sudipto Mitra Copyright Box Hill Institute

Paging - Allocation of Free Frames


- Demand Paging - Bring in pages as required (swap pages)
- Thrashing - Too many processes in too little memory
- TLB - Contains the page table entries that have been most recently used; most references will be to locations in recently used pages
- Segments are multiple address spaces of variable, dynamic size


Cache operation – overview
1. CPU requests contents of memory location
2. Check cache for this data
3. If present, get from cache (fast)
4. If not present, read required block from main memory to cache
5. Then deliver from cache to CPU
6. Cache includes tags to identify which block of main memory is in each cache slot
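The steps above can be sketched as a small simulation. This is only an illustrative toy, not a model of any real cache: the class name `SimpleCache`, the slot count, the block size and the memory contents are all made up, and direct mapping is assumed as the placement rule.

```python
class SimpleCache:
    """Toy direct-mapped cache in front of a list that stands in for main memory."""

    def __init__(self, memory, num_slots=4, block_size=4):
        self.memory = memory
        self.num_slots = num_slots
        self.block_size = block_size
        self.tags = [None] * num_slots   # tag: which memory block is held in each slot
        self.data = [None] * num_slots   # cached copy of that block
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block = address // self.block_size   # memory block containing the address
        slot = block % self.num_slots        # direct mapping: one possible slot per block
        if self.tags[slot] == block:
            self.hits += 1                   # present: get from cache (fast)
        else:
            self.misses += 1                 # not present: read the block from main memory
            start = block * self.block_size
            self.data[slot] = self.memory[start:start + self.block_size]
            self.tags[slot] = block
        return self.data[slot][address % self.block_size]   # deliver from cache to CPU


memory = list(range(100, 164))   # 64 words of made-up data
cache = SimpleCache(memory)
values = [cache.read(a) for a in (0, 1, 2, 16, 0)]
print(values, cache.hits, cache.misses)
```

Addresses 0 and 16 map to the same slot in this configuration, so the final read of address 0 misses again even though it was cached earlier.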

Typical Cache Organization


Direct Mapping Example


Associative Mapping Example


Two Way Set Associative Mapping Example


Introduction (1)
- It is common to find multiple processors in a computer system, often within the same chip
- Multiprocessor systems and parallel computing raise similar organisational issues, and both need careful consideration to facilitate construction of high-performance distributed computing systems

Introduction (2)
- Instruction-level parallelism has been exploited for a long time, mainly through pipelining and micro-operation parallelism
- Superscalar machines with multiple execution units within a single processor (uni-processor systems) allow parallel execution of multiple instructions from the same program


Introduction (3)
- Systems with multiple processors extend parallelism to multi-program threads
- Symmetric multiprocessors (SMPs), although the earliest, are still the most common parallel organisation
- Clusters are common in multi-server systems with workloads beyond the capability of SMPs


Introduction (4)
- Non-Uniform Memory Access (NUMA) is a more recent approach, used in larger data warehouse systems, supporting the most recent virtualization approach
- Multiprocessor environments are classified as either tightly coupled or loosely coupled systems


Flynn’s Taxonomy of Parallel Processor Architectures


Taxonomy of Parallel Computers

Flynn’s taxonomy of parallel computers.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Flynn’s Taxonomy (examples)
SISD -- IBM370, 486 PC, Macintosh, VAX
SIMD -- NASA’s MPP, ILLIAC IV
MISD -- None
MIMD -- Butterfly, Cray X/MP

Parallel Organizations - SISD


SISD
- Single processor
- Single instruction stream
- Data stored in single memory
- Uni-processor
- The control unit (CU) provides an instruction stream (IS) to a processing unit (PU)
- The PU operates on a single data stream (DS) from a memory unit (MU)

Parallel Organizations - SIMD

SIMD
- A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis
- Each processing element has an associated data memory
- Each instruction is executed on a different set of data by different processors
- Examples: vector and array processors
- A single CU feeds a single IS to multiple PUs
- Each PU has dedicated local memory (LM) or uses shared memory
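The SIMD organisation can be illustrated with a short sketch. The opcodes (`LOAD`, `ADD`, `STORE`), the class name and the data are all invented for illustration. The inner loop visits the processing elements one after another, whereas real SIMD hardware would execute the instruction on all of them in the same step.

```python
class ProcessingElement:
    """One PE with its own local memory (LM) and an accumulator."""

    def __init__(self, local_memory):
        self.local_memory = local_memory
        self.accumulator = 0

    def execute(self, opcode, address):
        if opcode == "LOAD":
            self.accumulator = self.local_memory[address]
        elif opcode == "ADD":
            self.accumulator += self.local_memory[address]
        elif opcode == "STORE":
            self.local_memory[address] = self.accumulator


def control_unit(program, elements):
    """A single CU feeds a single instruction stream to every PE."""
    for opcode, address in program:
        for pe in elements:            # same instruction, different data in each PE
            pe.execute(opcode, address)


# Four PEs, each holding different made-up data in its local memory.
elements = [ProcessingElement([i, 10 * i, 0]) for i in range(4)]
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2)]
control_unit(program, elements)
results = [pe.local_memory[2] for pe in elements]
print(results)
```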

Parallel Processing - MISD


MISD
- Sequence of data
- Transmitted to set of processors
- Each processor executes different instruction sequence
- Impractical
- Has not been implemented


Parallel Organizations - MIMD Shared Memory


Parallel Organizations - MIMD Distributed Memory

MIMD
- A set of processors simultaneously executes different instruction sequences on different sets of data
- Multiple CUs each feed an IS to their own PU
- Shared-memory multiprocessor or distributed-memory multicomputer
- Further classified by method of processor communication
- Examples include SMPs, clusters and NUMA systems

Tightly Coupled MP Systems (1)

[Diagram: processors P1 and P2 connected to a shared Memory, with I/O units I-O 1 and I-O 2]

Tightly Coupled MP Systems (2)
- Processors share memory (global common memory)
- Communicate via that shared memory
- Each processor can have its own local memory
- Most commercial multiprocessors provide a cache memory with each CPU
- Tolerate a higher degree of interaction between tasks
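Communication through shared memory can be sketched with threads, which here stand in for processors (an assumption made for illustration). The names `shared_memory` and `worker` are hypothetical. Each worker computes in its own local variable and then writes into the common memory under a lock.

```python
import threading

shared_memory = {"total": 0}   # global common memory visible to every worker
lock = threading.Lock()        # protects the shared location from concurrent updates


def worker(values):
    local_sum = sum(values)    # work done in the worker's own local storage
    with lock:                 # communicate the result through shared memory
        shared_memory["total"] += local_sum


chunks = [range(0, 50), range(50, 100)]
threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_memory["total"])
```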


Block Diagram of Tightly Coupled Multiprocessor System


Loosely Coupled MP Systems (1)
[Diagram: processors P1 and P2, each with its own memory (M1, M2) and I/O unit (I-O 1, I-O 2), joined by a communications link]

Loosely Coupled MP Systems (2)
- Each processor has its own private memory
- The processors are tied together by a switching scheme designed to route information by message passing
- Information is relayed in packets
- Most efficient when interaction between tasks is minimal
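A loosely coupled pair can be sketched with two threads that share no variables and exchange packets over queues. Everything here is hypothetical: the packet layout (`src`, `dst`, `payload`), the function names `p1` and `p2`, and the use of `None` as an end-of-data marker. The two queues stand in for the communications link.

```python
import queue
import threading

link_to_p2 = queue.Queue()   # one direction of the communications link
link_to_p1 = queue.Queue()   # the other direction


def p1(result):
    private_memory = [3, 1, 4, 1, 5]   # data only P1 can access
    for value in private_memory:
        link_to_p2.put({"src": "P1", "dst": "P2", "payload": value})   # one packet per value
    link_to_p2.put({"src": "P1", "dst": "P2", "payload": None})        # end-of-data marker
    reply = link_to_p1.get()           # block until P2 answers
    result.append(reply["payload"])


def p2():
    private_total = 0                  # state only P2 can access
    while True:
        packet = link_to_p2.get()
        if packet["payload"] is None:
            break
        private_total += packet["payload"]
    link_to_p1.put({"src": "P2", "dst": "P1", "payload": private_total})


result = []
threads = [threading.Thread(target=p1, args=(result,)), threading.Thread(target=p2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result)
```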


Multiprogramming and Multiprocessing


Homogeneous Multiprocessors on a Chip

Single-chip multiprocessors. (a) A dual-pipeline chip. (b) A chip with two cores.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Heterogeneous Multiprocessors on a Chip (1)

The logical structure of a simple DVD player contains a heterogeneous multiprocessor containing multiple cores for different functions.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)


Heterogeneous Multiprocessors on a Chip (2)

An example of the IBM CoreConnect architecture.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Multiprocessors and Multiprocessing
- Hardware - Multiprocessor computers have become commodity products, e.g., quad-processor Pentium Pros, SGI and Sun workstations.
- Programming - Multithreaded programming is supported by commodity operating systems, e.g., Windows NT, UNIX/Pthreads.
- Applications - Traditionally science and engineering. Now also business and home computing.
- Problem - Difficulty of multithreaded programming compared to sequential programming.
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

Why Buy a Multiprocessor?
- Multiple users.
- Multiple applications.
- Multitasking within an application.
- Responsiveness and/or throughput.

(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

Multiprocessors

(a) A multiprocessor with 16 CPUs sharing a common memory. (b) An image partitioned into 16 sections, each being analyzed by a different CPU.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)
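The partitioning idea in the figure can be sketched as follows. The 8 x 8 checkerboard image, the 4 x 4 grid of tiles and the function `count_set_pixels` are made up for illustration, and a thread pool with 16 workers stands in for the 16 CPUs. All workers read the same image in common memory, but each analyses only its own section.

```python
from concurrent.futures import ThreadPoolExecutor

SIZE = 8                 # image is SIZE x SIZE pixels
TILES_PER_SIDE = 4       # 4 x 4 = 16 sections
TILE = SIZE // TILES_PER_SIDE

# Made-up checkerboard image held in memory shared by all workers.
image = [[(x + y) % 2 for x in range(SIZE)] for y in range(SIZE)]


def count_set_pixels(tile_index):
    """Analyse one section: count the pixels equal to 1 inside that tile."""
    row0 = (tile_index // TILES_PER_SIDE) * TILE
    col0 = (tile_index % TILES_PER_SIDE) * TILE
    return sum(image[y][x]
               for y in range(row0, row0 + TILE)
               for x in range(col0, col0 + TILE))


with ThreadPoolExecutor(max_workers=16) as pool:
    per_tile = list(pool.map(count_set_pixels, range(16)))

total = sum(per_tile)
print(per_tile, total)
```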

Multicomputers

(a) A multicomputer with 16 CPUs, each with its own private memory. (b) The bit-map image split up among the 16 memories.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Topology

The heavy dots represent switches. The CPUs and memories are not shown.
(a) A star (b) A complete interconnect (c) A tree (d) A ring (e) A grid (f) A double torus (g) A cube (h) A 4D hypercube
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)
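As a worked example for the hypercube in (h): if the 16 switches are labelled with 4-bit numbers, two switches are directly linked exactly when their labels differ in one bit. A minimal sketch under that labelling, with hypothetical function names:

```python
def hypercube_neighbours(node, dimensions):
    """Neighbours of a node: flip each of its address bits in turn."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hop_count(a, b):
    """Minimum number of links between two nodes: the number of differing bits."""
    return bin(a ^ b).count("1")


DIMENSIONS = 4
nodes = range(2 ** DIMENSIONS)
neighbours = hypercube_neighbours(0b0101, DIMENSIONS)
# Every link is counted from both of its ends, hence the division by two.
link_count = sum(len(hypercube_neighbours(n, DIMENSIONS)) for n in nodes) // 2
diameter = max(hop_count(0, n) for n in nodes)
print(neighbours, link_count, diameter)
```

With 4 dimensions each switch has 4 links and the longest shortest path is 4 hops, which is one reason hypercubes appear in larger machines.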

Multiprocessor Architectures

Message-Passing Architectures
- Separate address space for each processor.
- Processors communicate via message passing.

Shared-Memory Architectures
- Single address space shared by all processors.
- Processors communicate by memory read/write.
- SMP or NUMA.
- Cache coherence is an important issue.
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

Message-Passing Architecture
[Diagram: several nodes, each consisting of a processor with its own cache and its own memory, joined by an interconnection network]
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

Shared-Memory Architecture
[Diagram: processors 1 to N, each with its own cache, connected through an interconnection network to memories 1 to M]
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

Shared Memory Architecture – SMP (Symmetric Multiprocessor) (1)

A stand-alone computer with the following characteristics:
- Two or more similar processors of comparable capacity
- Processors share the same memory and I/O
- Processors are connected by a bus or other internal connection
- Memory access time is approximately the same for each processor

Shared Memory Architecture – SMP (Symmetric Multiprocessor) (2)

- All processors share access to I/O, either through the same channels or through different channels giving paths to the same devices
- All processors can perform the same functions (hence symmetric)
- System controlled by an integrated operating system
  - Provides interaction between processors
  - Interaction at job, task, file and data element levels

SMP Advantages
- Performance - if some work can be done in parallel
- Availability - since all processors can perform the same functions, failure of a single processor does not halt the system
- Incremental growth - user can enhance performance by adding more processors
- Scaling - vendors can offer a range of products based on number of processors
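The performance point holds only "if some work can be done in parallel". One common way to quantify this is Amdahl's law, which is not part of the slides and is brought in here as an outside assumption. The function name and the 90% figure are purely illustrative.

```python
def speedup(parallel_fraction, processors):
    """Amdahl's law: speedup when only part of the work can be spread over processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Illustrative numbers only: 90% of the work is assumed to be parallelisable.
for n in (1, 4, 16):
    print(n, round(speedup(0.9, n), 2))
```

Under these assumed numbers, going from 4 to 16 processors roughly doubles the speedup rather than quadrupling it, because the serial 10% does not shrink.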

Organization Classification
- Time shared or common bus
- Multiport memory
- Central control unit

Time Shared Bus
- Simplest form
- Structure and interface similar to single processor system
- Following features provided
  - Addressing - distinguish modules on bus
  - Arbitration - any module can be temporary master
  - Time sharing - if one module has the bus, others must wait and may have to suspend
- Now have multiple processors as well as multiple I/O modules
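Arbitration and time sharing can be sketched with a small simulation. The slides do not name an arbitration policy, so round-robin is assumed here purely for illustration. The function names and the request counts are hypothetical.

```python
def arbitrate(pending, last_master, num_modules):
    """Round-robin: grant the bus to the next requesting module after last_master."""
    for offset in range(1, num_modules + 1):
        candidate = (last_master + offset) % num_modules
        if pending[candidate] > 0:
            return candidate
    return None


def run_bus(transfers_wanted):
    """Return the order in which modules become temporary bus master, one per cycle."""
    pending = list(transfers_wanted)
    num_modules = len(pending)
    schedule = []
    master = num_modules - 1           # chosen so that module 0 is considered first
    while any(pending):
        master = arbitrate(pending, master, num_modules)
        pending[master] -= 1           # the master uses the bus; the others must wait
        schedule.append(master)
    return schedule


schedule = run_bus([2, 1, 3])          # modules 0, 1 and 2 want 2, 1 and 3 transfers
print(schedule)
```

The schedule has one entry per bus cycle, so its length shows directly how total traffic is bounded by the bus cycle time.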

Time Share Bus - Advantages
- Simplicity
- Flexibility
- Reliability


Time Share Bus - Disadvantages
- Performance limited by bus cycle time
- Each processor should have local cache
  - Reduce number of bus accesses
- Leads to problems with cache coherence
  - Solved in hardware - see later

Multiport Memory
- Direct independent access of memory modules by each processor
- Logic required to resolve conflicts
- Little or no modification to processors or modules required


Multiport Memory - Advantages and Disadvantages
- More complex
  - Extra logic in memory system
- Better performance
  - Each processor has dedicated path to each module
- Can configure portions of memory as private to one or more processors
  - Increased security
- Write through cache policy

Central Control Unit
- Funnels separate data streams between independent modules
- Can buffer requests
- Performs arbitration and timing
- Passes status and control
- Performs cache update alerting
- Interfaces to modules remain the same
- e.g. IBM S/370


A Mainframe SMP IBM z Series Example (1)
- Ranges from a uniprocessor with one main memory card to a high-end system with 48 processors and 8 memory cards
- Dual-core processor chip
  - Each includes two identical central processors (CPs)
  - CISC superscalar microprocessor
  - Mostly hardwired, some vertical microcode
  - 256-kB L1 instruction cache and a 256-kB L1 data cache
- L2 cache 32 MB
  - Clusters of five
  - Each cluster supports eight processors and access to entire main memory space

A Mainframe SMP IBM z Series Example (2)

- System control element (SCE)
  - Arbitrates system communication
  - Maintains cache coherence
- Main store control (MSC)
  - Interconnects L2 caches and main memory
- Memory card
  - Each 32 GB, maximum 8, total of 256 GB
  - Interconnect to MSC via synchronous memory interfaces (SMIs)
- Memory bus adapter (MBA)
  - Interface to I/O channels, go directly to L2 cache

IBM z990 Multiprocessor Structure


Shared-Memory Architecture - NUMA

NUMA (Non-Uniform Memory Access)
- Each memory is closer to some processors than others.
- E.g. “Distributed Shared Memory”.
- Typically interconnection is grid or hypercube.
- Harder to program, but scales to more processors.
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

NUMA Multiprocessors

A NUMA machine based on two levels of buses
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

The Sun Fire E25K NUMA Multiprocessor (1)

The Sun Microsystems E25K multiprocessor.

(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

The Sun Fire E25K NUMA Multiprocessor (2)

The Sun Fire E25K uses a four-level interconnect. Dashed lines are address paths. Solid lines are data paths.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Shared-Memory Architecture: Cache Coherence
- Problem - multiple copies of the same data may reside in some caches and in main memory
- The copies have to be kept identical; otherwise processors may see an inconsistent view of memory
- This is the cache coherence problem
- Cache coherence protocols control this problem

(more on this topic in the next class)
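The problem can be reproduced in a few lines. This sketch assumes write-through caches and, for the coherent case, a simple write-invalidate scheme in which a Python list plays the role of the snooped bus. It is a toy chosen for illustration, not the protocol of any particular machine, and all names are hypothetical.

```python
class Cache:
    """Toy write-through cache; optionally invalidates other caches on a write."""

    def __init__(self, memory, bus=None):
        self.memory = memory
        self.lines = {}                  # address -> cached copy
        self.bus = bus
        if bus is not None:
            bus.append(self)             # register so other caches can reach this one

    def read(self, address):
        if address not in self.lines:    # miss: fetch from main memory
            self.lines[address] = self.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.memory[address] = value     # write-through: main memory updated at once
        if self.bus is not None:
            for other in self.bus:       # write-invalidate: discard the other copies
                if other is not self:
                    other.lines.pop(address, None)


def demo(coherent):
    memory = {0x10: 1}
    bus = [] if coherent else None
    a, b = Cache(memory, bus), Cache(memory, bus)
    b.read(0x10)            # B caches the old value
    a.write(0x10, 2)        # A updates the location
    return b.read(0x10)     # value B sees afterwards


stale, fresh = demo(coherent=False), demo(coherent=True)
print(stale, fresh)
```

Without invalidation, B keeps returning its old copy even though main memory already holds the new value.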

Example: Quad-Processor Pentium Pro
- SMP, bus interconnection.
- 4 x 200 MHz Intel Pentium Pro processors.
- 8 + 8 KB L1 cache per processor.
- 512 KB L2 cache per processor.
- Snoopy cache coherence.
- Compaq, HP, IBM, NetPower.
- Windows NT, Solaris, Linux, etc.
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

Example: SGI Origin 2000
- NUMA, hypercube interconnection.
- Up to 128 (64 x 2) MIPS R10000 processors.
- 32 + 32 KB L1 cache per processor.
- 4 MB L2 cache per processor.
- Distributed directory-based cache coherence.
- Automatic page migration/replication.
- SGI IRIX with Pthreads.
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

Message-Passing versus Shared-Memory Architectures
- Shared-memory programming model is easier because data transfer is handled automatically.
- Message passing can be efficiently implemented on shared memory, but not vice versa.
- How much of shared-memory programming model should be implemented in hardware?
- How efficient is shared-memory programming model?
- How well does shared-memory scale?
- Does scalability really matter?
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)
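The claim that message passing can be implemented on shared memory can be illustrated with a mailbox built from a shared buffer and a condition variable. The `Channel` class and its method names are hypothetical, and threads again stand in for processors. Sending is an ordinary write to shared memory and receiving is an ordinary read, with synchronisation around both.

```python
import threading
from collections import deque


class Channel:
    """Message channel built on a buffer in shared memory."""

    def __init__(self):
        self._buffer = deque()                  # shared by sender and receiver
        self._ready = threading.Condition()

    def send(self, message):
        with self._ready:
            self._buffer.append(message)        # write into shared memory
            self._ready.notify()                # wake a waiting receiver

    def receive(self):
        with self._ready:
            while not self._buffer:             # wait until a message has been written
                self._ready.wait()
            return self._buffer.popleft()       # read from shared memory


channel = Channel()
received = []


def consumer():
    for _ in range(3):
        received.append(channel.receive())


t = threading.Thread(target=consumer)
t.start()
for word in ("a", "b", "c"):
    channel.send(word)
t.join()
print(received)
```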

Summary
- Flynn’s taxonomy of parallel computer architectures (SISD, SIMD, MISD, MIMD)
- Tightly coupled systems communicate via a shared memory
- Loosely coupled systems are tied together by a switching scheme
- Multiprocessor architectures are classified as
  - Message-Passing Architectures
  - Shared-Memory Architectures
- Message passing can be efficiently implemented on shared memory

Reference

Stallings, William, 2003, Computer Organization & Architecture: Designing for Performance, Sixth Edition, Pearson Education, Inc., ISBN 0-13-049307-4.
M. Morris Mano, Computer System Architecture, Third Edition, Prentice Hall.
Tanenbaum, Structured Computer Organization, Fifth Edition, 2006, Pearson Education, Inc., ISBN 0-13-148521-0.
CS 284a Lecture, Tuesday, 7 October 1997, John Thornley.

Further Reading
- Manufacturers’ websites
- Relevant Special Interest Groups [SIG]
- Articles in magazines
- IEEE Computer Society Task Force on Cluster Computing web-site
