
Computer Architecture

UEC509-Part-6
Dr. Debabrata Ghosh
Assistant Professor, ECED
Thapar University
Parallel processors/multiprocessors
Based on the number of instruction streams and data streams that can be processed
simultaneously, computing systems are classified into four categories (Flynn's
classification):

• Single-instruction, single-data (SISD) systems
• Single-instruction, multiple-data (SIMD) systems
• Multiple-instruction, single-data (MISD) systems
• Multiple-instruction, multiple-data (MIMD) systems
Flynn’s classification
• SISD: Uniprocessor system. Executes a single instruction stream operating on a single data
stream. Traditional uniprocessor PCs (before ~2010)
• SIMD: Multiprocessor system. All PUs execute the same instruction, each operating on a
different data stream. Vector processing machines
• MISD: Multiprocessor system. Different PUs execute different instructions, all operating on
the same data stream. Not available commercially
• MIMD: Multiprocessor system. Different PUs execute different instructions, each operating
on a different data stream. Truly parallel machines
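The SIMD/MIMD distinction can be sketched in plain Python (a toy illustration, not real hardware parallelism: the names `ops` and `streams` are just illustrative):

```python
# Toy illustration of Flynn's categories in software terms.
data = [1, 2, 3, 4]

# SIMD-style: one instruction (double) applied across many data elements
simd_result = [x * 2 for x in data]

# MIMD-style: different instructions on different data streams
ops = [lambda x: x + 1, lambda x: x * 10]   # two distinct "instruction streams"
streams = [[1, 2], [3, 4]]                  # two distinct data streams
mimd_result = [[op(x) for x in s] for op, s in zip(ops, streams)]

print(simd_result, mimd_result)  # → [2, 4, 6, 8] [[2, 3], [30, 40]]
```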
Shared vs distributed-memory architecture for MIMD
Shared-memory MIMD: PUs are connected to a single global memory (either by a single bus
or by a network), and all PUs have access to it. Communication between PUs takes place
through the shared memory: a modification of data in the global memory by one PU is visible
to all other PUs. Easy to design but hard to scale (memory contention)

Distributed-memory MIMD: All PUs have a local memory. Communication between PUs
takes place through an interconnection network. Complex design but high scalability
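The two communication styles can be mimicked with threads (a minimal sketch: threads stand in for PUs, a global dict for shared memory, and a `queue.Queue` for the interconnection network):

```python
import threading
import queue

# Shared-memory MIMD (sketch): PUs (threads) communicate through one
# global memory; a write by one PU is visible to all others.
shared_mem = {"x": 0}

def writer_pu():
    shared_mem["x"] = 42              # modification of global memory

w = threading.Thread(target=writer_pu)
w.start(); w.join()
seen_by_other_pu = shared_mem["x"]    # another PU sees the new value

# Distributed-memory MIMD (sketch): each PU has local memory and
# communicates only by messages over an interconnection network (queue).
network = queue.Queue()
received = []

def sender_pu():
    local = 42                        # local memory, not visible elsewhere
    network.put(local)                # explicit send over the network

def receiver_pu():
    received.append(network.get())    # explicit receive

s = threading.Thread(target=sender_pu)
r = threading.Thread(target=receiver_pu)
s.start(); r.start(); s.join(); r.join()
print(seen_by_other_pu, received[0])  # → 42 42
```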

[Figure: shared-memory architecture (PUs connected to shared memory modules) vs
distributed-memory architecture (each PU with its own local, non-shared memory,
connected by an interconnection network)]

Shared memory architecture: UMA vs NUMA
• Uniform memory access: Memory access time is uniform across all processors and
independent of data location within memory (i.e. access time is the same regardless of
which shared memory module contains the data)
• Non-uniform memory access: Each processor has its own local memory module that it
can access directly (local access). The local memory of other processors can also be
accessed (remote access), but with a longer access time.
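The distinction amounts to a latency model, sketched below with purely hypothetical access times (10/40/25 ns are made up for illustration):

```python
# Toy latency model contrasting UMA and NUMA reads (assumed numbers).
LOCAL_NS, REMOTE_NS, UNIFORM_NS = 10, 40, 25   # hypothetical access times

def uma_access(cpu_id, module_id):
    # UMA: same cost no matter which shared memory module holds the data
    return UNIFORM_NS

def numa_access(cpu_id, module_id):
    # NUMA: cheap local access, more expensive remote access
    return LOCAL_NS if cpu_id == module_id else REMOTE_NS

print(uma_access(0, 3), numa_access(0, 0), numa_access(0, 3))  # → 25 10 40
```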

[Figure: UMA vs NUMA organizations of processors and shared memory modules]

• Interconnection network / processor-to-memory network: Butterfly, Benes


Butterfly vs Benes network
• Butterfly network: Blocking network. Some permutations result in link contention
• Benes network: Built by taking a butterfly network, flipping it, and appending the
mirrored copy to the other side (back to back). Non-blocking network: any permutation
can be realized without link contention

[Figure: butterfly network vs Benes network]

• Butterfly network: number of levels = log2 N + 1
• Benes network: number of levels = 2 log2 N + 1
where N = number of rows
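The two level counts can be checked with a short helper (function names are mine; N is assumed to be a power of 2, as in the figure):

```python
import math

def butterfly_levels(n_rows):
    """Levels in a butterfly network with n_rows rows (power of 2)."""
    return int(math.log2(n_rows)) + 1

def benes_levels(n_rows):
    """Benes = two butterflies back to back sharing the middle level."""
    return 2 * int(math.log2(n_rows)) + 1

print(butterfly_levels(8), benes_levels(8))  # → 4 7
```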


Cache coherence
• Multiple processor cores share the same memory, but each has its own cache
• Each processor views the shared memory through its individual cache
• As a result, P1 and P2 can hold two different values for the same memory location
• This is called the multiprocessor cache coherence problem
Cache coherence problem

Write-through cache

• A single memory location (X) is read and written by two processors (A and B)
• Initially, assume neither cache contains the variable at X; the initial value at X is 1. Both A and B then
read X, caching the value 1
• After A writes X (new value 0), both A's cache and memory contain the new value (write-through), but B's
cache does not
• Two different values now exist for the same location (the caches of CPU A and CPU B hold 0 and 1 at X,
respectively)
• If B reads the variable at X, it gets 1, not the most recently written 0
• Cache coherence ensures that changes in the values of shared operands are propagated throughout the
system in a timely fashion
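The incoherence above can be reproduced with a small model (an assumed simplification: dict-based caches over one write-through memory, and no coherence mechanism at all):

```python
# Two CPUs with private caches over a write-through memory, no coherence.
memory = {"X": 1}

class CPU:
    def __init__(self, mem):
        self.cache = {}
        self.mem = mem

    def read(self, addr):
        if addr not in self.cache:      # miss: fetch from memory
            self.cache[addr] = self.mem[addr]
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value        # update own cache...
        self.mem[addr] = value          # ...and memory (write-through)
        # no invalidate/update message: other caches are left stale

a, b = CPU(memory), CPU(memory)
a.read("X"); b.read("X")     # both cache the initial value 1
a.write("X", 0)              # A writes 0; memory and A's cache updated
print(a.read("X"), b.read("X"), memory["X"])  # → 0 1 0 (B reads stale 1)
```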
Approaches for cache coherence problem

• Software-based approach: Detect, at compile time, the code segments that might cause
cache coherence issues and treat them by preventing the shared data variables involved
from being cached. Inefficient utilization of the cache.
• Hardware-based approach: Dynamically detect (at run time) potential cache coherence
issues. Efficient use of the cache. Can be of two types: directory protocol, snoopy
protocol
Approaches for cache coherence problem
• Directory-based approach (directory protocol): A single directory is maintained
in main memory to keep track of the sharing status of each cache block. Responsibility
for cache coherence rests with a central cache controller: any local action that changes
a cache block is reported to the central controller, which maintains (using the global
directory) information about which processors have copies of which cache blocks. Before
a processor can write to a local cache block, it must request exclusive access to that
block from the central controller
• Snooping-based approach (snoopy protocol): Responsibility for cache coherence is
distributed among all the cache controllers. Each cache block is accompanied by a
sharing status. When a write is performed on a shared cache block, it is announced to
all other caches by a broadcast mechanism. Each cache controller snoops on the bus to
determine whether it holds the block being written and reacts accordingly
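The directory protocol's bookkeeping can be sketched as follows (an assumed minimal model: the class and variable names are illustrative, caches are plain dicts, and "exclusive access" simply invalidates the other sharers):

```python
# Central directory tracking which CPUs share each block; a writer must
# request exclusive access from the directory before writing.
class Directory:
    def __init__(self, caches):
        self.caches = caches
        self.sharers = {}                       # block -> set of CPU ids

    def record_read(self, block, cpu):
        self.sharers.setdefault(block, set()).add(cpu)

    def request_exclusive(self, block, cpu):
        for other in self.sharers.get(block, set()) - {cpu}:
            self.caches[other].pop(block, None)  # invalidate other copies
        self.sharers[block] = {cpu}              # writer is sole owner

caches = {0: {}, 1: {}}
directory = Directory(caches)
memory = {"X": 7}

for cpu in (0, 1):                 # both CPUs read X and are recorded
    caches[cpu]["X"] = memory["X"]
    directory.record_read("X", cpu)

directory.request_exclusive("X", 0)  # CPU 0 asks to write X
caches[0]["X"] = 8                   # now safe: no other copies exist
print(sorted(directory.sharers["X"]), "X" in caches[1])  # → [0] False
```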
Approaches for cache coherence problem
• Two common snoopy protocol approaches are:
• Write-invalidate snoopy protocol: A processor gains exclusive access to a data item before it
writes that item; all other cached copies of the data item are invalidated when it is
written

Write-back cache

• Initially, assume neither cache contains the data item at X; the initial value at X is 0
• A writes X (new value 1), invalidating any other cached copies
• When B then wants to read X, A responds with the written value (1), cancelling the response from
memory (0)
• B's cache and the memory content at X are updated at the same time: write-back cache
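A bus-based write-invalidate protocol with write-back caches, as in the A/B example above, can be sketched like this (an assumed simplification, not MSI in full: cache lines are `(value, dirty)` pairs and the class names are mine):

```python
# Write-invalidate snooping over a shared bus, write-back caches.
class Bus:
    def __init__(self, mem):
        self.mem, self.caches = mem, []

    def invalidate(self, addr, origin):
        for c in self.caches:
            if c is not origin:
                c.lines.pop(addr, None)        # snoopers drop their copies

    def fetch(self, addr, origin):
        for c in self.caches:                  # a dirty owner responds,
            if c is not origin and addr in c.lines and c.lines[addr][1]:
                value, _ = c.lines[addr]
                self.mem[addr] = value         # write back, cancelling the
                c.lines[addr] = (value, False) # stale reply from memory
                return value
        return self.mem[addr]

class Cache:
    def __init__(self, bus):
        self.lines, self.bus = {}, bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = (self.bus.fetch(addr, self), False)
        return self.lines[addr][0]

    def write(self, addr, value):
        self.bus.invalidate(addr, self)        # exclusive access first
        self.lines[addr] = (value, True)       # dirty: memory updated later

mem = {"X": 0}
bus = Bus(mem)
a, b = Cache(bus), Cache(bus)
a.read("X"); b.read("X")      # both cache 0
a.write("X", 1)               # B's copy invalidated
print(b.read("X"), mem["X"])  # → 1 1 (A supplies 1; memory updated)
```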
Cache coherence approaches
• Write-update (write broadcast) snoopy protocol: Update all the cached copies of
the data item when it is written

• Initially, assume neither cache contains the data item at X; the initial value at X is 0
• When A broadcasts a write (new value 1), both B's cached copy and memory location X are
updated
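Write-update is the same sketch with the invalidation replaced by a broadcast update (again an assumed minimal model with illustrative names; every sharer's copy and memory are refreshed on each write):

```python
# Write-update (write-broadcast): a write refreshes every cached copy
# and memory, so all sharers stay coherent.
class Cache:
    def __init__(self, bus, mem):
        self.lines, self.bus, self.mem = {}, bus, mem
        bus.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.mem[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.mem[addr] = value
        for c in self.bus:              # broadcast: update every copy
            if addr in c.lines:
                c.lines[addr] = value

mem = {"X": 0}
bus = []
a, b = Cache(bus, mem), Cache(bus, mem)
a.read("X"); b.read("X")      # both cache 0
a.write("X", 1)               # broadcast write
print(b.read("X"), mem["X"])  # → 1 1 (B's copy and memory both updated)
```

Compared with write-invalidate, this trades extra bus traffic on every write for cheaper subsequent reads by the other sharers.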
Additional Topics
• Write-invalidate cache-coherence protocol (MSI protocol) for write-back caches
(p. 664 of Hennessy)
• Universal Serial Bus
• Direct Memory Access (DMA)
• Daisy Chain
