
Lecture 6

Data-Parallel Architectures
&
SIMD

CH03

Data-parallel computation (bit parallel)
Data Parallelism
(A strength of SIMDs)
• All tasks (or processors) apply the same set of operations to different data.
• Example:

    for i ← 0 to 99 do
        a[i] ← b[i] + c[i]
    endfor

• Accomplished on SIMDs by having all active processors execute the operations synchronously.
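The example loop above can be sketched in Python with NumPy, where the element-wise `+` is a single data-parallel operation applied at every index at once (NumPy is used here only to illustrate the model, not as part of the slides):

```python
import numpy as np

# The loop "for i <- 0 to 99: a[i] <- b[i] + c[i]" as one data-parallel
# operation: the same '+' is applied to every element pair.
b = np.arange(100)
c = np.arange(100)
a = b + c  # element-wise; one logical instruction over the whole vector

print(a[0], a[99])  # -> 0 198
```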
Application of Data-Parallel Architectures:
• One data entity processed by one PE

Mapping problem space onto architectural space:
• Data entity onto PE (1-to-1 mapping)
• Near-neighbor connectivity (2-D: mesh)
• Tree: 2-D hierarchy
• Pyramid: 3-D hierarchy
• Hypercube: 2^N nodes in N dimensions
• Hypercube (4-D): long- and short-range connections
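As a concrete sketch of hypercube connectivity (an illustration, not from the slides): label each of the 2^N nodes with an N-bit number; two nodes are neighbors exactly when their labels differ in one bit, so a node's neighbors are found by flipping each bit in turn:

```python
def hypercube_neighbors(node: int, dim: int) -> list[int]:
    # Flip each of the dim bits of the label; each flip crosses one
    # dimension of the hypercube, giving dim neighbors per node.
    return [node ^ (1 << i) for i in range(dim)]

# 4-D hypercube: 2^4 = 16 nodes, 4 neighbors each.
print(hypercube_neighbors(0b0000, 4))  # -> [1, 2, 4, 8]
```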
Data-Parallel Approaches:
The SIMD Computer & Model

Consists of two types of processors:
• A front end, or control unit
  • Stores a copy of the program
  • Has a program control unit to execute the program
  • Broadcasts parallel program instructions to the array of processors
• An array of simplistic processors that are functionally more like ALUs
  • Do not store a copy of the program, nor have a program control unit
  • Execute, in parallel, the commands sent by the front end
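A toy model of this split (an illustration, not any particular machine): the front end holds the program and broadcasts one instruction at a time, while each PE merely applies it to its own local data:

```python
class PE:
    """A simplistic processing element: local data, no program store."""
    def __init__(self, local_value):
        self.local = local_value
        self.active = True

def broadcast(pes, op):
    """Front end broadcasts one instruction; all active PEs execute it
    (conceptually in lockstep) on their own local data."""
    for pe in pes:
        if pe.active:
            pe.local = op(pe.local)

pes = [PE(v) for v in range(8)]
broadcast(pes, lambda x: x + 10)  # same instruction, different data
print([pe.local for pe in pes])   # -> [10, 11, 12, 13, 14, 15, 16, 17]
```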
SIMD (cont.)
• On a memory access, all active processors must access the same location in their local memories.
• All active processors execute the same instruction synchronously, but on different data.
• The sequence of different data items is often referred to as a vector.
Possible Architecture for a Generic SIMD
Elements of Success of SIMD
• Simplicity of Concept and Programming
• Regularity of Structure
• Easy Scalability of Size and Performance
• Straightforward Applicability to Many Fields
Design Space
• The Complexity of PE
• Degree to which each Processing Element is allowed local autonomy
• Types of Connectivity
  • Number and Connection Topology
• Disposition and Method of Backup Memory
• Implementation Technology
Granularity
• Fine-Grain
  • One data element maps to each PE
• Partial Fine-Grain
  • A few data elements map to each PE
• Coarse-Grain
  • Many data elements map to each PE
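The three granularities can be sketched as a simple mapping ratio (the machine and problem sizes below are illustrative, not from the slides):

```python
import math

def elements_per_pe(n_elements: int, n_pes: int) -> int:
    # How many data elements each PE must hold for a given machine size.
    return math.ceil(n_elements / n_pes)

print(elements_per_pe(1024, 1024))  # fine-grain: 1 element per PE
print(elements_per_pe(1024, 256))   # partial fine-grain: 4 per PE
print(elements_per_pe(1024, 16))    # coarse-grain: 64 per PE
```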
Connectivity
• Nearest Neighbor
  • High diameter (N / 2), high bandwidth (N)
• Tree
  • Low diameter (2k − 2, where k is the number of tree levels), bandwidth (log N)
• Pyramid
  • Low diameter (as tree), but complex programming
• Hypercube
  • Low diameter (log N), high bandwidth ((N log N) / 2)
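The diameter figures above can be checked with a short sketch (N = 64 processors is an illustrative choice; the 7-level tree is likewise illustrative, since a complete binary tree with 7 levels holds 127 nodes):

```python
import math

def ring_diameter(n: int) -> int:
    # Nearest neighbor (1-D ring): worst case is halfway around, N / 2.
    return n // 2

def tree_diameter(k: int) -> int:
    # Binary tree with k levels: leaf -> root -> leaf is 2k - 2 hops.
    return 2 * k - 2

def hypercube_diameter(n: int) -> int:
    # n = 2^d nodes: at most one hop per dimension, log2 N.
    return int(math.log2(n))

print(ring_diameter(64), tree_diameter(7), hypercube_diameter(64))  # -> 32 12 6
```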
Processor Complexity
• Single-Bit
  • Fine-grain; usually used for image processing
• Integer
  • A compromise; usually used for vision or general computing
• Floating-Point
  • Coarse-grain; used for scientific computing
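To see why single-bit PEs are viable at all (an illustration, not from the slides): a 1-bit ALU can still add full-width integers bit-serially, one bit position per clock step, trading time for hardware:

```python
def bit_serial_add(a: int, b: int, width: int = 8) -> int:
    # Ripple through the bit positions, one per step, exactly as a
    # 1-bit ALU with a carry flip-flop would.
    carry, out = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s = ai ^ bi ^ carry                      # sum bit
        carry = (ai & bi) | (carry & (ai ^ bi))  # carry out
        out |= s << i
    return out

print(bit_serial_add(23, 42))  # -> 65
```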
Local Autonomy
• No Local Control
• Local Activity Control
• Local Data Control
• Local Connectivity Control
• Local Function Control
• Local Algorithm Control
• Local Sequencing Control
• Local Partitioning Control
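As a sketch of the second level, local activity control (a toy model, not any specific machine): each PE sets its own activity bit from its local data, and a broadcast instruction then takes effect only on the active PEs:

```python
values = [3, -1, 4, -1, 5, -9, 2, 6]

# Each PE tests its own local data and sets its activity bit.
active = [v >= 0 for v in values]

# Front end broadcasts "double your value"; masked-off PEs sit it out.
result = [v * 2 if a else v for v, a in zip(values, active)]
print(result)  # -> [6, -1, 8, -1, 10, -9, 4, 12]
```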
