
Unit-2: DLP in Vector, SIMD and GPU Architectures
Vector architecture
SIMD instruction set extensions for multimedia
Graphics Processing Units
Detecting and Enhancing Loop Level Parallelism
Case studies.

IFETCE/ME/CSE/B.V.R.Raju/Iyear/Isem/CP7103/MCA/Unit-2/PPt/Ver1.0

Vector architecture

Basic idea:
  Read sets of data elements into vector registers
  Operate on those registers
  Disperse the results back into memory

Registers are controlled by compiler
  Used to hide memory latency
  Leverage memory bandwidth

Each core in a heterogeneous multi-core processing unit can be designed to utilize different architectures such as superscalar, VLIW, vector processing, SIMD and multithreading
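
The read / operate / disperse sequence above can be sketched in software. The following is a minimal illustration, not a model of any particular machine: the maximum vector length MVL = 64 and the helper names vload, vstore and daxpy_vector are assumptions made for this example. It computes Y = a*X + Y one vector-register-sized strip at a time.

```python
MVL = 64  # assumed maximum vector length (elements per vector register)

def vload(mem, base, vl):
    """Read vl consecutive elements from memory into a 'vector register'."""
    return mem[base:base + vl]

def vstore(mem, base, vreg):
    """Write a 'vector register' back to consecutive memory locations."""
    mem[base:base + len(vreg)] = vreg

def daxpy_vector(a, x, y):
    """Compute y = a*x + y in strips of at most MVL elements."""
    n = len(x)
    for base in range(0, n, MVL):
        vl = min(MVL, n - base)                      # last strip may be shorter
        vx = vload(x, base, vl)                      # read into vector registers
        vy = vload(y, base, vl)
        vt = [a * xi for xi in vx]                   # vector-scalar multiply
        vy = [ti + yi for ti, yi in zip(vt, vy)]     # vector-vector add
        vstore(y, base, vy)                          # disperse results to memory
    return y

x = [float(i) for i in range(100)]
y = [1.0] * 100
daxpy_vector(2.0, x, y)
```

Each list comprehension stands in for a single vector instruction that a real vector unit would issue once per strip; with 100 elements and MVL = 64 the loop runs twice, once for 64 elements and once for the remaining 36.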

Vector architecture
Intel has traditionally been regarded as the benchmark for computational power, while AMD has been strong in graphics and gaming. Both companies have realized that they can no longer stay within these boundaries and serve the needs of only a limited group of users. What is needed is a solution that delivers balanced performance for both compute-intensive and graphics-intensive applications and games.


Vector Architectures
In the 1970s-80s, supercomputer meant vector machine
Definition of supercomputer
  Fastest machine in the world at a given task
  A device to turn a compute-bound problem into an I/O-bound problem
CDC 6600 (Cray, 1964) is regarded as the first supercomputer
Vector supercomputers (epitomized by Cray-1, 1976)
  Scalar unit + vector extensions
  Vector registers, vector instructions
  Vector loads/stores
  Highly pipelined functional units


SIMD instruction set extensions for multimedia
SIMD architectures can exploit significant data-level parallelism for:
  matrix-oriented scientific computing
  media-oriented image and sound processors
SIMD is more energy efficient than MIMD
  Only needs to fetch one instruction per data operation
  Makes SIMD attractive for personal mobile devices
SIMD allows programmer to continue to think sequentially
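
To make "one instruction per data operation" concrete, the sketch below emulates a partitioned add of the kind multimedia SIMD extensions provide: a single 64-bit operation adds eight independent 8-bit lanes. This is a software illustration only; the function names pack_u8, unpack_u8 and packed_add_u8 are assumptions for this example and do not correspond to any real instruction set.

```python
HIGH = 0x8080808080808080  # top bit of each 8-bit lane
LOW = 0x7F7F7F7F7F7F7F7F   # low seven bits of each 8-bit lane

def pack_u8(values):
    """Pack eight 8-bit values into one 64-bit word (lane 0 in the low byte)."""
    assert len(values) == 8
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word

def unpack_u8(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]

def packed_add_u8(a, b):
    """Add eight 8-bit lanes at once, wrapping around within each lane.

    The low seven bits of each lane are added normally; the top bit is
    handled with XOR so that a carry never crosses into the next lane.
    """
    return ((a & LOW) + (b & LOW)) ^ ((a ^ b) & HIGH)

a = pack_u8([250, 1, 2, 3, 4, 5, 6, 255])
b = pack_u8([10, 1, 1, 1, 1, 1, 1, 1])
result = unpack_u8(packed_add_u8(a, b))
# result == [4, 2, 3, 4, 5, 6, 7, 0]
```

A scalar loop would issue eight separate add operations for the same data; here one operation covers all eight lanes, and the source code still reads as an ordinary sequential program.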

Graphics Processing Units


Processor manufacturers are constantly challenged to build better, faster and more stable processor architectures and designs


Considering this requirement, AMD acquired ATI, which had previously manufactured its graphics processing units. AMD made this move to accelerate and align the development of its A-series processors and FX technology. The A-series processors are highly capable quad-core processing units. AMD coined the term APU (Accelerated Processing Unit) for designs that combine multi-core processing units and graphics processing units using an accelerator.

Detecting and Enhancing Loop Level Parallelism


Case studies
