
Module II Parallel Architectures

Introduction to OpenMP Programming – Instruction Level Support for Parallel
Programming – SIMD – Vector Processing – GPUs
Instruction Level Parallelism
• Instruction-level parallelism, or ILP, attempts to improve processor performance by
having multiple processor components or functional units execute instructions
simultaneously

• Two approaches to ILP:


• Hardware level – dynamic parallelism
• Software level – static parallelism

• Dynamic parallelism means the processor decides at run time which instructions to execute in
parallel, whereas static parallelism means the compiler decides which instructions to execute in
parallel.
• The Pentium processor uses dynamic parallelism, whereas the Itanium processor
relies on static (compiler-determined) parallelism.
Instruction Level Parallelism

Example:
I1: e = a + b
I2: f = c + d
I3: m = e * f

I1 and I2 are independent and can execute in parallel, but I3 depends on the results
of both, so the three instructions complete in two cycles:

ILP = 3 instructions / 2 cycles = 3/2
Instruction Level Parallelism
• There are two main approaches to ILP:
• Pipelining, in which functional units are arranged in stages, and
• Multiple issue, in which multiple instructions can be simultaneously initiated.

• Both approaches are used in virtually all modern CPUs.

• Pipelining can overlap the execution of instructions when they are independent of one
another.
• This potential overlap among instructions is called instruction-level parallelism (ILP),
since the instructions can be evaluated in parallel.
Instruction Level Parallelism
• Pipelining attempts to keep every part of the processor busy with some instruction by dividing
incoming instructions into a series of sequential steps performed by different processor units,
with different parts of instructions processed in parallel.

• ILP allows the compiler and the processor to overlap the execution of multiple
instructions or even to change the order in which instructions are executed.
• How much ILP exists in programs is very application specific.

• Pipelining can suffer from hazards.

• Stalling (delaying the processing of instructions) or introducing additional data paths can
be a solution for hazards.
Instruction Level Parallelism - Pipelining
• Add the two floating-point numbers 9.87 × 10⁴ and 6.54 × 10³. Then we can use the following
steps:
• Fetch the operands
• Compare the exponents
• Shift one operand so the exponents match
• Add the significands
• Normalize the result
• Round the result
• Store the result
Instruction Level Parallelism - Pipelining
if we execute the code

float x[1000], y[1000], z[1000];

for (int i = 0; i < 1000; i++)
    z[i] = x[i] + y[i];

• the for loop will take something like 7000 nanoseconds (1000 additions, each taking
about 7 nanoseconds)

• suppose we divide our floating-point adder into seven separate pieces of hardware, or
functional units.
• the total time to execute the for loop is then reduced from 7000 nanoseconds to 1006
nanoseconds (7 ns until the first result, then one result per nanosecond), an improvement
of almost a factor of seven.
Table: Pipelined Addition
Multiple Issue
• Pipelines improve performance by taking individual pieces of hardware or functional units
and connecting them in sequence.
• Multiple issue processors replicate functional units and try to simultaneously execute
different instructions in a program. For example,
for (i = 0; i < 1000; i++)
z[i] = x[i] + y[i];

• While the first adder is computing z[0], the second can compute z[1], and so on.
• If the functional units are scheduled at compile time, the multiple issue system is said to use
static multiple issue.
• If they're scheduled at run time, the system is said to use dynamic multiple issue.
• A processor that supports dynamic multiple issue is sometimes said to be a superscalar
processor.
Multiple Issue
• In order to make use of multiple issue, the system must find instructions that can be
executed simultaneously. One of the most important techniques is speculation.
• In speculation, the compiler or the processor makes a guess about an instruction, and then
executes the instruction on the basis of the guess.
• As a simple example, in the following code, the system might predict that the outcome of z =
x + y will give z a positive value, and, as a consequence, it will assign w = x.
z = x + y;
if (z > 0) w = x;
else w = y;
• However, speculative execution must allow for the possibility that the predicted behavior is
incorrect.
Multiple Issue
• If the compiler does the speculation, it will usually insert code that tests whether the
speculation was correct, and, if not, takes corrective action.

• If the hardware does the speculation, the processor usually stores the result(s) of the
speculative execution in a buffer.

• When it's known that the speculation was correct, the contents of the buffer are transferred to
registers or memory.

• If the speculation was incorrect, the contents of the buffer are discarded and the instruction is
re-executed.
Example: Superscalar execution
• Consider a processor with two pipelines and the ability to simultaneously issue two
instructions. These processors are sometimes also referred to as super-pipelined processors.
• The ability of a processor to issue multiple instructions in the same cycle is referred to as
superscalar execution.

Issues to be solved
• Data dependency
• Resource dependency
• Branch dependencies
References
• https://www.sciencedirect.com/topics/computer-science/instruction-level-parallelism
• http://web.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/instrLevParal.html
