
Instruction Level Support for Parallel Programming

WHY ILP?

 Ordinary programs execute instructions in sequence: one after the other, in the order written by the programmer.

 ILP allows the compiler and the processor to overlap the execution of multiple instructions, or even to change the order in which instructions are executed.
ILP TECHNIQUES
Micro-architectural techniques that use ILP include:

 Instruction pipelining

 Superscalar

 Out-of-order execution

 Register renaming

 Speculative execution

 Branch prediction
INSTRUCTION PIPELINE

 An instruction pipeline is a technique used in the design of modern microprocessors, microcontrollers and CPUs to increase their instruction throughput (the number of instructions that can be executed in a unit of time).
PIPELINING

 The main idea is to divide the processing of a CPU instruction into a series of independent steps ("microinstructions"), with storage at the end of each step.

 This allows the CPU's control logic to handle instructions at the processing rate of the slowest step, which is much faster than the time needed to process the instruction as a single step.
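The point above can be made concrete with a small calculation. The stage latencies below are hypothetical values chosen for illustration, as is the latch (end-of-step storage) overhead; they are not from the slides:

```python
# Hypothetical stage latencies (ns) for a 5-stage pipeline.
stage_ns = {"IF": 2.0, "ID": 1.5, "EX": 2.5, "MEM": 3.0, "WB": 1.0}
latch_ns = 0.2  # overhead of the storage at the end of each step

# Unpipelined: the instruction is processed as one long step.
single_cycle = sum(stage_ns.values())

# Pipelined: the clock is set by the slowest step plus latch overhead.
pipelined_cycle = max(stage_ns.values()) + latch_ns

print(f"unpipelined cycle time: {single_cycle:.1f} ns")
print(f"pipelined cycle time:   {pipelined_cycle:.1f} ns")
print(f"steady-state speedup:   {single_cycle / pipelined_cycle:.2f}x")
```

With these numbers the pipelined machine can accept a new instruction roughly every 3.2 ns instead of every 10 ns, even though each individual instruction still takes the full latency to finish.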
What is Pipelining?

 Like an Automobile Assembly Line for Instructions
 Each step does a little job of processing the instruction
 Ideally each step operates in parallel

 Simple Model (F = Instruction Fetch, D = Instruction Decode, E = Instruction Execute):

   Instr 1: F1 D1 E1
   Instr 2:    F2 D2 E2
   Instr 3:       F3 D3 E3
PIPELINE

 It is a technique of decomposing a sequential process into suboperations, with each suboperation completed in a dedicated segment.

 A pipeline is commonly known as an assembly-line operation. It is similar to the assembly line of car manufacturing: the first station sets up a chassis, the next station installs the engine, and another group of workers fits the body.
Pipeline Stages

We can divide the execution of an instruction into the following 5 "classic" stages:

IF: Instruction Fetch
ID: Instruction Decode, register fetch
EX: Execution
MEM: Memory Access
WB: Register Write Back
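The five stages above can be visualized with a minimal sketch that prints which stage each instruction occupies in each clock cycle, assuming an ideal pipeline with no stalls (the helper function name is mine, not from the slides):

```python
# Print an ideal 5-stage pipeline timing diagram: instruction i enters
# IF at cycle i and advances one stage per cycle.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    rows = []
    for i in range(n_instructions):
        # i empty slots, then the five stages in order
        row = ["   ."] * i + [f"{s:>4}" for s in STAGES]
        rows.append("".join(row))
    return rows

for line in pipeline_diagram(3):
    print(line)
```

Reading one column of the output top to bottom shows what every instruction is doing in that single cycle: the oldest is furthest along, the newest is still being fetched.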
Pipeline Stages
 Fetch instruction
 Decode instruction
 Execute instruction
 Access operand
 Write result

 Note: slight variations depending on the processor
Without Pipelining

• Normally, you would perform the fetch, decode, execute, operate, and write steps of an instruction and then move on to the next.

  Cycle:   1  2  3  4  5  6  7  8  9  10
  Instr 1: F  D  E  O  W
  Instr 2:                F  D  E  O  W
With Pipelining

• The processor is able to perform each stage simultaneously.

• If the processor is decoding an instruction, it may also fetch another instruction at the same time.

  Clock Cycle: 1  2  3  4  5  6  7  8  9
  Instr 1:     F  D  E  O  W
  Instr 2:        F  D  E  O  W
  Instr 3:           F  D  E  O  W
  Instr 4:              F  D  E  O  W
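The contrast above reduces to a simple cycle count: with k stages and n instructions, an unpipelined CPU needs n × k cycles, while an ideal pipeline needs k + (n − 1). A small sketch:

```python
# Cycle counts for n instructions on a k-stage machine (ideal, no stalls).
def cycles_unpipelined(n, k=5):
    # each instruction runs all k steps before the next one starts
    return n * k

def cycles_pipelined(n, k=5):
    # k cycles to fill the pipe, then one instruction completes per cycle
    return k + (n - 1)

for n in (1, 4, 100):
    print(n, cycles_unpipelined(n), cycles_pipelined(n))
```

For a single instruction pipelining buys nothing (5 cycles either way); for long instruction streams the pipelined cycle count approaches n, i.e. a speedup approaching k.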
EXAMPLE

•  For example, the classic RISC pipeline is broken into five stages with a set of flip-flops between each stage, as follows:
 Instruction fetch
 Instruction decode & register fetch
 Execute
 Memory access
 Register write back

•  In the pipeline diagram, the vertical axis is successive instructions and the horizontal axis is time. So in any one column (a single clock cycle), the earliest instruction is in the WB stage, and the latest instruction is undergoing instruction fetch.
Pipeline Problem

 Problem: An instruction may need to wait for the result of another instruction.

Pipeline Solution:

 Solution: The compiler may recognize which instructions are dependent on, or independent of, the current instruction, and rearrange them to run the independent ones first.
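The compiler trick above can be sketched in a toy form. The three-operand tuple encoding and the `schedule` helper are my own simplified illustration (a real scheduler must also check dependences across every instruction it hoists over):

```python
# Toy compiler scheduling: hoist a later independent instruction between
# a producer and its dependent consumer so the consumer need not stall.
# Instruction format (hypothetical): (dest, src1, src2).
program = [
    ("r1", "r2", "r3"),   # r1 = r2 + r3   (producer)
    ("r4", "r1", "r5"),   # r4 = r1 + r5   (depends on r1 -> would stall)
    ("r6", "r7", "r8"),   # r6 = r7 + r8   (independent)
]

def schedule(prog):
    prog = list(prog)
    for i in range(len(prog) - 1):
        dest = prog[i][0]
        if dest in prog[i + 1][1:]:          # next instr depends on this one
            for j in range(i + 2, len(prog)):
                cand = prog[j]
                # candidate must not read the pending result, nor
                # overwrite a source of the stalled instruction
                if dest not in cand[1:] and cand[0] not in prog[i + 1][1:]:
                    prog.insert(i + 1, prog.pop(j))  # run it first
                    break
    return prog

print(schedule(program))
```

After scheduling, the independent add of r6 fills the slot between the producer of r1 and its consumer, hiding the latency instead of stalling.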
How to make pipelines faster

 Superpipelining
   Divide the stages of the pipeline into more stages
   Ex: Split the "fetch instruction" stage into two stages

 Superscalar pipelining
   Run multiple pipelines in parallel
SUPERSCALAR

 A superscalar CPU architecture implements ILP inside a single processor, which allows faster CPU throughput at the same clock rate.

 A superscalar processor executes more than one instruction during a clock cycle.

 It simultaneously dispatches multiple instructions to multiple redundant functional units built inside the processor.
EXAMPLE
 Simple superscalar pipeline. By fetching and dispatching two
instructions at a time, a maximum of two instructions per
cycle can be completed.
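The two-wide example above has a simple best-case arithmetic, sketched here under the assumption of fully independent straight-line code (no dependences or structural hazards):

```python
import math

# Best-case steady-state cycle counts for n independent instructions.
def cycles_scalar(n):
    return n                      # one instruction completes per cycle

def cycles_superscalar(n, width=2):
    return math.ceil(n / width)   # up to `width` complete per cycle

print(cycles_scalar(10), cycles_superscalar(10))
```

Real code rarely sustains the full width, since dependent instructions cannot be dispatched in the same cycle; the ceiling of two per cycle is the maximum the slide describes, not a guarantee.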
OUT-OF-ORDER EXECUTION

 OoOE is a technique used in most high-performance microprocessors.

 The key concept is to allow the processor to avoid a class of delays that occur when the data needed to perform an operation are unavailable.

 Most modern CPU designs include support for out-of-order execution.
STEPS

 Out-of-order processors break up the processing of instructions into these steps:

 Instruction fetch.

 Instruction dispatch to an instruction queue (also called an instruction buffer).

 The instruction waits in the queue until its input operands are available.

 The instruction is issued to the appropriate functional unit and executed by that unit.

 The results are queued in a re-order buffer.

 Only after all older instructions have had their results written back to the register file is this result written back to the register file.
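The steps above can be sketched as a highly simplified simulation. The instruction encoding, the `mem` readiness token, and the one-issue-per-cycle loop are all my own simplifications for illustration:

```python
# (name, source operands, destination register) -- hypothetical encoding
instrs = [
    ("load r1", ["mem"], "r1"),  # data arrives only when the cache responds
    ("add r2",  ["r1"],  "r2"),  # depends on the load -> must wait in queue
    ("mul r3",  ["r4"],  "r3"),  # independent -> may execute ahead
]

ready = {"r4"}                                          # operands available now
rob = [{"ins": ins, "done": False} for ins in instrs]   # re-order buffer

execution_order = []
cycle = 0
while not all(e["done"] for e in rob):
    cycle += 1
    if cycle == 2:
        ready.add("mem")            # simulated cache miss resolves
    for e in rob:                   # issue one instruction whose inputs are ready
        name, srcs, dest = e["ins"]
        if not e["done"] and all(s in ready for s in srcs):
            execution_order.append(name)
            ready.add(dest)         # result becomes available to waiters
            e["done"] = True
            break

retire_order = [e["ins"][0] for e in rob]   # write-back in program order
print("executed:", execution_order)
print("retired: ", retire_order)
```

The multiply executes first because its operand is ready, yet the re-order buffer still retires results in program order: load, add, multiply.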
OTHER ILP TECHNIQUES

 Register renaming is a technique used to avoid unnecessary serialization of program operations caused by the reuse of registers by those operations, in order to enable out-of-order execution.

 Speculative execution allows the execution of complete instructions, or parts of instructions, before it is certain whether that execution is required.

 Branch prediction is used to avoid delays while control dependencies are resolved. Branch prediction determines whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not.
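One common branch-prediction scheme (the classic 2-bit saturating counter, not specifically named in the slides) can be sketched in a few lines:

```python
# 2-bit saturating-counter predictor: states 0-1 predict not-taken,
# states 2-3 predict taken; each outcome nudges the counter one step.
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start weakly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True, True, False, True, True]  # e.g. a loop branch that exits once
hits = 0
for actual in outcomes:
    if p.predict() == actual:
        hits += 1
    p.update(actual)
print(f"correct {hits}/{len(outcomes)}")
```

The two-bit hysteresis is the point: a single not-taken outcome (the loop exit) does not flip the prediction, so the predictor is wrong only once per loop execution rather than twice.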
