
Instruction Level Support for Parallel Programming

WHY ILP?

 Ordinary programs execute instructions in sequence: one after the other, in the order written by the programmer.

 ILP allows the compiler and the processor to overlap the execution of multiple instructions, or even to change the order in which instructions are executed.
ILP TECHNIQUES
Micro-architectural techniques that use ILP include:

 Instruction pipelining

 Superscalar

 Out-of-order execution

 Register renaming

 Speculative execution

 Branch prediction
INSTRUCTION PIPELINE

 An instruction pipeline is a technique used in the design of modern microprocessors, microcontrollers and CPUs to increase their instruction throughput (the number of instructions that can be executed in a unit of time).
PIPELINING

 The main idea is to divide the processing of a CPU instruction into a series of independent steps ("microinstructions"), with storage at the end of each step.

 This allows the CPU's control logic to handle instructions at the processing rate of the slowest step, which is much faster than the time needed to process the instruction as a single step.
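The point above can be made concrete with a small calculation. The stage latencies below are hypothetical values chosen for illustration, as is the latch (end-of-step storage) overhead; they are not from the slides:

```python
# Hypothetical stage latencies (ns) for a 5-stage pipeline.
stage_ns = {"IF": 2.0, "ID": 1.5, "EX": 2.5, "MEM": 3.0, "WB": 1.0}
latch_ns = 0.2  # overhead of the storage at the end of each step

# Unpipelined: the instruction is processed as one long step.
single_cycle = sum(stage_ns.values())

# Pipelined: the clock is set by the slowest step plus latch overhead.
pipelined_cycle = max(stage_ns.values()) + latch_ns

print(f"unpipelined cycle time: {single_cycle:.1f} ns")
print(f"pipelined cycle time:   {pipelined_cycle:.1f} ns")
print(f"steady-state speedup:   {single_cycle / pipelined_cycle:.2f}x")
```

With these numbers the pipelined machine can accept a new instruction roughly every 3.2 ns instead of every 10 ns, even though each individual instruction still takes the full latency to finish.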
What is Pipelining?

 Like an Automobile Assembly Line for Instructions
 Each step does a little job of processing the instruction
 Ideally each step operates in parallel

 Simple Model (F = Instruction Fetch, D = Instruction Decode, E = Instruction Execute):

   Instr 1: F1 D1 E1
   Instr 2:    F2 D2 E2
   Instr 3:       F3 D3 E3
PIPELINE

 It is a technique of decomposing a sequential process into suboperations, with each suboperation completed in a dedicated segment.

 A pipeline is commonly known as an assembly-line operation. It is similar to the assembly line of car manufacturing: the first station sets up a chassis, the next station installs the engine, and another group of workers fits the body.
Pipeline Stages

We can divide the execution of an instruction into the following 5 "classic" stages:

IF: Instruction Fetch
ID: Instruction Decode, register fetch
EX: Execution
MEM: Memory Access
WB: Register Write Back
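The five stages above can be visualized with a minimal sketch that prints which stage each instruction occupies in each clock cycle, assuming an ideal pipeline with no stalls (the helper function name is mine, not from the slides):

```python
# Print an ideal 5-stage pipeline timing diagram: instruction i enters
# IF at cycle i and advances one stage per cycle.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    rows = []
    for i in range(n_instructions):
        # i empty slots, then the five stages in order
        row = ["   ."] * i + [f"{s:>4}" for s in STAGES]
        rows.append("".join(row))
    return rows

for line in pipeline_diagram(3):
    print(line)
```

Reading one column of the output top to bottom shows what every instruction is doing in that single cycle: the oldest is furthest along, the newest is still being fetched.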
Pipeline Stages
 Fetch instruction
 Decode instruction
 Execute instruction
 Access operand
 Write result

 Note: slight variations depending on the processor
Without Pipelining

• Normally, you would perform the fetch, decode, execute, operate, and write steps of an instruction and then move on to the next.

  Cycle:   1  2  3  4  5  6  7  8  9  10
  Instr 1: F  D  E  O  W
  Instr 2:                F  D  E  O  W
With Pipelining

• The processor is able to perform each stage simultaneously.

• If the processor is decoding an instruction, it may also fetch another instruction at the same time.

  Clock Cycle: 1  2  3  4  5  6  7  8  9
  Instr 1:     F  D  E  O  W
  Instr 2:        F  D  E  O  W
  Instr 3:           F  D  E  O  W
  Instr 4:              F  D  E  O  W
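The contrast above reduces to a simple cycle count: with k stages and n instructions, an unpipelined CPU needs n × k cycles, while an ideal pipeline needs k + (n − 1). A small sketch:

```python
# Cycle counts for n instructions on a k-stage machine (ideal, no stalls).
def cycles_unpipelined(n, k=5):
    # each instruction runs all k steps before the next one starts
    return n * k

def cycles_pipelined(n, k=5):
    # k cycles to fill the pipe, then one instruction completes per cycle
    return k + (n - 1)

for n in (1, 4, 100):
    print(n, cycles_unpipelined(n), cycles_pipelined(n))
```

For a single instruction pipelining buys nothing (5 cycles either way); for long instruction streams the pipelined cycle count approaches n, i.e. a speedup approaching k.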
EXAMPLE

•  For example, the classic RISC pipeline is broken into five stages with a set of flip-flops between each stage, as follows:
 Instruction fetch
 Instruction decode & register fetch
 Execute
 Memory access
 Register write back

•  In the pipeline diagram, the vertical axis is successive instructions and the horizontal axis is time. So in any one column (a single clock cycle), the earliest instruction is in the WB stage, and the latest instruction is undergoing instruction fetch.
Pipeline Problem

 Problem: An instruction may need to wait for the result of another instruction.

Pipeline Solution:

 Solution: The compiler may recognize which instructions are dependent on, or independent of, the current instruction, and rearrange them to run the independent ones first.
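The compiler trick above can be sketched in a toy form. The three-operand tuple encoding and the `schedule` helper are my own simplified illustration (a real scheduler must also check dependences across every instruction it hoists over):

```python
# Toy compiler scheduling: hoist a later independent instruction between
# a producer and its dependent consumer so the consumer need not stall.
# Instruction format (hypothetical): (dest, src1, src2).
program = [
    ("r1", "r2", "r3"),   # r1 = r2 + r3   (producer)
    ("r4", "r1", "r5"),   # r4 = r1 + r5   (depends on r1 -> would stall)
    ("r6", "r7", "r8"),   # r6 = r7 + r8   (independent)
]

def schedule(prog):
    prog = list(prog)
    for i in range(len(prog) - 1):
        dest = prog[i][0]
        if dest in prog[i + 1][1:]:          # next instr depends on this one
            for j in range(i + 2, len(prog)):
                cand = prog[j]
                # candidate must not read the pending result, nor
                # overwrite a source of the stalled instruction
                if dest not in cand[1:] and cand[0] not in prog[i + 1][1:]:
                    prog.insert(i + 1, prog.pop(j))  # run it first
                    break
    return prog

print(schedule(program))
```

After scheduling, the independent add of r6 fills the slot between the producer of r1 and its consumer, hiding the latency instead of stalling.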
How to make pipelines faster

 Superpipelining
   Divide the stages of the pipeline into more stages
   Ex: Split the "fetch instruction" stage into two stages

 Superscalar pipelining
   Run multiple pipelines in parallel
SUPERSCALAR

 A superscalar CPU architecture implements ILP inside a single processor, which allows faster CPU throughput at the same clock rate.

 A superscalar processor executes more than one instruction during a clock cycle.

 It simultaneously dispatches multiple instructions to multiple redundant functional units built inside the processor.
EXAMPLE
 Simple superscalar pipeline. By fetching and dispatching two
instructions at a time, a maximum of two instructions per
cycle can be completed.
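The two-wide example above has a simple best-case arithmetic, sketched here under the assumption of fully independent straight-line code (no dependences or structural hazards):

```python
import math

# Best-case steady-state cycle counts for n independent instructions.
def cycles_scalar(n):
    return n                      # one instruction completes per cycle

def cycles_superscalar(n, width=2):
    return math.ceil(n / width)   # up to `width` complete per cycle

print(cycles_scalar(10), cycles_superscalar(10))
```

Real code rarely sustains the full width, since dependent instructions cannot be dispatched in the same cycle; the ceiling of two per cycle is the maximum the slide describes, not a guarantee.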
OUT-OF-ORDER EXECUTION

 OoOE is a technique used in most high-performance microprocessors.

 The key concept is to allow the processor to avoid a class of delays that occur when the data needed to perform an operation are unavailable.

 Most modern CPU designs include support for out-of-order execution.
STEPS

 Out-of-order processors break up the processing of instructions into these steps:

 Instruction fetch.

 Instruction dispatch to an instruction queue (also called an instruction buffer).

 The instruction waits in the queue until its input operands are available.

 The instruction is issued to the appropriate functional unit and executed by that unit.

 The results are queued in a re-order buffer.

 Only after all older instructions have had their results written back to the register file is this result written back to the register file.
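The steps above can be sketched as a highly simplified simulation. The instruction encoding, the `mem` readiness token, and the one-issue-per-cycle loop are all my own simplifications for illustration:

```python
# (name, source operands, destination register) -- hypothetical encoding
instrs = [
    ("load r1", ["mem"], "r1"),  # data arrives only when the cache responds
    ("add r2",  ["r1"],  "r2"),  # depends on the load -> must wait in queue
    ("mul r3",  ["r4"],  "r3"),  # independent -> may execute ahead
]

ready = {"r4"}                                          # operands available now
rob = [{"ins": ins, "done": False} for ins in instrs]   # re-order buffer

execution_order = []
cycle = 0
while not all(e["done"] for e in rob):
    cycle += 1
    if cycle == 2:
        ready.add("mem")            # simulated cache miss resolves
    for e in rob:                   # issue one instruction whose inputs are ready
        name, srcs, dest = e["ins"]
        if not e["done"] and all(s in ready for s in srcs):
            execution_order.append(name)
            ready.add(dest)         # result becomes available to waiters
            e["done"] = True
            break

retire_order = [e["ins"][0] for e in rob]   # write-back in program order
print("executed:", execution_order)
print("retired: ", retire_order)
```

The multiply executes first because its operand is ready, yet the re-order buffer still retires results in program order: load, add, multiply.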
OTHER ILP TECHNIQUES

 Register renaming is a technique used to avoid unnecessary serialization of program operations caused by the reuse of registers by those operations, in order to enable out-of-order execution.

 Speculative execution allows the execution of complete instructions, or parts of instructions, before it is certain whether that execution is required.

 Branch prediction is used to avoid delays while control dependencies are resolved. Branch prediction determines whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not.
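One common branch-prediction scheme (the classic 2-bit saturating counter, not specifically named in the slides) can be sketched in a few lines:

```python
# 2-bit saturating-counter predictor: states 0-1 predict not-taken,
# states 2-3 predict taken; each outcome nudges the counter one step.
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start weakly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True, True, False, True, True]  # e.g. a loop branch that exits once
hits = 0
for actual in outcomes:
    if p.predict() == actual:
        hits += 1
    p.update(actual)
print(f"correct {hits}/{len(outcomes)}")
```

The two-bit hysteresis is the point: a single not-taken outcome (the loop exit) does not flip the prediction, so the predictor is wrong only once per loop execution rather than twice.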
