Chapter 3
Dynamic Scheduling
• With dynamic scheduling, the hardware rearranges the order of instruction execution at run time to reduce pipeline stalls; all dynamic-scheduling techniques rely on out-of-order execution.
• Simpler than compiler-based (static) scheduling.
• Handles dependences that are not known at compile time.
• Allows code compiled for a different machine to run efficiently.
Out-of-Order Execution
• In introducing out-of-order execution, we essentially split the ID pipeline
stage into two stages: issue (decode and check for structural hazards) and
read operands (wait until no data hazards remain, then read the operands).
• Instruction fetch proceeds ahead of the issue stage, placing each fetched
instruction either into a single-entry latch or into an instruction queue
that holds instructions that have been fetched but are waiting to be issued.
• Instructions are then issued from the latch or queue.
• The EX stage follows the read-operands stage and may take multiple cycles.
• Thus, we need to distinguish when an instruction begins execution and
when it completes execution; between the two times, the instruction is in
execution.
• This allows multiple instructions to be in execution at the same time.
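The split described above can be sketched as a toy model (not a real microarchitecture simulator): instructions issue in order, but the read-operands step makes each one wait only for its own sources, so independent instructions begin and complete execution out of order. The instruction names, registers, and latencies below are illustrative assumptions.

```python
from collections import namedtuple

# name, source registers, destination register, execution latency (cycles)
Instr = namedtuple("Instr", "name srcs dst latency")

def run(program):
    ready_at = {}       # register -> cycle its value becomes available
    finish = {}         # instruction name -> cycle it completes execution
    issue_cycle = 0
    for ins in program:
        issue_cycle += 1                      # in-order issue, one per cycle
        # read-operands stage: wait until every source operand is ready
        start = max([issue_cycle] + [ready_at.get(r, 0) for r in ins.srcs])
        finish[ins.name] = start + ins.latency
        ready_at[ins.dst] = finish[ins.name]
    return finish

prog = [
    Instr("DIV", ["r2", "r3"], "r1", 10),   # long-latency divide
    Instr("ADD", ["r1", "r4"], "r5", 1),    # depends on DIV, must wait
    Instr("SUB", ["r6", "r7"], "r8", 1),    # independent, overlaps the DIV
]
times = run(prog)
print(times)   # SUB completes before ADD even though it issued later
```

The SUB finishes long before the ADD: both were issued in order, but execution begins and completes out of order, which is exactly the overlap the split ID stage enables.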
Branch Handling Techniques
• One of the major problems in instruction pipelining is the occurrence of branch
instructions.
• A branch is an instruction in a computer program that can cause the computer to
begin executing a different instruction sequence, deviating from its
default behavior of executing instructions in order.
• A branch instruction can be Conditional or Unconditional.
• An unconditional branch always alters the sequential program flow by loading
the program counter with the target address.
• In a conditional branch, control selects the target instruction if the condition
is satisfied, or the next sequential instruction if it is not.
• Branch instructions break the normal sequence of the instruction stream,
causing difficulties in the operation of the instruction pipeline.
• Pipelined computers employ various hardware techniques to minimize the
performance degradation caused by instruction branching.
Branch Handling Techniques
1. Prefetch Target Instruction:
• One way of handling a conditional branch is to prefetch the target instruction in addition to
the instruction following the branch.
• If the branch condition is satisfied, the pipeline continues from the already-fetched branch
target instruction; otherwise it continues with the prefetched sequential instruction, so
neither outcome requires a fresh fetch.
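The idea can be sketched as follows (a purely conceptual Python illustration; the instruction strings are made up): both candidate instructions are fetched before the condition resolves, and the pipeline simply keeps whichever one the outcome selects.

```python
# Prefetch-target sketch: both the fall-through instruction and the branch
# target are fetched ahead of time; when the condition is known, one is
# kept and the other discarded, avoiding a fetch stall on either path.
def resolve_branch(taken, fallthrough_slot, target_slot):
    # Both slots were prefetched; pick one, discard the other.
    return target_slot if taken else fallthrough_slot

prefetched_fallthrough = "ADD r1, r2, r3"   # instruction after the branch
prefetched_target = "SUB r4, r5, r6"        # instruction at the branch target

print(resolve_branch(True, prefetched_fallthrough, prefetched_target))
print(resolve_branch(False, prefetched_fallthrough, prefetched_target))
```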
4. Delayed Branch
• In a delayed branch, the instruction slot immediately following the branch
(the delay slot) is always executed, whether or not the branch is taken.
• The compiler tries to fill the delay slot with a useful instruction that is
valid on both paths (or a no-op if none exists); because the pipeline never
has to squash this instruction, the branch penalty is reduced.
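A one-slot delayed branch can be sketched as a toy execution trace (illustrative instruction strings, not real hardware): the instruction in the delay slot runs on both outcomes of the branch, which is what lets the compiler place useful work there.

```python
# Delayed-branch sketch: the delay-slot instruction executes whether or
# not the branch is taken; only the instruction *after* the slot differs.
def execute(taken):
    trace = []
    trace.append("BEQ r1, r2, target")   # the branch itself
    trace.append("ADD r3, r4, r5")       # delay slot: executes either way
    trace.append("SUB at target" if taken else "next sequential instr")
    return trace

print(execute(True))
print(execute(False))
```

Note that the two traces agree on the delay-slot instruction and diverge only afterwards, so no fetched instruction is wasted.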
Instruction Level Parallelism
• Instruction-level parallelism (ILP) is a measure of how many of
the instructions in a computer program can be executed simultaneously.
• Pipelining can overlap the execution of instructions when they are independent
of one another. This potential overlap among instructions is called instruction-
level parallelism (ILP) since the instructions can be evaluated in parallel.
• ILP must not be confused with concurrency: ILP concerns the parallel
execution of a sequence of instructions belonging to a specific thread or
process. Concurrency, by contrast, concerns threads of one or more processes
being assigned to a CPU's cores, either in strict alternation or in true
parallelism if there are enough cores, ideally one core for each runnable
thread.
• There are two approaches to instruction level parallelism:
Hardware and Software.
Instruction Level Parallelism contd…
• Hardware level works upon dynamic parallelism, whereas the software level
works on static parallelism.
• Dynamic parallelism means the processor decides at run time which instructions
to execute in parallel, whereas static parallelism means the compiler decides
which instructions to execute in parallel.
• Consider the following program:
1. e = a + b
2. f = c + d
3. m = e * f
• Operation 3 depends on the results of operations 1 and 2, so it cannot be
calculated until both of them are completed.
• However, operations 1 and 2 do not depend on any other operation, so they can
be calculated simultaneously.
• If we assume that each operation can be completed in one unit of time, these
three instructions can be completed in a total of two units of time, giving an ILP
of 3/2.
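The scheduling above can be checked with a small dependency-driven sketch (assuming, as in the text, one time unit per operation and no limit on how many independent operations run per cycle):

```python
# Each operation's earliest cycle is 1 + the latest cycle of its inputs.
# Operations 1 and 2 are independent; operation 3 (m = e * f) needs both.
deps = {"e": [], "f": [], "m": ["e", "f"]}

cycle = {}
def cycle_of(op):
    if op not in cycle:
        cycle[op] = 1 + max((cycle_of(d) for d in deps[op]), default=0)
    return cycle[op]

total_cycles = max(cycle_of(op) for op in deps)   # e, f in cycle 1; m in cycle 2
ilp = len(deps) / total_cycles                    # 3 operations / 2 cycles
print(total_cycles, ilp)
```

Three operations in two cycles gives the ILP of 3/2 stated above.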
Instruction Level Parallelism contd…
• A goal of compiler and processor designers is to identify and take advantage of as much
ILP as possible.
• Ordinary programs are typically written under a sequential execution model where
instructions execute one after the other and in the order specified by the programmer.
• ILP allows the compiler and the processor to overlap the execution of multiple
instructions or even to change the order in which instructions are executed.
• How much ILP exists in programs is application specific. In certain fields, such as
graphics and scientific computing, the amount can be very large.
• However, workloads such as cryptography may exhibit much less parallelism.
• A superscalar pipeline decodes and issues more than one instruction at a time and
thereby reduces the steady-state CPI to less than 1.
• Each pipeline of a superscalar processor has its own fetch, decode, and store units;
execution units may be shared or duplicated according to the complexity of the computation.
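The CPI claim can be illustrated with a minimal sketch, assuming an ideal machine that sustains its full issue width every cycle with no stalls (real processors fall short of this bound):

```python
import math

# Ideal k-wide superscalar: k instructions issue per cycle, so
# CPI = cycles / instructions = 1/k, which is below 1 for k > 1.
def ideal_cpi(num_instructions, issue_width):
    cycles = math.ceil(num_instructions / issue_width)
    return cycles / num_instructions

print(ideal_cpi(1000, 2))   # 2-wide machine
print(ideal_cpi(1000, 4))   # 4-wide machine
```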
Superscalar Pipeline Processor
• A superscalar processor may follow either of two instruction-issue policies:
1. In-order issue
2. Out-of-order issue