
Instruction-Level Parallelism and Superscalar Processors
Complex Instruction Set Computer (CISC)
■ There is a trend toward richer instruction sets that include a larger number of
more complex instructions

■ Two principal reasons for this trend:


■ A desire to simplify compilers
■ A desire to improve performance

■ There are two advantages to smaller programs:


■ The program takes up less memory
■ Should improve performance
■ Fewer instructions means fewer instruction bytes to be fetched
■ In a paging environment smaller programs occupy fewer pages, reducing
page faults
■ More instructions fit in cache(s)
Characteristics of Reduced Instruction Set Architectures

One machine instruction per machine cycle
• Machine cycle: the time it takes to fetch two operands from registers, perform an ALU operation, and store the result in a register

Register-to-register operations
• Only simple LOAD and STORE operations access memory
• This simplifies the instruction set and therefore the control unit

Simple addressing modes
• Simplifies the instruction set and the control unit

Simple instruction formats
• Generally only one or a few formats are used
• Instruction length is fixed and aligned on word boundaries
• Opcode decoding and register operand accessing can occur simultaneously
RISC versus CISC Controversy
■ Quantitative
■ Compare program sizes and execution speeds of programs on RISC and
CISC machines that use comparable technology

■ Qualitative
■ Examine issues of high level language support and use of VLSI real estate

■ Problems with comparisons:


■ No pair of RISC and CISC machines that are comparable in life-cycle cost,
level of technology, gate complexity, sophistication of compiler, operating
system support, etc.
■ No definitive set of test programs exists
■ Difficult to separate hardware effects from compiler effects
■ Most comparisons done on “toy” machines rather than commercial products
■ Most commercial devices advertised as RISC possess a mixture of RISC
and CISC characteristics
Superscalar Overview

■ Term first coined in 1987
■ Refers to a machine that is designed to improve the performance of the
execution of scalar instructions
■ In most applications the bulk of the operations are on scalar quantities
■ Represents the next step in the evolution of high-performance
general-purpose processors
■ Essence of the approach is the ability to execute instructions independently
and concurrently in different pipelines
■ Concept can be further exploited by allowing instructions to be executed in
an order different from the program order
[Figure: Superscalar Organization Compared to Ordinary Scalar Organization]

[Table 16.1: Reported Speedups of Superscalar-Like Machines]

[Figure: Comparison of Superscalar and Superpipeline Approaches]
Constraints

■ Instruction level parallelism


■ Refers to the degree to which the instructions of a program can be executed
in parallel
■ A combination of compiler based optimization and hardware techniques
can be used to maximize instruction level parallelism

■ Limitations (see the sketch after this list):
■ True data dependency
■ Procedural dependency
■ Resource conflicts
■ Output dependency
■ Antidependency
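
To make the register-related limitations concrete, here is a minimal sketch in Python (the (destination, sources) encoding and the register names are illustrative assumptions, not from the text) that classifies the hazard between an earlier and a later instruction:

    def classify_dependency(earlier, later):
        """Classify the hazard between an earlier and a later instruction."""
        e_dest, e_srcs = earlier
        l_dest, l_srcs = later
        if e_dest in l_srcs:
            return "true data dependency (RAW)"    # later reads what earlier writes
        if e_dest == l_dest:
            return "output dependency (WAW)"       # both write the same register
        if l_dest in e_srcs:
            return "antidependency (WAR)"          # later overwrites a register the earlier one reads
        return "independent"

    i1 = ("r1", ["r2", "r3"])                # r1 = r2 + r3
    i2 = ("r4", ["r1", "r5"])                # r4 = r1 + r5
    i3 = ("r1", ["r6", "r7"])                # r1 = r6 * r7
    print(classify_dependency(i1, i2))       # true data dependency (RAW)
    print(classify_dependency(i1, i3))       # output dependency (WAW)
    print(classify_dependency(i2, i3))       # antidependency (WAR)

Procedural dependencies and resource conflicts are properties of control flow and of the hardware, respectively, so they fall outside this register-level sketch.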
[Figure: Effect of Dependencies]
Design Issues
Instruction-Level Parallelism
and Machine Parallelism

■ Instruction level parallelism


■ Instructions in a sequence are independent
■ Execution can be overlapped
■ Governed by data and procedural dependency

■ Machine Parallelism
■ Ability to take advantage of instruction level parallelism
■ Governed by number of parallel pipelines
Instruction Issue Policy
■ Instruction issue
■ Refers to the process of initiating instruction execution in the processor’s functional
units

■ Instruction issue policy


■ Refers to the protocol used to issue instructions
■ Instruction issue occurs when instruction moves from the decode stage of the pipeline to
the first execute stage of the pipeline

■ Three types of orderings are important:


■ The order in which instructions are fetched
■ The order in which instructions are executed
■ The order in which instructions update the contents of registers and memory locations

■ Superscalar instruction issue policies can be grouped into the following
categories (contrasted in the sketch after this list):
■ In-order issue with in-order completion
■ In-order issue with out-of-order completion
■ Out-of-order issue with out-of-order completion
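
To contrast the policies, here is a minimal sketch in Python (the latencies and the one-issue-per-cycle assumption are illustrative, not from the text). Instructions issue in program order; with in-order completion a result is held back until all older instructions have finished, while with out-of-order completion each instruction finishes as soon as its own latency expires:

    def completion_order(latencies, in_order_completion):
        finish = []
        for i, lat in enumerate(latencies):
            ready = i + lat                          # issue at cycle i, execute for lat cycles
            if in_order_completion and finish:
                ready = max(ready, finish[-1] + 1)   # may not finish before an older instruction
            finish.append(ready)
        return sorted(range(len(latencies)), key=lambda i: finish[i])

    lats = [4, 1, 1]                          # one slow instruction followed by two fast ones
    print(completion_order(lats, True))       # [0, 1, 2]: results held to program order
    print(completion_order(lats, False))      # [1, 2, 0]: the fast instructions finish first

Out-of-order issue goes one step further: the decoder is decoupled from the execute stages (e.g., by an instruction window), so a stalled instruction no longer blocks younger, independent instructions from issuing.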
[Figure: Superscalar Instruction Issue and Completion Policies]

■ Output dependency (also called write-after-write [WAW] dependency)

[Figure: Organization for Out-of-Order Issue with Out-of-Order Completion]

■ Antidependency (also called write-after-read [WAR] dependency)
Branch Prediction

■ Any high-performance pipelined machine must address the issue of dealing
with branches

■ Intel 80486 addressed the problem by fetching both the next sequential
instruction after a branch and speculatively fetching the branch target
instruction

■ RISC machines:
■ Delayed branch strategy was explored
■ Processor always executes the single instruction that immediately follows the
branch
■ Keeps the pipeline full while the processor fetches a new instruction stream

■ Superscalar machines:
■ Delayed branch strategy has less appeal
■ Have returned to pre-RISC techniques of branch prediction
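
As one concrete example of such a technique, here is a minimal sketch in Python of a 2-bit saturating-counter predictor, a classic dynamic branch prediction scheme (the specific scheme and the outcome sequence are illustrative; the text does not commit to a particular predictor):

    class TwoBitPredictor:
        def __init__(self):
            self.state = 0                     # 0,1 = predict not taken; 2,3 = predict taken

        def predict(self):
            return self.state >= 2             # True means "predict taken"

        def update(self, taken):
            # move toward the observed outcome, saturating at 0 and 3
            self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

    p = TwoBitPredictor()
    for taken in [True, True, True, False, True, True]:   # e.g., a mostly taken loop branch
        print(p.predict(), taken)                         # prediction vs. actual outcome
        p.update(taken)

Because two consecutive mispredictions are needed to flip the prediction, a single loop-exit branch does not spoil the prediction for the next run of the loop.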
[Figure: Speedups of Various Machine Organizations Without Procedural Dependencies]

[Figure: Conceptual Depiction of Superscalar Processing]
Superscalar Implementation

■ Key elements:
■ Instruction fetch strategies that simultaneously fetch multiple instructions
■ Logic for determining true dependencies involving register values, and
mechanisms for communicating these values to where they are needed
during execution
■ Mechanisms for initiating, or issuing, multiple instructions in parallel
■ Resources for parallel execution of multiple instructions, including
multiple pipelined functional units and memory hierarchies capable of
simultaneously servicing multiple memory references
■ Mechanisms for committing the process state in correct order
Pentium 4 Block Diagram

1. The processor fetches instructions from memory in the order of the static
program.

2. Each instruction is translated into one or more fixed-length RISC
instructions, known as micro-operations, or micro-ops.

3. The processor executes the micro-ops on a superscalar pipeline organization,
so that the micro-ops may be executed out of order.

4. The processor commits the results of each micro-op execution to the
processor’s register set in the order of the original program flow.
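
Step 4 is commonly implemented with a reorder buffer: micro-ops may finish executing in any order, but results are retired to the register file strictly from the head of the buffer. A minimal sketch in Python (the data structures and names are illustrative assumptions, not the Pentium 4's actual mechanism):

    from collections import deque

    rob = deque()                              # reorder buffer: entries kept in program order

    def dispatch(uop):
        entry = {"uop": uop, "done": False, "result": None}
        rob.append(entry)                      # allocate an entry at dispatch time
        return entry

    def complete(entry, dest, value):
        entry["done"] = True                   # execution may finish in any order
        entry["result"] = (dest, value)

    def commit(register_file):
        # retire only from the head, so architectural state updates in program order
        while rob and rob[0]["done"]:
            dest, value = rob.popleft()["result"]
            register_file[dest] = value

    regs = {}
    e1 = dispatch("mul r1, r6, r7")
    e2 = dispatch("add r4, r2, r5")
    complete(e2, "r4", 9)                      # the younger micro-op finishes first...
    commit(regs)                               # ...but nothing retires yet
    complete(e1, "r1", 42)
    commit(regs)                               # now both retire, in program order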
