
Instruction-Level Parallelism and Superscalar Processors
Complex Instruction Set Computer (CISC)
■ There is a trend toward richer instruction sets that include a larger number of
more complex instructions

■ Two principal reasons for this trend:


■ A desire to simplify compilers
■ A desire to improve performance

■ There are two advantages to smaller programs:


■ The program takes up less memory
■ Should improve performance
■ Fewer instructions means fewer instruction bytes to be fetched
■ In a paging environment smaller programs occupy fewer pages, reducing
page faults
■ More instructions fit in cache(s)
Characteristics of Reduced Instruction Set Architectures

One machine instruction per machine cycle
• Machine cycle: the time it takes to fetch two operands from registers, perform an ALU operation, and store the result in a register

Register-to-register operations
• Only simple LOAD and STORE operations access memory
• This simplifies the instruction set and therefore the control unit

Simple addressing modes
• Simplifies the instruction set and the control unit

Simple instruction formats
• Generally only one or a few formats are used
• Instruction length is fixed and aligned on word boundaries
• Opcode decoding and register operand accessing can occur simultaneously
RISC versus CISC Controversy
■ Quantitative
■ Compare program sizes and execution speeds of programs on RISC and
CISC machines that use comparable technology

■ Qualitative
■ Examine issues of high level language support and use of VLSI real estate

■ Problems with comparisons:


■ No pair of RISC and CISC machines that are comparable in life-cycle cost,
level of technology, gate complexity, sophistication of compiler, operating
system support, etc.
■ No definitive set of test programs exists
■ Difficult to separate hardware effects from compiler effects
■ Most comparisons done on “toy” machines rather than commercial products
■ Most commercial devices advertised as RISC possess a mixture of RISC
and CISC characteristics
Superscalar Overview

■ Term first coined in 1987
■ Refers to a machine that is designed to improve the performance of the
execution of scalar instructions
■ In most applications the bulk of the operations are on scalar quantities
■ Represents the next step in the evolution of high-performance
general-purpose processors
■ Essence of the approach is the ability to execute instructions independently
and concurrently in different pipelines
■ Concept can be further exploited by allowing instructions to be executed in
an order different from the program order
[Figure: Superscalar Organization Compared to Ordinary Scalar Organization]

[Table 16.1: Reported Speedups of Superscalar-Like Machines]

[Figure: Comparison of Superscalar and Superpipeline Approaches]
Constraints

■ Instruction level parallelism


■ Refers to the degree to which the instructions of a program can be executed
in parallel
■ A combination of compiler based optimization and hardware techniques
can be used to maximize instruction level parallelism

■ Limitations (see the sketch after this list):
■ True data dependency
■ Procedural dependency
■ Resource conflicts
■ Output dependency
■ Antidependency
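
To make the register-related limitations concrete, here is a minimal sketch in Python (the (destination, sources) encoding and the register names are illustrative assumptions, not from the text) that classifies the hazard between an earlier and a later instruction:

    def classify_dependency(earlier, later):
        """Classify the hazard between an earlier and a later instruction."""
        e_dest, e_srcs = earlier
        l_dest, l_srcs = later
        if e_dest in l_srcs:
            return "true data dependency (RAW)"    # later reads what earlier writes
        if e_dest == l_dest:
            return "output dependency (WAW)"       # both write the same register
        if l_dest in e_srcs:
            return "antidependency (WAR)"          # later overwrites a register the earlier one reads
        return "independent"

    i1 = ("r1", ["r2", "r3"])                # r1 = r2 + r3
    i2 = ("r4", ["r1", "r5"])                # r4 = r1 + r5
    i3 = ("r1", ["r6", "r7"])                # r1 = r6 * r7
    print(classify_dependency(i1, i2))       # true data dependency (RAW)
    print(classify_dependency(i1, i3))       # output dependency (WAW)
    print(classify_dependency(i2, i3))       # antidependency (WAR)

Procedural dependencies and resource conflicts are properties of control flow and of the hardware, respectively, so they fall outside this register-level sketch.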
[Figure: Effect of Dependencies]
Design Issues
Instruction-Level Parallelism
and Machine Parallelism

■ Instruction level parallelism


■ Instructions in a sequence are independent
■ Execution can be overlapped
■ Governed by data and procedural dependency

■ Machine Parallelism
■ Ability to take advantage of instruction level parallelism
■ Governed by number of parallel pipelines
Instruction Issue Policy
■ Instruction issue
■ Refers to the process of initiating instruction execution in the processor’s functional
units

■ Instruction issue policy


■ Refers to the protocol used to issue instructions
■ Instruction issue occurs when instruction moves from the decode stage of the pipeline to
the first execute stage of the pipeline

■ Three types of orderings are important:


■ The order in which instructions are fetched
■ The order in which instructions are executed
■ The order in which instructions update the contents of registers and memory locations

■ Superscalar instruction issue policies can be grouped into the following
categories (contrasted in the sketch after this list):
■ In-order issue with in-order completion
■ In-order issue with out-of-order completion
■ Out-of-order issue with out-of-order completion
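
To contrast the policies, here is a minimal sketch in Python (the latencies and the one-issue-per-cycle assumption are illustrative, not from the text). Instructions issue in program order; with in-order completion a result is held back until all older instructions have finished, while with out-of-order completion each instruction finishes as soon as its own latency expires:

    def completion_order(latencies, in_order_completion):
        finish = []
        for i, lat in enumerate(latencies):
            ready = i + lat                          # issue at cycle i, execute for lat cycles
            if in_order_completion and finish:
                ready = max(ready, finish[-1] + 1)   # may not finish before an older instruction
            finish.append(ready)
        return sorted(range(len(latencies)), key=lambda i: finish[i])

    lats = [4, 1, 1]                          # one slow instruction followed by two fast ones
    print(completion_order(lats, True))       # [0, 1, 2]: results held to program order
    print(completion_order(lats, False))      # [1, 2, 0]: the fast instructions finish first

Out-of-order issue goes one step further: the decoder is decoupled from the execute stages (e.g., by an instruction window), so a stalled instruction no longer blocks younger, independent instructions from issuing.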
[Figure: Superscalar Instruction Issue and Completion Policies]

■ Output dependency (also called write-after-write [WAW] dependency)

[Figure: Organization for Out-of-Order Issue with Out-of-Order Completion]

■ Antidependency (also called write-after-read [WAR] dependency)
Branch Prediction

■ Any high-performance pipelined machine must address the issue of dealing
with branches

■ Intel 80486 addressed the problem by fetching both the next sequential
instruction after a branch and speculatively fetching the branch target
instruction

■ RISC machines:
■ Delayed branch strategy was explored
■ Processor always executes the single instruction that immediately follows the
branch
■ Keeps the pipeline full while the processor fetches a new instruction stream

■ Superscalar machines:
■ Delayed branch strategy has less appeal
■ Have returned to pre-RISC techniques of branch prediction
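
As one concrete example of such a technique, here is a minimal sketch in Python of a 2-bit saturating-counter predictor, a classic dynamic branch prediction scheme (the specific scheme and the outcome sequence are illustrative; the text does not commit to a particular predictor):

    class TwoBitPredictor:
        def __init__(self):
            self.state = 0                     # 0,1 = predict not taken; 2,3 = predict taken

        def predict(self):
            return self.state >= 2             # True means "predict taken"

        def update(self, taken):
            # move toward the observed outcome, saturating at 0 and 3
            self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

    p = TwoBitPredictor()
    for taken in [True, True, True, False, True, True]:   # e.g., a mostly taken loop branch
        print(p.predict(), taken)                         # prediction vs. actual outcome
        p.update(taken)

Because two consecutive mispredictions are needed to flip the prediction, a single loop-exit branch does not spoil the prediction for the next run of the loop.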
[Figure: Speedups of Various Machine Organizations Without Procedural Dependencies]

[Figure: Conceptual Depiction of Superscalar Processing]
Superscalar Implementation

■ Key elements:
■ Instruction fetch strategies that simultaneously fetch multiple instructions
■ Logic for determining true dependencies involving register values, and
mechanisms for communicating these values to where they are needed
during execution
■ Mechanisms for initiating, or issuing, multiple instructions in parallel
■ Resources for parallel execution of multiple instructions, including
multiple pipelined functional units and memory hierarchies capable of
simultaneously servicing multiple memory references
■ Mechanisms for committing the process state in correct order
Pentium 4 Block Diagram

1. The processor fetches instructions from memory in the order of the static
program.

2. Each instruction is translated into one or more fixed-length RISC
instructions, known as micro-operations, or micro-ops.

3. The processor executes the micro-ops on a superscalar pipeline organization,
so that the micro-ops may be executed out of order.

4. The processor commits the results of each micro-op execution to the
processor’s register set in the order of the original program flow.
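
Step 4 is commonly implemented with a reorder buffer: micro-ops may finish executing in any order, but results are retired to the register file strictly from the head of the buffer. A minimal sketch in Python (the data structures and names are illustrative assumptions, not the Pentium 4's actual mechanism):

    from collections import deque

    rob = deque()                              # reorder buffer: entries kept in program order

    def dispatch(uop):
        entry = {"uop": uop, "done": False, "result": None}
        rob.append(entry)                      # allocate an entry at dispatch time
        return entry

    def complete(entry, dest, value):
        entry["done"] = True                   # execution may finish in any order
        entry["result"] = (dest, value)

    def commit(register_file):
        # retire only from the head, so architectural state updates in program order
        while rob and rob[0]["done"]:
            dest, value = rob.popleft()["result"]
            register_file[dest] = value

    regs = {}
    e1 = dispatch("mul r1, r6, r7")
    e2 = dispatch("add r4, r2, r5")
    complete(e2, "r4", 9)                      # the younger micro-op finishes first...
    commit(regs)                               # ...but nothing retires yet
    complete(e1, "r1", 42)
    commit(regs)                               # now both retire, in program order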
