
Data Dependences

• True data dependence => RAW
• Name dependences
➢Antidependence => WAR
➢Output dependence => WAW
• Control dependences
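The three hazard classes above follow directly from which registers two instructions read and write. A minimal sketch (an illustrative helper, not from the book) makes the mapping concrete:

```python
# Illustrative helper (not from the book): classify the dependences that a
# later instruction i2 has on an earlier instruction i1, given the register
# sets each one reads and writes.
def classify(writes1, reads1, writes2, reads2):
    deps = []
    if writes1 & reads2:
        deps.append("RAW")   # true data dependence: i2 reads what i1 wrote
    if reads1 & writes2:
        deps.append("WAR")   # antidependence: i2 overwrites what i1 read
    if writes1 & writes2:
        deps.append("WAW")   # output dependence: both write the same register
    return deps

# i1: ADD R1,R2,R3 (writes R1, reads R2,R3); i2: SUB R4,R1,R5 (reads R1)
print(classify({"R1"}, {"R2", "R3"}, {"R4"}, {"R1", "R5"}))  # -> ['RAW']
```

Only RAW is a true data dependence; WAR and WAW are name dependences and can be removed by register renaming.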
Multiple Issue Architectures
Book 1 – Computer Architecture: A Quantitative Approach, Hennessy and Patterson,
5th Edition, Morgan Kaufmann, 2012
Chapter Three : Instruction-Level Parallelism and Its Exploitation
Multiple-issue processors come in three major flavors:
1. Statically scheduled superscalar processors
2. VLIW (very long instruction word) processors
3. Dynamically scheduled superscalar processors
Issue: Static vs. Dynamic
• Static
➢Fixed number of instructions issued per clock cycle
• Dynamic
➢Variable number of instructions issued per clock cycle
• Superscalar processors are dynamic issue
• VLIW processors are static issue
Scheduling – Static vs. Dynamic
• Static scheduling (optimized by the compiler)
– When there is a stall (hazard), no further instructions are issued
– Of course, the stall has to be enforced by the hardware
• Dynamic scheduling (enforced by hardware)
– Instructions following the one that stalls can issue if they do not cause
structural hazards or data dependences
– Implies the possibility of out-of-order execution
• VLIW processors are statically scheduled
• Superscalar processors can be both
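The difference can be sketched with a tiny cycle-based model (hypothetical instruction names and latencies, not from the book): static in-order issue stalls at the first not-yet-ready instruction, while dynamic out-of-order issue may pick any ready instruction and fill the stall cycle.

```python
# Hedged sketch: contrast in-order (static) and out-of-order (dynamic) issue.
# One instruction issues per cycle; an instruction is ready once every
# register it reads has been produced.
def simulate(instrs, latency, dynamic):
    """instrs: list of (name, reads, writes); latency: name -> cycles to produce."""
    ready_at = {}                      # register -> cycle its value is available
    remaining = list(instrs)
    order, cycle = [], 0
    while remaining:
        pool = remaining if dynamic else remaining[:1]  # in-order sees only the head
        for ins in pool:
            name, reads, writes = ins
            if all(ready_at.get(r, 0) <= cycle for r in reads):
                order.append((cycle, name))
                for w in writes:
                    ready_at[w] = cycle + latency[name]
                remaining.remove(ins)
                break
        cycle += 1
    return order

instrs = [("LD",  set(),  {"R3"}),     # load with a 2-cycle latency
          ("ADD", {"R3"}, {"R4"}),     # depends on the load
          ("SUB", {"R1"}, {"R5"})]     # independent of both
latency = {"LD": 2, "ADD": 1, "SUB": 1}
print(simulate(instrs, latency, dynamic=False))  # in-order: stall before ADD
print(simulate(instrs, latency, dynamic=True))   # SUB fills the stall cycle
```

In the static case the ADD stall idles cycle 1; in the dynamic case the independent SUB issues out of order into that cycle.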
The five primary approaches in use for multiple-issue processors
Superscalar processors issue varying numbers of instructions per clock.
There are two types of superscalar processors, which use:
1. In-order execution if they are statically scheduled
2. Out-of-order execution if they are dynamically scheduled
VLIW processors issue a fixed number of instructions formatted either as
one large instruction or as a fixed instruction packet with the parallelism
among instructions explicitly indicated by the instruction
VLIW processors are inherently statically scheduled by the compiler
• Although statically scheduled superscalars issue a varying rather than a
fixed number of instructions per clock, they are actually closer in
concept to VLIWs, since both approaches rely on the compiler to
schedule code for the processor
• Because of the diminishing advantages of a statically scheduled
superscalar as the issue width grows, statically scheduled superscalars
are used primarily for narrow issue widths, normally just two
instructions.
• Beyond that width, most designers choose to implement either a VLIW
or a dynamically scheduled superscalar.
The Basic VLIW Approach
• VLIWs use multiple, independent functional units
• A VLIW packages multiple operations into one very long instruction
• In effect, multiple operations are placed in a single instruction
• Let’s consider a VLIW processor with instructions that contain five
operations, including one integer operation (which could also be a branch),
two floating-point operations, and two memory references
• The instruction would have a set of fields for each functional unit (16 to 24
bits per unit) yielding an instruction length of between 80 and 120 bits
• To keep the functional units busy, there must be enough parallelism in a
code sequence to fill the available operation slots
• This parallelism is uncovered by unrolling loops and scheduling the code
within the single larger loop body
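As a rough illustration of the slot structure described above, a greedy packer (a hypothetical sketch, assuming the operations are already independent and ordered by the compiler) might fill instruction words like this:

```python
# Hedged sketch: pack operations into VLIW instruction words with the slot
# mix described above (1 integer/branch, 2 FP, 2 memory). The slot names and
# the greedy policy are illustrative assumptions, not the book's algorithm.
SLOTS = {"int": 1, "fp": 2, "mem": 2}

def pack(ops):
    """ops: list of (name, kind); returns a list of instruction words."""
    words = []
    word, used = [], {k: 0 for k in SLOTS}
    for name, kind in ops:
        if used[kind] >= SLOTS[kind]:        # no free slot: start a new word
            words.append(word)
            word, used = [], {k: 0 for k in SLOTS}
        word.append(name)
        used[kind] += 1
    if word:
        words.append(word)
    return words

ops = [("L.D F0", "mem"), ("L.D F6", "mem"), ("ADD.D F4", "fp"),
       ("L.D F10", "mem"), ("DADDUI R1", "int")]
print(pack(ops))
```

Slots left unfilled in a word would be encoded as no-ops, which is why insufficient parallelism directly wastes instruction bits and functional units.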
Example of scheduling
Show how the following loop would look on MIPS, both scheduled
and unscheduled, including any stalls or idle clock cycles. Schedule
for delays from floating-point operations but remember that we are
ignoring delayed branches.
Assuming the following pipeline latencies:

• The latency of a floating-point load to a store is 0, since the result of the
load can be bypassed without stalling the store
• An integer load has a latency of 1 cycle
• An integer ALU operation has a latency of 0 cycles
Without any scheduling, the loop will execute as follows, taking nine cycles.
We can schedule the loop to obtain only two stalls and reduce the time to
seven cycles.
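The nine- and seven-cycle counts can be reproduced with a small issue-timing model. This is a hedged sketch, not the book's code: the loop body (L.D / ADD.D / S.D / DADDUI / BNE, the chapter's running example of adding a scalar to each array element) and the latency table, including an assumed 1-cycle latency from an integer ALU operation to a dependent branch, are reconstructed assumptions as far as this excerpt goes.

```python
# Latencies in extra cycles between a producing and a consuming instruction.
LAT = {("load", "fpalu"): 1,    # FP load -> FP ALU op
       ("fpalu", "store"): 2,   # FP ALU op -> store
       ("load", "store"): 0,    # FP load -> store (result bypassed)
       ("int", "branch"): 1}    # integer ALU op -> dependent branch (assumed)

def cycles(seq):
    """seq: list of (name, kind, reads, writes); returns cycles per iteration."""
    prod = {}                   # register -> (issue cycle, producer kind)
    issue = 0
    for name, kind, reads, writes in seq:
        ready = issue + 1       # earliest in-order issue slot
        for r in reads:
            if r in prod:       # stall until the operand is usable
                p_issue, p_kind = prod[r]
                ready = max(ready, p_issue + 1 + LAT.get((p_kind, kind), 0))
        issue = ready
        for w in writes:
            prod[w] = (issue, kind)
    return issue

unscheduled = [
    ("L.D F0,0(R1)",     "load",   ["R1"],       ["F0"]),
    ("ADD.D F4,F0,F2",   "fpalu",  ["F0", "F2"], ["F4"]),
    ("S.D F4,0(R1)",     "store",  ["F4", "R1"], []),
    ("DADDUI R1,R1,#-8", "int",    ["R1"],       ["R1"]),
    ("BNE R1,R2,Loop",   "branch", ["R1", "R2"], []),
]
scheduled = [                   # DADDUI hoisted; store offset adjusted to 8
    ("L.D F0,0(R1)",     "load",   ["R1"],       ["F0"]),
    ("DADDUI R1,R1,#-8", "int",    ["R1"],       ["R1"]),
    ("ADD.D F4,F0,F2",   "fpalu",  ["F0", "F2"], ["F4"]),
    ("S.D F4,8(R1)",     "store",  ["F4", "R1"], []),
    ("BNE R1,R2,Loop",   "branch", ["R1", "R2"], []),
]
print(cycles(unscheduled), cycles(scheduled))  # -> 9 7
```

Hoisting the DADDUI into the load's stall slot hides one stall, and issuing the branch right after the store hides the integer-to-branch stall, leaving only the two stalls between the ADD.D and the S.D.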
