Superscalar machines
Multicore Machines
Clusters
Parallel Processors
Hardware implementation vs microprogramming
Chapter 14
Superscalar Processors
Definition of Superscalar
Design Issues:
- Instruction Issue Policy
- Register renaming
- Machine parallelism
- Branch Prediction
- Execution
Pentium 4 example
What is Superscalar?
A superscalar machine executes multiple independent instructions in parallel.
Superscalar machines are pipelined as well.
Superscalar v Superpipelined
Limitations of Superscalar
Dependent upon:
- Instruction level parallelism possible
- Compiler based optimization
- Hardware support
Limited by:
- Data dependency
- Procedural dependency
- Resource conflicts
r1 + r2 -> r1
r1 -> r3

LOAD r1, X      X (memory) -> r1
MOVE r3, r1     r1 -> r3

ADD R4, R3, 1   R3 + 1 -> R4
ADD R3, R5, 1   R5 + 1 -> R3
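The register pairs above can be checked mechanically. The following is a minimal sketch, not taken from the slides: the `Instr` tuple, the `classify` function, and the RAW/WAR/WAW labels (standard terminology) are assumptions introduced for illustration. It compares the destination and source registers of two instructions in program order.

```python
from typing import NamedTuple, Tuple, List


class Instr(NamedTuple):
    """Hypothetical instruction record: opcode, destination, source registers."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def classify(first: Instr, second: Instr) -> List[str]:
    """Return the dependencies of `second` on `first` (first comes earlier)."""
    kinds = []
    if second.dest in first.srcs:
        kinds.append("anti (WAR)")      # second overwrites a value first still reads
    if first.dest == second.dest:
        kinds.append("output (WAW)")    # both write the same register
    if first.dest in second.srcs:
        kinds.append("true (RAW)")      # second reads what first produces
    return sorted(kinds)


# r1 + r2 -> r1, then r1 -> r3: the second instruction needs the new r1.
add = Instr("ADD", "r1", ("r1", "r2"))
move = Instr("MOVE", "r3", ("r1",))
print(classify(add, move))

# R3 + 1 -> R4, then R5 + 1 -> R3: the second instruction overwrites R3,
# which the first one still has to read.
first = Instr("ADD", "R4", ("R3",))
second = Instr("ADD", "R3", ("R5",))
print(classify(first, second))
```

Only a true dependency reflects a real flow of data; the other two kinds come from reusing register names.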
Notes:
R1, R2
R3, 1
R4, R2
Consider:
ADD R3, 1
ADD R4, R3
STO (R4), R0
These cannot be executed in parallel: each instruction uses the result of the one before it.
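One way to see why this sequence serializes is to compute the earliest cycle in which each instruction could execute. The sketch below is an illustration under stated assumptions: unit latency, unlimited functional units, only true dependencies considered, and two-operand semantics where `ADD R4, R3` means R4 := R4 + R3. The function name `earliest_cycles` is hypothetical.

```python
def earliest_cycles(program):
    """program: list of (dest, srcs); dest is None for a store.

    Returns the earliest execute cycle of each instruction, assuming unit
    latency and unlimited hardware (true dependencies only).
    """
    produced_in = {}   # register -> cycle in which its latest value is produced
    cycles = []
    for dest, srcs in program:
        # An instruction runs one cycle after its latest-arriving operand.
        start = 1 + max((produced_in.get(s, 0) for s in srcs), default=0)
        cycles.append(start)
        if dest is not None:
            produced_in[dest] = start
    return cycles


chain = [
    ("R3", ("R3",)),        # ADD R3, 1     (assumed: R3 := R3 + 1)
    ("R4", ("R4", "R3")),   # ADD R4, R3    (assumed: R4 := R4 + R3)
    (None, ("R4", "R0")),   # STO (R4), R0  (uses R4 as the address)
]
print(earliest_cycles(chain))        # [1, 2, 3]

independent = [("R1", ("R2",)), ("R3", ("R4",))]
print(earliest_cycles(independent))  # [1, 1]
```

The chain needs three cycles regardless of how many execution units are available, while the independent pair fits in one.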
Standard Categories:
In-order issue with in-order completion
In-order issue with out-of-order completion
Out-of-order issue with out-of-order completion
Example constraints on a six-instruction sequence (I1 to I6):
I1 requires 2 cycles to execute
I3 & I4 conflict for the same functional unit
I5 depends upon value produced by I4
I5 & I6 conflict for a functional unit
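The effect of these constraints on issue order can be explored with a toy scheduler. This is a simplified sketch, not a reproduction of the slide diagrams: it models only the issue/execute step with an issue width of two, ignores fetch, decode, and write-back limits, and lets the out-of-order case look at all pending instructions. The unit assignments for I1 and I2 and the names `Op` and `schedule` are assumptions.

```python
from typing import NamedTuple, Tuple


class Op(NamedTuple):
    unit: int                 # functional unit the instruction needs
    latency: int              # execution cycles
    deps: Tuple[int, ...]     # indices of instructions it depends on


def schedule(program, in_order, width=2):
    """Return the cycle in which each instruction starts executing."""
    start = [None] * len(program)
    finish = [None] * len(program)
    busy_until = {}           # unit -> last cycle in which it is occupied
    cycle = 0
    while any(s is None for s in start):
        cycle += 1
        issued = 0
        for i, op in enumerate(program):
            if start[i] is not None:
                continue
            if issued == width:
                break
            ready = all(finish[d] is not None and finish[d] < cycle
                        for d in op.deps)
            free = busy_until.get(op.unit, 0) < cycle
            if ready and free:
                start[i] = cycle
                finish[i] = cycle + op.latency - 1
                busy_until[op.unit] = finish[i]
                issued += 1
            elif in_order:
                break         # a stalled instruction blocks everything behind it
    return start


PROGRAM = [
    Op(unit=0, latency=2, deps=()),     # I1 requires 2 cycles (unit assumed)
    Op(unit=1, latency=1, deps=()),     # I2 (unit assumed)
    Op(unit=2, latency=1, deps=()),     # I3 and I4 share a unit
    Op(unit=2, latency=1, deps=()),     # I4
    Op(unit=1, latency=1, deps=(3,)),   # I5 depends on I4
    Op(unit=1, latency=1, deps=()),     # I6 conflicts with I5 for a unit
]

print(schedule(PROGRAM, in_order=True))    # [1, 1, 2, 3, 4, 5]
print(schedule(PROGRAM, in_order=False))   # [1, 1, 2, 3, 4, 2]
```

In this model, out-of-order issue lets I6 run while I4 and I5 are still waiting, so the last instruction starts one cycle earlier.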
Register Renaming
to avoid hazards
(I1) R3b := R3a + R5a
(I2) R4b := R3b + 1
(I3) R3c := R5a + 1
(I4) R7b := R3c + R4b
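The renamed sequence can be generated from the un-renamed one by giving every write a fresh register version. The sketch below is an illustration only: the un-renamed program is inferred by stripping the a/b/c suffixes from I1 to I4, and the `rename` function and its letter-suffix scheme (limited to 26 versions) are assumptions, not a description of real renaming hardware.

```python
from string import ascii_lowercase


def rename(program):
    """program: list of (dest, srcs). Registers are names starting with 'R';
    anything else (such as '1') is treated as an immediate and left alone.
    """
    version = {}   # architectural register -> index of its current version

    def current(reg):
        # A register that was never written keeps its initial version 'a'.
        return reg + ascii_lowercase[version.setdefault(reg, 0)]

    renamed = []
    for dest, srcs in program:
        # Sources read the current versions, looked up before the new write.
        new_srcs = [current(s) if s.startswith("R") else s for s in srcs]
        # Each write allocates a fresh version of the destination.
        version[dest] = version.get(dest, 0) + 1
        renamed.append((dest + ascii_lowercase[version[dest]], new_srcs))
    return renamed


original = [
    ("R3", ["R3", "R5"]),   # I1
    ("R4", ["R3", "1"]),    # I2
    ("R3", ["R5", "1"]),    # I3
    ("R7", ["R3", "R4"]),   # I4
]
for dest, srcs in rename(original):
    print(f"{dest} := {' + '.join(srcs)}")
```

Because I3 writes R3c rather than reusing the R3 that I2 reads, I3 no longer has to wait for I2.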
Duplication of Resources
Out-of-order issue hardware
Windowing to decouple execution from decode
Register Renaming capability
Need instruction window large enough (more than 8, probably not more than 32)
Example: Pentium 4
A Superscalar CISC Machine
Pentium 4 pipeline
20 stages!
d) Drive (stage 5)
- Each micro-op is allocated to a slot in the 126-position circular Reorder Buffer (ROB), which tracks the progress of the micro-ops.
Buffer entries include:
- State: scheduled, dispatched, completed, ready for retire
- Address that generated the micro-op
- Operation
- Alias register: one of 128 alias registers is assigned for each reference to one of the 16 architectural registers, to remove data dependencies
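The role of a circular reorder buffer can be sketched in a few lines. This is a minimal illustration, not the Pentium 4 design: the class and method names are hypothetical, the initial state on allocation is assumed to be "scheduled", and alias registers are left out. It shows only that micro-ops may finish in any order but leave the buffer in program order.

```python
STATES = ("scheduled", "dispatched", "completed", "ready for retire")


class ReorderBuffer:
    def __init__(self, size=126):
        self.size = size
        self.entries = [None] * size
        self.head = 0      # oldest micro-op still in the buffer
        self.tail = 0      # next free slot
        self.count = 0

    def allocate(self, address, operation):
        """Place a micro-op in the next slot; return None if the buffer is full."""
        if self.count == self.size:
            return None
        slot = self.tail
        self.entries[slot] = {"state": "scheduled",   # assumed initial state
                              "address": address,
                              "operation": operation}
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return slot

    def set_state(self, slot, state):
        if state not in STATES:
            raise ValueError(f"unknown state: {state}")
        self.entries[slot]["state"] = state

    def retire(self):
        """Retire micro-ops from the head, in order, while they are ready."""
        retired = []
        while self.count and self.entries[self.head]["state"] == "ready for retire":
            retired.append(self.entries[self.head]["operation"])
            self.entries[self.head] = None
            self.head = (self.head + 1) % self.size
            self.count -= 1
        return retired


rob = ReorderBuffer(size=4)
a = rob.allocate(0x100, "load")
b = rob.allocate(0x104, "add")
rob.set_state(b, "ready for retire")   # the younger micro-op finishes first
print(rob.retire())                    # [] because the older one is not ready
rob.set_state(a, "ready for retire")
print(rob.retire())                    # ['load', 'add']
```

A full buffer makes `allocate` return None, which corresponds to stalling the front end until older micro-ops retire.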
g) Micro-op scheduling (stages 10, 11, & 12)
h) Dispatch (stages 13 & 14)
i) Register file (stages 15 & 16)
The register files are the sources for pending fixed-point and floating-point operations.
j) Execute: flags (stages 17 & 18)
A separate stage is used to compute the flags.
k) Branch check (stage 19)