Superscalar Processors
• Fetch and decode instructions at a higher bandwidth than they are executed.
• The instruction window is kept full, so that the deeper instruction look-ahead allows more instructions to be found for issue to the execution units.
• The processor fetches and decodes more instructions than it commits (today about 1.4 to 2 times as many), because it discards instructions on mispredicted branch paths.
• Typically the decode bandwidth is the same as the instruction fetch bandwidth.
• Multiple instruction fetch and decode is supported by a fixed instruction length.

[Figure: instruction fetch path. Second-level cache (or memory) → typically 128 bits/cycle → predecode unit → e.g. 148 bits/cycle → Icache.]
When instructions are written into the Icache, the predecode unit appends 4-7 bits to each RISC instruction [1].
[1] In the AMD K5, which is an x86-compatible CISC processor, the predecode unit appends 5 bits to each byte.
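The two bandwidth figures in the fetch path are consistent with each other if one assumes 32-bit RISC instructions and 5 predecode bits per instruction. Both values are assumptions chosen for illustration; the slide only gives the range of 4-7 bits. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def predecoded_bandwidth(fetch_bits, unit_bits, predecode_bits):
    """Bits per cycle written into the Icache after predecoding.

    fetch_bits     -- bits per cycle arriving from the second-level cache
    unit_bits      -- size of the unit that gets tagged (assumed: one 32-bit instruction)
    predecode_bits -- bits appended to each unit (assumed: 5, within the 4-7 range)
    """
    units_per_cycle = fetch_bits // unit_bits
    return units_per_cycle * (unit_bits + predecode_bits)


# 128 bits/cycle hold four 32-bit instructions; each grows to 37 bits.
print(predecoded_bandwidth(128, 32, 5))  # 148
```

Under these assumptions the 128 bits/cycle arriving at the predecode unit become 148 bits/cycle on the Icache side.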
Register Renaming
Techniques to implement renaming:
• Only a single set of registers is provided, and architectural registers are dynamically mapped to physical registers.
  – The physical registers contain committed values and temporary results.
  – After commitment of an instruction, the physical register is made permanent and no copying is necessary.
• An alternative to dynamic renaming is the use of a large register file, as defined for the Intel Merced.

[Figure: a mapping table, indexed by architectural register number (0 to 31), holds an entry-valid bit and an RB index into the physical register file (physical register numbers 0 to 39). For the instruction "add r2, ..., ...", the entry of r2 is valid and holds index 3: physical register p3 is allocated to r2.]
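The dynamic mapping of architectural to physical registers can be sketched as a mapping table plus a free list. This is a minimal illustration, not the mechanism of any specific processor: the class and method names are hypothetical, the sizes (32 architectural, 40 physical registers) are read from the figure, and freeing the previously mapped physical register at commit is an assumption of this sketch.

```python
class RenameMap:
    """Hypothetical mapping table: architectural register -> physical register."""

    def __init__(self, num_arch=32, num_phys=40):
        # Assumption: initially architectural register i lives in physical register i.
        self.table = list(range(num_arch))
        # The remaining physical registers are free for temporary results.
        self.free = list(range(num_arch, num_phys))

    def rename(self, dest, srcs):
        """Rename one instruction; returns (new_dest, phys_srcs, old_dest)."""
        # Sources are looked up before the destination mapping is changed.
        phys_srcs = [self.table[s] for s in srcs]
        new_phys = self.free.pop(0)   # raises IndexError if no register is free
        old_phys = self.table[dest]
        self.table[dest] = new_phys
        return new_phys, phys_srcs, old_phys

    def commit(self, old_phys):
        # The new mapping simply stays in place, so no value is copied.
        # Assumption: the register that held the previous value is recycled.
        self.free.append(old_phys)


rm = RenameMap()
print(rm.rename(dest=2, srcs=[1, 3]))  # (32, [1, 3], 2)
print(rm.rename(dest=4, srcs=[2]))     # (33, [32], 4)
```

The second instruction reads r2 from physical register 32, the temporary result of the first one, which is how renaming preserves the true data dependence while removing name dependences.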
Type of rename buffers
(The basic approach of how rename buffers are implemented)
Issue and Dispatch
Instruction scheduling
[Figure: classification of instruction scheduling.
 Static scheduling: dependence-free and optimized code for a superscalar or pipelined processor; dependence-free code for a VLIW processor.
 Dynamic scheduling: for a pipelined processor or for a superscalar processor.]

Issue
[Figure label: instruction in execution.]
• The program order of the issued instructions is stored in the reorder buffer.
Issue
• Instruction issue from the instruction buffer can be:
  – in-order (only in program order) or out-of-order,
  – subject to data dependences and resource (structural) constraints simultaneously,
  – or divided into two (or more) stages,
    checking structural conflicts in the first stage and data dependences in the next stage (or vice versa).
• In the case of structural conflicts, the instructions are ...

Issue order
[Figure: issue order. Both panels show an issue window with the instructions to be issued: e d c b a.
 In-order issue: instructions are issued strictly in program order (instruction issued: a). Most superscalar processors.
 Out-of-order issue: instructions may be issued out of order (instructions issued: c a). MC 88110 (1991) (partially), PPC 601 (1993) (partially).]
Dispatch
• An instruction is then said to be dispatched from a reservation station to the FU when all operands are available, and execution starts.
• If all its operands are available during issue and the FU is not busy, an instruction is immediately dispatched, starting execution in the next cycle after the issue.
• So, the dispatch is usually not a pipeline stage.
• An issued instruction may stay in the reservation station for zero to several cycles.
• Dispatch and execution are performed out of program order.
• micro-dataflow

Instruction Window
• The notion of the instruction window comprises all the waiting stations between the decode (rename) and execute stages.
• The instruction window isolates the decode/rename stages from the execution stages of the pipeline.
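The dispatch rule from the Dispatch slide (an instruction leaves its reservation station once all operands are available and the FU is not busy) can be sketched as follows. The class names, the tag-based operand tracking, and the `fu_busy` flag are hypothetical choices made for this illustration; the slides do not prescribe a particular implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    # Tags of source operands whose values have not been produced yet.
    waiting_on: set = field(default_factory=set)


class ReservationStation:
    """Hypothetical reservation station in front of a single functional unit."""

    def __init__(self):
        self.entries = []     # kept in issue (program) order
        self.fu_busy = False  # set by the surrounding model while the FU is occupied

    def issue(self, entry):
        self.entries.append(entry)

    def broadcast(self, tag):
        # A result became available: wake up every entry waiting for it.
        for e in self.entries:
            e.waiting_on.discard(tag)

    def dispatch(self):
        """Return the name of the oldest ready instruction, or None."""
        if self.fu_busy:
            return None
        for e in self.entries:
            if not e.waiting_on:
                self.entries.remove(e)
                return e.name
        return None


rs = ReservationStation()
rs.issue(Entry("mul", {"p3"}))  # still waits for the value of p3
rs.issue(Entry("add"))          # all operands available at issue
print(rs.dispatch())            # add  (dispatched ahead of the older mul)
print(rs.dispatch())            # None (mul stays in the station)
rs.broadcast("p3")
print(rs.dispatch())            # mul
```

The younger `add` overtakes the older `mul`, which illustrates why dispatch and execution happen out of program order while an issued instruction may wait for zero to several cycles.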
Instruction Window Organizations

Multiple Instruction Issue
[Figure: multiple instruction issue, with the labels Icache and three EUs.]
Issue Schemes
• Single-level, central issue: single-level issue out of a central window, as in the Pentium II processor.
[Figure: Decode and Rename → Issue and Dispatch → Functional Units.]

Instruction Window Organizations
• Decoupling of instruction windows: each instruction window is shared by a group of (usually related) functional units; most common are a separate floating-point window and integer window.
• Combination of multi-stage issue and decoupling of instruction windows:
  – In a two-stage issue scheme with resource-dependent issue preceding the data-dependent dispatch, the first stage is done in order and the second stage is performed out of order.
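The difference between an in-order and an out-of-order stage can be sketched as a selection function over a window. The example mirrors the issue-order figure (window e d c b a, issuing a versus c and a). Which instructions are ready, the issue width of 2, and the function name are assumptions made for illustration only.

```python
def select_for_issue(window, ready, width, in_order):
    """Pick up to `width` instructions from `window` (oldest first).

    window   -- instructions in program order, oldest first
    ready    -- set of instructions with no unresolved dependence or conflict
    in_order -- if True, the first blocked instruction stalls all younger ones
    """
    issued = []
    for instr in window:
        if len(issued) == width:
            break
        if instr in ready:
            issued.append(instr)
        elif in_order:
            break  # strictly in program order: nothing may overtake
    return issued


window = ["a", "b", "c", "d", "e"]
ready = {"a", "c"}  # assumption: b, d and e are blocked in this cycle

print(select_for_issue(window, ready, width=2, in_order=True))   # ['a']
print(select_for_issue(window, ready, width=2, in_order=False))  # ['a', 'c']
```

With the in-order policy the blocked instruction b holds back c, while the out-of-order policy lets c overtake it.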
Single-Level, Two-Window Issue
• Single-level, two-window issue: single-level issue with instruction window decoupling using two separate windows
  – most common: separate floating-point and integer windows, as in the HP PA-8000 processor.
[Figure: Decode and Rename → two separate Issue and Dispatch windows, each with its own Functional Units.]

Two-Level Issue with Multiple Windows
• Two-level issue with multiple windows: a centralized window in the first stage and separate windows in the second stage (PowerPC 604 and 620 processors).
[Figure: Decode and Rename → Issue → Reservation Stations → Dispatch → four Functional Units.]
Execution Stages
• Various types of FUs are classified as:
  – single-cycle (latency of one) or
  – multiple-cycle (latency of more than one) units.
• Single-cycle units produce a result one cycle after an instruction started execution.
• Usually they are also able to accept a new instruction each cycle (throughput of one).
• Multi-cycle units perform more complex operations that cannot be implemented within a single cycle.
• Multi-cycle units
  – can be pipelined to accept a new operation each cycle or every other cycle,
  – or they are non-pipelined.
• Another class of units exists that performs operations with variable cycle times.
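The effect of latency and throughput on a stream of independent operations can be sketched with a small formula: the first result appears after the latency, and each further operation adds one initiation interval (the reciprocal of the throughput). The function name and the latency of 3 cycles used for the multi-cycle units are assumptions for illustration, not values from the slides.

```python
def completion_cycles(n_ops, latency, initiation_interval):
    """Cycles until n independent operations have finished on one unit.

    latency             -- cycles from start of execution to result
    initiation_interval -- cycles between accepting two operations
                           (1 = throughput of one, 2 = throughput of 1/2,
                            equal to latency = non-pipelined)
    """
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1) * initiation_interval


# Four independent operations on different kinds of units:
print(completion_cycles(4, latency=1, initiation_interval=1))  # single-cycle unit: 4
print(completion_cycles(4, latency=3, initiation_interval=1))  # pipelined multi-cycle unit: 6
print(completion_cycles(4, latency=3, initiation_interval=2))  # accepts every other cycle: 9
print(completion_cycles(4, latency=3, initiation_interval=3))  # non-pipelined unit: 12
```

Variable cycle time units, such as a load/store unit that may miss in the cache, do not fit this fixed-latency model.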
Types of Functional Units
• Single-cycle (single latency) units:
  – (simple) integer and (integer-based) multimedia units.
• Multicycle units that are pipelined (throughput of one):
  – complex integer, floating-point, and (floating-point-based) multimedia units (also called multimedia vector units).
• Multicycle units that are pipelined but do not accept a new operation each cycle (throughput of 1/2 or less):
  – often the 64-bit floating-point operations in a floating-point unit.
• Multicycle units that are not pipelined:
  – division units, square root units, complex multimedia units.
• Variable cycle time units:
  – load/store units (depending on cache misses) and special implementations of, e.g., floating-point units.
Constraints
[Figure: constraints related to ILP execution.
 Resource constraints: resource dependences.
 Constraints to preserve sequential consistency of execution: data and control dependences; requirement of precise exceptions.]