You are on page 1of 33

CS-683 Project Checkpoint 1 Presentation

Orinoco: Ordered Issue and Unordered


Commit with Non-Collapsible Queues
Team Helloki

Atharva Kulkarni Avaneesh Sai Puja Kudupudi


210070047@iitb.ac.in 210070015@iitb.ac.in 210070046@iitb.ac.in

1
Challenges for Modern O3 Processors
● Moore's law has slowed down but still functional
● Microarchitectures becoming wider and deeper
● Full potential is not harnessed due to inefficient use of
resources
● Survey of Google and Facebook services:
○ Only 20-40% instructions retired without stalling
○ Actual bandwidth is 1/3rd of theoretical bandwidth

Challenges for Modern O3 Processors 2


Solutions?
● Need aggressive instruction scheduling
● O3 Commit comes in
● Scheduling: NP hard problem
● Being on the critical path of pipeline, it should be done
with minimum latency without complex hardware

Solutions? 3
Qs in O3
● Issue Q (IQ)
● Re-Order Buffer (ROB)
● Load/ Store Q (LSQ)
● They determine temporal order of instructions

Qs in O3 4
Existing O3 Commit Techniques
● Existing methods lack in capacity efficiency of
scheduling structures while preserving ideal instruction
ordering

Existing O3 Commit Techniques 5


Existing O3 Commit Techniques
● Prioritize certain instructions at commit
● But they struggle with reclaiming the gaps left in Qs
● Main Challenge: Physical ordering of instructions
within Qs

Existing O3 Commit Techniques 6


Orinoco
● Ordered issue and unordered commit
● Non collapsible IQ, ROB, LSQ
● Decouples temporal order from positions in Qs

Ordered
issue Unordered
commit

Orinoco 7
Age Matrices
● Track relative age in ROB and IQ
● # rows = # cols = # entries in respective Qs
● After decoding, renaming and dispatching to the Q,
valid (VLD) vector tracks entries
● Operation: When dispatched, set row vector and clear
column vector

Age Matrices 8
Age Matrix for IQ
● After waking up* in IQ, and ready to schedule, set BID
vector bit *Details in further sections

Age Matrix for IQ 9


Age Matrix for IQ
count = zeros(#instructions)
for instruction_in_ready_instructions:
temp = bitwise_AND(row_vec, BID)
count[instruction] = #ones in temp
● If ∃ nth oldest instruction, ∃ at most (n-1) older
instructions
● count <= n-1
● Thus, to select IW (Instruction Width) oldest
instructions, select the ones with count = IW - 1

Age Matrix for IQ 10


Age Matrix for ROB
● When speculation resolves, clear corresponding bit in
SPEC vector
for every_completed_instruction:
if bitwise_AND(row_vec,SPEC)==all_zeros:
commit

Age Matrix for ROB 11


Age Matrix for ROB
● Need to take care of exceptions/ misspeculation
● More importantly, oldest exception/ misspeculation

Age Matrix for ROB 12


Age Matrix for ROB
result = zeros(#entries, #entries)
for every in_flight instruction:
result[instr] = bitwise_AND(row_vec, VLD)
one_hot = reduction_NOR_along_rows(temp)
● Only for the oldest instruction, result is all zeros
● After reduction NOR, we have one-hot vector pointing to
instruction which raised exception/ misspeculation

Age Matrix for ROB 13


Commit Dependency Matrix
● Local state of an instruction:
○ Completed
○ Raised exception
○ Flushed by some other event
● Global state of an instruction:
○ Completed
○ Misspeculation
○ Exception
● Commit depends on local state of the instruction and
global state of older instructions
Commit Dependency Matrix 14
Commit Dependency Matrix
● Structure to identify the global state earlier in the
pipeline
● # rows = # cols = # in-flight instructions
● Intersections indicate commit dependencies

Commit Dependency Matrix 15


Commit Dependency Matrix
● On every new dispatch to ROB i.e. new row entry, set
bits in every column corresponding to older instructions
which may raise exceptions
● E.g.: Memory operations, branches and synchronization
barriers
● When an older instruction completes, clear its column
vector

Commit Dependency Matrix 16


Commit Dependency Matrix
● Before commit, every instruction waits for the commit
dependency to resolve i.e. th older ones to become
non-speculative

// check global state of completed instruction


if reduction_NOR(row_vec)==1:
commit if resources available

Commit Dependency Matrix 17


Precise Exception Handling
● Delinquent instruction = Instruction causing exception
● Delinquent and all subsequent instructions (w.r.t.
program order) are squashed
● The subsequent ones are located using column vector
of the delinquent instruction
● When none are commiting, determine delinquent
(unresolved or exception causing) using age matrix

Precise Exception Handling 18


Matrix Merging
● Global state is same for all instructions in ROB
● Thus, merge commit dependency matrix and age matrix
for ROB (saves space of order O(n2))
● A SPEC vector now tracks speculative instructions
// at commit time
for a completed_instruction:
result = bitwise_AND(row_vec, SPEC)
if reduction_NOR(result)==1:
grant to commit
Matrix Merging 19
Memory Disambiguation matrix
● The order and dependencies between loads and stores
need to be explicitly tracked
● This is done to decide the absence of conflicts as early
as possible

Memory Disambiguation Matrix 20


Memory Disambiguation Matrix
// when load enters LQ
row_vec = set_bit_at_store_addr_in_SQ
if store_addr != load_addr:
clear_col
if reduction_NOR(row_vec)==1:
Load_entry_becomes_non-speculative
● # rows = # loads in LQ, # columns = # stores in SQ
● A load in LQ sets its row according to older unresolved stores in SQ at
issue, and a store clears bits in its column when its address does not
conflict
● A load becomes non-speculative when all the bits in its row vector is
zero
Memory Disambiguation Matrix 21
Lockdown Matrix
● We use a lockdown matrix to track older non-performed
loads for each speculative load

Lockdown Matrix 22
Lockdown Matrix
//at commit
row_vec = set_bit_at_all_unperf_loads
if load_complete:
clear_col
if reduction_NOR(row_vec) == 1:
load_is_ordered
● # rows = # committed loads, # columns = # loads in LQ
● A committed load sets its row according to older unperformed
loads in LQ, and a performed load clears its column
● A lockdown to the address of a committed load is lifted when all
theMatrix
Lockdown bits in its row vector is zero
23
Wakeup Matrix
● Producers - Source operands
● An instruction sets its row according to its producers in
IQ at dispatch, and clears its column at issue

Wakeup Matrix 24
Wakeup Matrix
// at dispatch
row_vec = set_bit_at_all_instr_with_producers
if instr_issue:
clear_col
if reduction_NOR(row_vec) == 1:
instr_wake_up
● # rows = # columns = # Entries in IQ
● An instruction sets its row according to its producers in IQ at
dispatch, and clears its column at issue
● An instruction is woken up when all the bits in its row vector is zero

Lockdown Matrix 25
Matrix Scheduler Challenges
● Increased logic complexity
○ Fulfills O(1) time scheduling

● Timing and area constraints

● 12T SRAM arrays : 8 for dependency storage

● Eg: Age matrix performs n2 AND and n n-bit OR


operations per cycle

Matrix Scheduler Challenges 26


Processing-In-Memory Implementation
● Implements efficient matrix schedulers

● O(matrix scheduling) = O(element-wise logic operations)

● 8T SRAM arrays

● PIM supports vertical read and write mechanisms

Processing-In-Memory 27
Bit Line Computing
● Matrix scheduler stores dependency of instructions

● Precharge RBL
beforehand
● Activate RWL
● Bit count is encoded by
voltage drop on bit lines

Bit Line Computing 28


Vertical Access
● Standard SRAM arrays only support horizontal access

● Update Matrix
Schedulers
○ Column-wise write
● Memory Disambiguation
Matrix
○ Column-wise read

Vertical Access 29
Multibanking
● Multibanking for parallel processing

● Horizontally partitioned SRAM arrays


○ Matrix Schedulers into small arrays (banks)

Divided into n
single-ported banks

*n = dispatch width

Multibanking 30
The Complete Picture

The Complete Picture 31


References
● Orinoco: Ordered Issue and Unordered Commit with
Non-Collapsible Queues Dibei Chen, Tairan Zhang, et.
al
● Prof. Biswa's slides

References 32
Questions?

Questions? 33

You might also like