CS683 Project Presentation (Helloki)

CS-683 Project Checkpoint 1 Presentation
Orinoco: Ordered Issue and Unordered

Commit with Non-Collapsible Queues
Team Helloki
Atharva Kulkarni Avaneesh Sai Puja Kudupudi

210070047@iitb.ac.in 210070015@iitb.ac.in 210070046@iitb.ac.in
1
Challenges for Modern O3 Processors
● Moore's law has slowed down but still functional
● Microarchitectures becoming wider and deeper
● Full potential is not harnessed due to inefficient use of
resources
● Survey of Google and Facebook services:
○ Only 20-40% instructions retired without stalling
○ Actual bandwidth is 1/3rd of theoretical bandwidth
Challenges for Modern O3 Processors 2

Solutions?
● Need aggressive instruction scheduling
● O3 Commit comes in
● Scheduling: NP hard problem
● Being on the critical path of pipeline, it should be done
with minimum latency without complex hardware
Solutions? 3
Qs in O3
● Issue Q (IQ)
● Re-Order Buffer (ROB)
● Load/ Store Q (LSQ)
● They determine temporal order of instructions
Qs in O3 4
Existing O3 Commit Techniques
● Existing methods lack in capacity efficiency of
scheduling structures while preserving ideal instruction
ordering
Existing O3 Commit Techniques 5

Existing O3 Commit Techniques
● Prioritize certain instructions at commit
● But they struggle with reclaiming the gaps left in Qs
● Main Challenge: Physical ordering of instructions
within Qs
Existing O3 Commit Techniques 6

Orinoco
● Ordered issue and unordered commit
● Non collapsible IQ, ROB, LSQ
● Decouples temporal order from positions in Qs
Ordered
issue Unordered
commit
Orinoco 7
Age Matrices
● Track relative age in ROB and IQ
● # rows = # cols = # entries in respective Qs
● After decoding, renaming and dispatching to the Q,
valid (VLD) vector tracks entries
● Operation: When dispatched, set row vector and clear
column vector
Age Matrices 8
Age Matrix for IQ
● After waking up* in IQ, and ready to schedule, set BID
vector bit *Details in further sections
Age Matrix for IQ 9

Age Matrix for IQ
count = zeros(#instructions)
for instruction_in_ready_instructions:
temp = bitwise_AND(row_vec, BID)
count[instruction] = #ones in temp
● If ∃ nth oldest instruction, ∃ at most (n-1) older
instructions
● count <= n-1
● Thus, to select IW (Instruction Width) oldest
instructions, select the ones with count = IW - 1
Age Matrix for IQ 10

Age Matrix for ROB
● When speculation resolves, clear corresponding bit in
SPEC vector
for every_completed_instruction:
if bitwise_AND(row_vec,SPEC)==all_zeros:
commit
Age Matrix for ROB 11

Age Matrix for ROB
● Need to take care of exceptions/ misspeculation
● More importantly, oldest exception/ misspeculation

Age Matrix for ROB
result = zeros(#entries, #entries)
for every in_flight instruction:
result[instr] = bitwise_AND(row_vec, VLD)
one_hot = reduction_NOR_along_rows(temp)
● Only for the oldest instruction, result is all zeros
● After reduction NOR, we have one-hot vector pointing to
instruction which raised exception/ misspeculation

Commit Dependency Matrix
● Local state of an instruction:
○ Completed
○ Raised exception
○ Flushed by some other event
● Global state of an instruction:
○ Completed
○ Misspeculation
○ Exception
● Commit depends on local state of the instruction and
global state of older instructions
Commit Dependency Matrix 14
● Structure to identify the global state earlier in the
pipeline
● # rows = # cols = # in-flight instructions
● Intersections indicate commit dependencies

● On every new dispatch to ROB i.e. new row entry, set
bits in every column corresponding to older instructions
which may raise exceptions
● E.g.: Memory operations, branches and synchronization
barriers
● When an older instruction completes, clear its column
vector

● Before commit, every instruction waits for the commit
dependency to resolve i.e. th older ones to become
non-speculative
// check global state of completed instruction

if reduction_NOR(row_vec)==1:
commit if resources available

Precise Exception Handling
● Delinquent instruction = Instruction causing exception
● Delinquent and all subsequent instructions (w.r.t.
program order) are squashed
● The subsequent ones are located using column vector
of the delinquent instruction
● When none are commiting, determine delinquent
(unresolved or exception causing) using age matrix
Precise Exception Handling 18

Matrix Merging
● Global state is same for all instructions in ROB
● Thus, merge commit dependency matrix and age matrix
for ROB (saves space of order O(n2))
● A SPEC vector now tracks speculative instructions
// at commit time
for a completed_instruction:
result = bitwise_AND(row_vec, SPEC)
if reduction_NOR(result)==1:
grant to commit
Matrix Merging 19
Memory Disambiguation matrix
● The order and dependencies between loads and stores
need to be explicitly tracked
● This is done to decide the absence of conflicts as early
as possible
Memory Disambiguation Matrix 20

Memory Disambiguation Matrix
// when load enters LQ
row_vec = set_bit_at_store_addr_in_SQ
if store_addr != load_addr:
clear_col
if reduction_NOR(row_vec)==1:
Load_entry_becomes_non-speculative
● # rows = # loads in LQ, # columns = # stores in SQ
● A load in LQ sets its row according to older unresolved stores in SQ at
issue, and a store clears bits in its column when its address does not
conflict
● A load becomes non-speculative when all the bits in its row vector is
zero
Memory Disambiguation Matrix 21
Lockdown Matrix
● We use a lockdown matrix to track older non-performed
loads for each speculative load
Lockdown Matrix 22
Lockdown Matrix
//at commit
row_vec = set_bit_at_all_unperf_loads
if load_complete:
clear_col
if reduction_NOR(row_vec) == 1:
load_is_ordered
● # rows = # committed loads, # columns = # loads in LQ
● A committed load sets its row according to older unperformed
loads in LQ, and a performed load clears its column
● A lockdown to the address of a committed load is lifted when all
theMatrix
Lockdown bits in its row vector is zero
23
Wakeup Matrix
● Producers - Source operands
● An instruction sets its row according to its producers in
IQ at dispatch, and clears its column at issue
Wakeup Matrix 24
Wakeup Matrix
// at dispatch
row_vec = set_bit_at_all_instr_with_producers
if instr_issue:
clear_col
if reduction_NOR(row_vec) == 1:
instr_wake_up
● # rows = # columns = # Entries in IQ
● An instruction sets its row according to its producers in IQ at
dispatch, and clears its column at issue
● An instruction is woken up when all the bits in its row vector is zero
Lockdown Matrix 25
Matrix Scheduler Challenges
● Increased logic complexity
○ Fulfills O(1) time scheduling
● Timing and area constraints
● 12T SRAM arrays : 8 for dependency storage
● Eg: Age matrix performs n2 AND and n n-bit OR

operations per cycle
Matrix Scheduler Challenges 26

Processing-In-Memory Implementation
● Implements efficient matrix schedulers
● O(matrix scheduling) = O(element-wise logic operations)
● 8T SRAM arrays
● PIM supports vertical read and write mechanisms
Processing-In-Memory 27
Bit Line Computing
● Matrix scheduler stores dependency of instructions
● Precharge RBL
beforehand
● Activate RWL
● Bit count is encoded by
voltage drop on bit lines
Bit Line Computing 28

Vertical Access
● Standard SRAM arrays only support horizontal access
● Update Matrix
Schedulers
○ Column-wise write
● Memory Disambiguation
Matrix
○ Column-wise read
Vertical Access 29
Multibanking
● Multibanking for parallel processing
● Horizontally partitioned SRAM arrays

○ Matrix Schedulers into small arrays (banks)
Divided into n
single-ported banks
*n = dispatch width
Multibanking 30
The Complete Picture
The Complete Picture 31

References
● Orinoco: Ordered Issue and Unordered Commit with
Non-Collapsible Queues Dibei Chen, Tairan Zhang, et.
al
● Prof. Biswa's slides
References 32
Questions?
Questions? 33

CS683 Project Presentation (Helloki)

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CS683 Project Presentation (Helloki)

Uploaded by

Copyright:

Available Formats

CS-683 Project Checkpoint 1 Presentation

Orinoco: Ordered Issue and Unordered

Atharva Kulkarni Avaneesh Sai Puja Kudupudi

Challenges for Modern O3 Processors 2

Existing O3 Commit Techniques 5

Existing O3 Commit Techniques 6

Age Matrix for IQ 9

Age Matrix for IQ 10

Age Matrix for ROB 11

Age Matrix for ROB 12

Age Matrix for ROB 13

Commit Dependency Matrix 15

Commit Dependency Matrix 16

// check global state of completed instruction

Commit Dependency Matrix 17

Precise Exception Handling 18

Memory Disambiguation Matrix 20

● Timing and area constraints

● 12T SRAM arrays : 8 for dependency storage

● Eg: Age matrix performs n2 AND and n n-bit OR

Matrix Scheduler Challenges 26

● O(matrix scheduling) = O(element-wise logic operations)

● PIM supports vertical read and write mechanisms

Bit Line Computing 28

● Horizontally partitioned SRAM arrays

The Complete Picture 31

You might also like