Challenges for Modern O3 Processors
● Moore's law has slowed down but is still functional
● Microarchitectures becoming wider and deeper
● Full potential is not harnessed due to inefficient use of
resources
● Survey of Google and Facebook services:
○ Only 20-40% instructions retired without stalling
○ Actual bandwidth is 1/3rd of theoretical bandwidth
Solutions?
Qs in O3
● Issue Q (IQ)
● Re-Order Buffer (ROB)
● Load/Store Q (LSQ)
● They determine the temporal order of instructions
Existing O3 Commit Techniques
● Existing methods cannot use the capacity of scheduling
structures efficiently while also preserving ideal instruction
ordering
Ordered issue, unordered commit
Orinoco
Age Matrices
● Track relative age in ROB and IQ
● # rows = # cols = # entries in respective Qs
● After decoding, renaming and dispatching to the Q,
valid (VLD) vector tracks entries
● Operation: When dispatched, set row vector and clear
column vector
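The dispatch operation above can be sketched as a small behavioral model. This is a minimal sketch, not the hardware design: the class name `AgeMatrix`, the `free` and `oldest` helpers, and the convention that bit j of row i means "entry j is older than entry i" are assumptions made for illustration.

```python
class AgeMatrix:
    """Behavioral model of an age matrix for a queue with `size` entries.

    Assumed convention: bit j of rows[i] is 1 when entry j is older than entry i.
    """

    def __init__(self, size):
        self.size = size
        self.rows = [0] * size  # one bitmask per entry
        self.vld = 0            # valid (VLD) vector as a bitmask

    def dispatch(self, i):
        # Set the row vector: every currently valid entry is older than i.
        self.rows[i] = self.vld
        # Clear the column vector: i is not older than any other entry.
        for r in range(self.size):
            if r != i:
                self.rows[r] &= ~(1 << i)
        self.vld |= 1 << i

    def free(self, i):
        # Entry i leaves the queue (hypothetical helper, e.g. at commit).
        self.vld &= ~(1 << i)
        for r in range(self.size):
            self.rows[r] &= ~(1 << i)
        self.rows[i] = 0

    def oldest(self):
        # The oldest valid entry has no valid entry older than itself.
        for i in range(self.size):
            if (self.vld >> i) & 1 and (self.rows[i] & self.vld) == 0:
                return i
        return None
```

Because age lives in the matrix rather than in the entry's position, a freed slot can be reused in any order without compacting the queue.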
Age Matrix for IQ
● After waking up* in IQ and becoming ready to schedule, an entry sets its bit in the BID vector
*Details in further sections
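One way to picture how the BID vector combines with the age matrix is the selection sketch below. It assumes the same convention as before (bit j of row i is 1 when entry j is older than entry i); the function name `select_oldest` and the bitmask encoding are illustrative assumptions, not taken from the slides.

```python
def select_oldest(age_rows, bid):
    """Return the index of the oldest bidding entry, or None.

    age_rows[i] is a bitmask whose bit j is 1 when entry j is older than
    entry i (assumed convention). bid is a bitmask of the entries that are
    awake and ready to schedule.
    """
    for i, row in enumerate(age_rows):
        bidding = (bid >> i) & 1
        # Entry i wins when no older entry is also bidding.
        if bidding and (row & bid) == 0:
            return i
    return None
```

For example, with age order 1 (oldest), 3, 0 and entries 0 and 3 bidding, `select_oldest([0b1010, 0b0000, 0b0000, 0b0010], 0b1001)` picks entry 3.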
Lockdown Matrix
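// at commit
row_vec = set_bit_at_all_unperf_loads   // one bit per older unperformed load in LQ
if load_complete:
    clear_col                           // a performed load clears its column
if reduction_NOR(row_vec) == 1:         // every bit in the row is zero
    load_is_ordered                     // the lockdown can be lifted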
● # rows = # committed loads, # columns = # loads in LQ
● A committed load sets its row according to older unperformed
loads in LQ, and a performed load clears its column
● A lockdown to the address of a committed load is lifted when all
the bits in its row vector are zero
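The lockdown rules above can be sketched as a behavioral model. The class and method names (`LockdownMatrix`, `commit`, `perform`, `lifted`) and the use of a dictionary keyed by a load identifier are assumptions for illustration; the hardware would use a fixed-size bit matrix.

```python
class LockdownMatrix:
    """Behavioral model: one row per committed load, one column per LQ entry."""

    def __init__(self):
        # committed-load id -> bitmask of older unperformed loads in the LQ
        self.rows = {}

    def commit(self, load_id, older_unperformed_mask):
        # At commit, the load sets its row according to the older
        # unperformed loads in the LQ.
        self.rows[load_id] = older_unperformed_mask

    def perform(self, lq_index):
        # A performed load clears its column in every row.
        for load_id in self.rows:
            self.rows[load_id] &= ~(1 << lq_index)

    def lifted(self):
        # Reduction NOR per row: a lockdown is lifted once its row is all zeros.
        done = [lid for lid, row in self.rows.items() if row == 0]
        for lid in done:
            del self.rows[lid]
        return done
```

In this sketch, a load committed behind LQ entries 1 and 2 stays locked down until both `perform(1)` and `perform(2)` have been called.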
Wakeup Matrix
● Producers: the instructions that generate an instruction's source operands
● An instruction sets its row according to its producers in
IQ at dispatch, and clears its column at issue
Wakeup Matrix
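// at dispatch
row_vec = set_bit_at_all_instr_with_producers   // one bit per producer still in IQ
if instr_issue:
    clear_col                                   // an issued instruction clears its column
if reduction_NOR(row_vec) == 1:                 // every bit in the row is zero
    instr_wake_up                               // all producers have issued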
● # rows = # columns = # Entries in IQ
● An instruction sets its row according to its producers in IQ at
dispatch, and clears its column at issue
● An instruction is woken up when all the bits in its row vector are zero
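The wakeup rules above can be sketched the same way. This is a minimal behavioral model under assumptions: the names `WakeupMatrix`, `dispatch`, `issue`, and `awake` are hypothetical, and the `waiting` mask is added only so the model can tell occupied entries from empty ones.

```python
class WakeupMatrix:
    """Behavioral model: rows and columns are both indexed by IQ entry."""

    def __init__(self, size):
        self.size = size
        self.rows = [0] * size  # rows[i]: bitmask of producers entry i waits on
        self.waiting = 0        # entries dispatched and not yet issued

    def dispatch(self, i, producers):
        # At dispatch, set the row according to the producers still in the IQ.
        row = 0
        for p in producers:
            row |= 1 << p
        self.rows[i] = row
        self.waiting |= 1 << i

    def issue(self, i):
        # At issue, clear the column so consumers stop waiting on entry i.
        for r in range(self.size):
            self.rows[r] &= ~(1 << i)
        self.waiting &= ~(1 << i)

    def awake(self):
        # Reduction NOR per row: an entry is awake when its row is all zeros.
        return [i for i in range(self.size)
                if (self.waiting >> i) & 1 and self.rows[i] == 0]
```

With a dependence chain 0 → 1 → 2, only entry 0 is awake at first; each `issue` call wakes the next consumer in the chain.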
Matrix Scheduler Challenges
● Increased logic complexity
○ Fulfills O(1) time scheduling
● 8T SRAM arrays
Processing-In-Memory
Bit Line Computing
● Matrix scheduler stores dependency of instructions
● Precharge the read bit line (RBL) beforehand
● Activate the read word line (RWL)
● Bit count is encoded by the voltage drop on the bit lines
● Update Matrix Schedulers
○ Column-wise write
● Memory Disambiguation Matrix
○ Column-wise read
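The precharge-and-activate sequence can be pictured with a purely logical model of one read bit line. This is a sketch under assumptions: which cells share a bit line, and the function name `read_bit_line`, are illustrative, and the analog voltage drop is reduced to a simple count of discharging cells.

```python
def read_bit_line(cell_bits, active_cells):
    """Model one precharged read bit line (RBL) shared by several cells.

    cell_bits[k] is the bit stored in cell k on this bit line.
    active_cells lists the cells whose read word line (RWL) is activated.
    Returns (ones, stays_high): the number of activated cells storing 1,
    which stands in for the voltage drop, and whether the RBL stays
    precharged because no activated cell discharges it.
    """
    ones = sum(cell_bits[k] for k in active_cells)
    return ones, ones == 0
```

The `stays_high` result is the reduction NOR of the activated cells, which is the operation the lockdown and wakeup matrices rely on.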
Vertical Access
Multibanking
● Multibanking for parallel processing
○ Divided into n single-ported banks*
*n = dispatch width
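A small sketch of why n single-ported banks can serve n dispatches per cycle. The interleaved mapping (entry index modulo the number of banks) and the function names are assumptions for illustration; the slides do not state how entries are mapped to banks.

```python
def bank_of(entry, n_banks):
    # Assumed interleaved mapping: consecutive entries go to consecutive banks.
    return entry % n_banks


def has_bank_conflict(entries, n_banks):
    """Return True if two same-cycle accesses land in the same single-ported bank."""
    banks = [bank_of(e, n_banks) for e in entries]
    return len(set(banks)) != len(banks)
```

Under this mapping, dispatching entries 4, 5, 6, and 7 in one cycle with four banks touches each bank once, while entries 0 and 4 would collide in bank 0.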
The Complete Picture
References
Questions?