Professional Documents
Culture Documents
Bei Yu
Department of Computer Science & Engineering
Chinese University of Hong Kong
byu@cse.cuhk.edu.hk
System Specification
Architectural Design
module test
input in[3];
…
endmodule Functional Design and
Logic Design (RTL)
AND
OR
Logic Synthesis
Physical Design
Fabrication
Chip
2/50
Placement in Design Flow
System Specification
Architectural Design
module test
input in[3];
…
endmodule Functional Design and Floorpanning
Logic Design (RTL)
AND
OR
Placement
Logic Synthesis
Fabrication
Timing Closure
Chip
2/50
EDA Toy Example [Jens Vygen,2006]
3/50
EDA Toy Example [Jens Vygen,2006]
3/50
EDA Toy Example [Jens Vygen,2006]
3/50
Moore’s Law to Extreme Scaling
1013
CPU
1011 GPU
RAM
Transistor Count FPGA
109 Flash
107
105
103
101
1960 1970 1980 1990 2000 2010 2020
Year
4/50
An Inverter Example
5/50
Challenge: Larger and Larger Scale
Placement [Lu+,DAC’14]: 221K nets, 63 fixed macros and 210K movable cells.
6/50
Challenge: Larger and Larger Scale
Placement [Lu+,DAC’14]: 221K nets, 63 fixed macros and 210K movable cells.
6/50
Challenge: Larger and Larger Scale
Placement [Lu+,DAC’14]: 221K nets, 63 fixed macros and 210K movable cells.
6/50
Challenge: Larger and Larger Scale
Placement [Lu+,DAC’14]: 221K nets, 63 fixed macros and 210K movable cells.
6/50
Challenge: Larger and Larger Scale
Placement [Lu+,DAC’14]: 221K nets, 63 fixed macros and 210K movable cells.
6/50
Challenge: Larger and Larger Scale
Placement [Lu+,DAC’14]: 221K nets, 63 fixed macros and 210K movable cells.
6/50
Challenge: Larger and Larger Scale
Placement [Lu+,DAC’14]: 221K nets, 63 fixed macros and 210K movable cells.
6/50
Challenge: Larger and Larger Scale
Placement [Lu+,DAC’14]: 221K nets, 63 fixed macros and 210K movable cells.
6/50
How Difficult Placement is
Google AlphaGo
Train 40 days using 176 GPUs
7/50
Good Placement vs. Bad Placement
WL = 5.47e+4 WL = 6.73e+3
8/50
Typical Placement Flow
WL: 1.00e+6
1. Global placement
9/50
Typical Placement Flow
9/50
Typical Placement Flow
9/50
Global Placement Algorithms
The History of Placement Algorithms
<1970-1980s 1980s-1990s
Simulated
Partitioning
Annealing
Timberwolf
Breuer
VPR
Dunlop &
Dragon
Kernighan
Quadratic
Assignment
Cadence
QPlace
11/50
The History of Placement Algorithms
Timberwolf ePlace
Breuer FengShui GORDIAN APlace POLAR
VPR RePlAce
Dunlop & Naylor SimPL
Dragon Capo BonnPlace DREAMPlace
Kernighan Synopsis ComPLx
Quadratic Capo
mFar NTUplace MAPLE
Assignment +Rooster
Cadence
Kraftwerk mPL6
QPlace
FastPlace
Low quality Low
efficiency
Warp3
11/50
Recent Development of Placement
ePlace
RePlAce DREAMPlace
BonnPlace
MAPLE
mPL6 POLAR
RQL ComPLx
FastPlace3
NTUplace3
APlace3
Capo 10.5
12/50
Quadratic Placement – Optimizing Wirelength Objective
Manhattan-distance model
13/50
Quadratic Placement – Optimizing Wirelength Objective
# #
! 𝑥! − 𝑥" + ! 𝑦! − 𝑦"
4-pin net Optimal: min. Steiner tree Clique model Star model
Euclidean-distance model
13/50
Quadratic Placement – Optimizing Wirelength Objective
Pessimistic Optimistic
Model Min. Steiner Clique Model HPWL Star Model
Tree
Distance Manhattan Manhattan Euclidean Manhattan Euclidean
Accuracy ★★★★★ ★ ★ ★★★★ ★★★
Smoothness ★ ★★ ★★★★ ★★ ★★★★★
★
Manhattan
0.75
Euclidean
0.5
0.25
-0.25
13/50
Quadratic Placement – Optimizing Wirelength Objective
14/50
Quadratic Placement – Optimizing Wirelength Objective
14/50
Quadratic Placement – Satisfying Constraints
j connected to 6
𝑊𝐿 %&'"!
7 4 5
7 4
3 3 1
1 6 2
8
2
8 Pseudo-net 𝑛𝑒𝑡"!
6
Anchor 𝑎!
15/50
Quadratic Placement – Overall Optimization Flow
1st iteration
16/50
Quadratic Placement – Overall Optimization Flow
16/50
Quadratic Placement – Summary
• Iterative optimization
• Wirelength models: HPWL, clique model, star model
• QP solver is fast!
• Rough legalization
OPT
Satisfying
Constraints
OPT
Satisfying
Constraints
…
17/50
Nonlinear Placement
• Mathematical formulation
• 𝑑! denotes the density of bin 𝑖
min 𝑊𝐿(𝒙, 𝒚) ,
𝒙,𝒚
𝑠. 𝑡. 𝑑$ 𝒙, 𝒚 ≤ 𝑡% , ∀𝑏 ∈ 𝐵𝑖𝑛𝑠
Wirelength Density
Bins
18/50
Nonlinear Placement – Wirelength Smoothing1
• 𝑊𝐿 𝒙, 𝒚 = ∑!∈# 𝑊𝐿! 𝒙, 𝒚
• 𝐻𝑃𝑊𝐿 = 𝑚𝑎𝑥 𝑥$ − 𝑥% + 𝑚𝑎𝑥 |𝑦$ − 𝑦% |
• Equivalently max 𝑥! − min 𝑥! + max 𝑦! − min 𝑦!
! ! ! !
• Log-sum-exp (LSE)
!"
• 𝐿𝑆𝐸 𝒙; 𝛾 = 𝛾 ln ∑! 𝑒 #
• max 𝑥", ⋯ , 𝑥# < 𝐿𝑆𝐸 𝒙; 𝛾 ≤ max 𝑥", ⋯ , 𝑥# + 𝛾ln(𝑛)
• 𝐿𝑆𝐸 𝒙; 𝛾 ≈ max 𝑥", ⋯ , 𝑥#
• −𝐿𝑆𝐸 𝒙; −𝛾 ≈ min 𝑥", ⋯ , 𝑥#
!" ! $" $
' #" ' #"
• 𝑊𝐿$ 𝒙, 𝒚; 𝛾 = 𝛾(ln ∑%" ∈$ 𝑒 # + ln ∑%" ∈$ 𝑒 + ln ∑%" ∈$ 𝑒 # + ln ∑%" ∈$ 𝑒 )
x y
1
William C. Naylor, Ross Donelly, and Lu Sha (2001). Non-linear optimization system and method for
wire length and delay optimization for an automatic electric circuit placer. US Patent 216,632. 19/50
Nonlinear Placement – Wirelength Smoothing2
𝐿𝑆𝐸(); 5)
7.5 7.5
5
𝐿𝑆𝐸(); 1) HPWL 5
HPWL
2.5
𝑊𝐴(); 1) 2.5
𝑊𝐴(); 5)
-10 -7.5 -5 -2.5 0 2.5 5 7.5 10 -10 -7.5 -5 -2.5 0 2.5 5 7.5 10
2
Meng-Kai Hsu, Valeriy Balabanov, and Yao-Wen Chang (2013). “TSV-aware analytical
placement for 3-D IC designs based on a novel weighted-average wirelength model”. In: IEEE
TCAD 32.4, pp. 497–509. 20/50
3
3
Fan-Keng Sun and Yao-Wen Chang (2019). “BiG: A bivariate gradient-based wirelength model
for analytical circuit placement”. In: Proc. DAC. 21/50
Nonlinear Placement – Density Penalty4 ,5
4
Andrew B Kahng and Qinke Wang (2005). “Implementation and extensibility of an analytic
placer”. In: IEEE TCAD 24.5, pp. 734–747.
5
Sheng Chou, Meng-Kai Hsu, and Yao-Wen Chang (2012). “Structure-aware placement for
datapath-intensive circuit designs”. In: Proc. DAC, pp. 762–767. 22/50
Nonlinear Placement – Density Penalty
#
*! 𝒙, 𝒚 − 𝑡"
𝜆( 𝐷
!
• Challenges
• Gradient only has local view
• Need multi-level bins
Multi-level bins
23/50
Nonlinear Placement – Density Penalty
23/50
Detailed Placement Algorithms
NP-Hardness
WL: 1.02e+6
3. Detailed Placement
• Multi-bin Knapsack?
• TSP?
25/50
Single-Row Problem
fixed cells
movable cells C1 C2 C3 C4 C5 C6 C7
fixed cells
Single-Row Placement6
6
Jens Vygen (1998). “Algorithms for detailed placement of standard cells”. In: Proc. DATE,
pp. 321–324. 27/50
Single-Row Placement7 ,8
7
Andrew B Kahng, Paul Tucker, and Alex Zelikovsky (1999). “Optimization of linear placements
for wirelength minimization with free sites”. In: Proc. ASPDAC, pp. 241–244.
8
Andrew B. Kahng, Igor L. Markov, and Sherief Reda (2004). “On legalization of row-based
placements”. In: Proc. GLSVLSI, pp. 214–219. 28/50
Single-Row Placement9
9
Min Pan, Natarajan Viswanathan, and Chris Chu (2005). “An efficient and effective detailed
placement algorithm”. In: Proc. ICCAD, pp. 48–55. 29/50
Can we consider Cell Ordering?
• Sung Woo Hur and John Lillis (2000). “Mongrel: hybrid techniques for standard cell
placement”. In: Proc. ICCAD, pp. 165–170
• Yuelin Du and Martin D. F. Wong (2014). “Optimization of standard cell based
detailed placement for 16 nm FinFET process”. In: Proc. DATE, 357:1–357:6
30/50
DREAMPlace
Typical Nonlinear Placement Algorithm
Limited acceleration
Limited speedup, e.g. mPL, due to clustering
32/50
Advances in Deep Learning Hardware/Software
DREAMPlace Strategies
• Cast the non-linear placement problem into a neural network training problem.
• Leverage deep learning hardware (GPU) and software toolkit (e.g. Pytorch)
• Enable ultra-high parallelism and acceleration while getting the state-of-the-art
results.
10
Yibo Lin et al. (2019). “DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for
Modern VLSI Placement”. In: Proc. DAC. 34/50
Analogy between NN Training and Placement [DAC’19]11
n
X n
X
min f (φ(xi ; w), yi ) + λR(w) min WL(φ(xi ; w), yi ) + λD(w)
w w
i i
Forward Propagation
Compute obj
Backward Propagation
∂obj
Compute gradient ∂w
11
Yibo Lin et al. (2019). “DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for
35/50
Analogy between NN Training and Placement [DAC’19]11
Forward Propagation
Compute obj
Placement
API
Match RePlAce
Nonlinear Nesterov’s
Python
Optimizer Method
[TCAD’19,Cheng+]
Automatic
Gradient
OPs CONV WL
C++/CUDA ReLU Density
12
12
C. Cheng et al. (2019). “RePlAce: Advancing Solution Quality and Routability Validation in
Global Placement”. In: IEEE TCAD 38.9, pp. 1717–1730. 36/50
Experimental Results
DREAMPlace
• CPU: Intel E5-2698 v4 @2.20GHz
• GPU: 1 NVIDIA Tesla V100
• Single CPU thread was used 34x
speedup
RePlAce [TCAD’18,Cheng+]
• CPU: 24-core 3.0 GHz Intel Xeon
• 64GB memory allocated ISPD 2005 Benchmarks
200K~2M cells
37/50
Future Directions
DREAM
Gate sizing, BIGGER Multi-GPU,
distributed computing,
floorplanning,
mixed precision,
…
…
Applicable to Other
New Accelerations
CAD Problems
38/50
Ex: Routability Estimation × DREAMPlace [DATE’21]13
H H H Network Network
H 4 4 4 H H Backward
2 2 2 Forward
H 64 32 32 H H
Conv
Conv Conv Gradient w.r.t. congestion map
32 (Pool) 16 16 Congestion Map
Conv ∇fR (M) L
Trans Conv fR (M) ∈ RM ×N
(Pool)
3 4 1
Input Trans Output Congestion Penalty L = 1 2
M N kfR (M)k2
13
Siting Liu et al. (2021). “Global Placement with Deep Learning-Enabled Explicit Routability
Optimization”. In: Proc. DATE. 39/50
Datapath
Huawei Ascend 910
4
Level
0
63 60 57 54 51 48 45 42 39 36 33 30 27 24 21 18 15 12 9 6 3 0
42/50
Datapath Driven Standard Cell Placement
43/50
Datapath Driven Placement: Abstract Physical Model
44/50
Regularity Extraction
• Consider cells with the same bit-slice are lined up horizontally [Nijssen IWLS’96]
• geometric regularity: circuit is fitted onto a matrix of rectangular buckets
• interconnect regularity: most nets are within one slice/one stage
• Typical methods for regularity extraction
• Search-wave expansion [Nijssen IWLS’96]
• Signature-based [Arikati, ICCAD’97]
• Template covering [Chowdhary, TCAD’99]
• Network flow [Xiang, ISPD’13]
• Bipartite graph vertex covering [Huang, DAC’17]
45/50
Regularity Extraction: Network Flow Approach
DMF identification can be transformed to a network flow problem [Xiang, ISPD’13] 46/50
Datapath Driven Systolic Array Placement
• ···
48/50
These slides contain/adapt materials developed by
49/50
Draw Placement Together