PDA Course Materials
Introduction to Physical Design Automation Algorithms
Prof. Sung Kyu Lim
Georgia Tech, School of Electrical and Computer Engineering
limsk@ece.gatech.edu
IDEC Daejeon
Course Plan
Topic: Partitioning
• Physical design introduction
• Kernighan-Lin algorithm
• Fiduccia-Mattheyses algorithm
Topic: Floorplanning
• Polish expression algorithm
• Sequence pair algorithm
Topic: Placement
• Mincut placement algorithm
• Gordian placement algorithm
Topic: Global Routing
• Steiner routing algorithm
• Bounded radius routing algorithm
Topic: Detailed Routing
• Channel routing algorithm
• Switchbox routing algorithm
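Notices
• Lecture materials
– PDF materials will be shared
• Questions during the lecture
– Use the Zoom chat
• Video recording
– Recording is prohibited; thank you for your understanding
• Class format
– Each hour: lecture, then a break
• If the instructor's Zoom connection drops
– Do not leave your seat; please wait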
Sung Kyu Lim, Georgia Tech (2022)
Instructor
• Sung Kyu Lim
– UCLA
– Research areas: AI, 3D semiconductor circuit design, and design-tool development
– IDEC lectures: ongoing
– IDEC Excellent Instructor Award
Physical Design Automation
My Books
My Course
• Since …
– http://limsk.ece.gatech.edu/course/ece6133/
• Largest place-and-route course in the world
– Offered every year at the graduate level
– Covers physical design only
(Figure labels: spring semester; course evaluation)
References
• Practical Problems in VLSI Physical Design Automation
– Sung Kyu Lim
• VLSI Physical Design: From Graph Partitioning to Timing Closure
– Andrew B. Kahng, Jens Lienig, Igor L. Markov, Jin Hu
Introduction
ECE6133
Physical Design Automation of VLSI Systems
(Figure: VLSI design flow)
Architectural Design
Functional Design and Logic Design, e.g., a VHDL entity:
ENTITY test is
port (a: in bit);
end ENTITY test;
Physical Design: Partitioning, Chip Planning, Clock Tree Synthesis, Signal Routing, Timing Closure
Physical Verification and Signoff: DRC, LVS, ERC
Fabrication
Chip
(Figure: layout of a two-input CMOS gate with inputs IN1 and IN2 and output OUT. It shows p-type and n-type transistors between the Power (Vdd) rail and the Ground (GND) rail, drawn with contact, metal, poly, and diffusion layers.)
(Figures: full-chip layouts showing metal layers M1–M6, M1–M7, and M1–M10.)
Placement Comparison
• Runtime: 1 sec vs 44 sec vs 739 sec
Routing Comparison
• Runtime: 12 sec vs 289 sec vs 4740 sec
Partitioning
System design
Partitioning of a Circuit
(a) Input circuit: size = 48
(b) Partitioned into three blocks: Size 1 = 15, Size 2 = 16, Size 3 = 17; Cut 1 = 4, Cut 2 = 4
Partitioning
(Figure: partitioning at the system level and the board level.)
Partitioning Methods
• Top-down Partitioning (cutsize only)
– Iterative improvement [KL70, FM82, Kr84, San89]
– Spectral based [HK92, AZ95]
– Clustering method [SU72, NOP87, WC92, SS93, CS93, HK95]
– Network flow based [YW94, YW97]
– Analytical based [RDJ94, LLC95]
– Multi-level [CS93, HB95, AHK97, KA+97, KK99]
• Bottom-up Clustering (delay only)
– Unit delay model [LLT69, CD93]
– General delay model [MBV91, RW93, YW95]
– Sequential circuits with retiming [PKL98, CLW99, CL00]
Partitioning
Kernighan-Lin Algorithm
• It is a bisection algorithm: the input graph is partitioned into two subsets of equal size.
• While the cutsize keeps improving:
– the vertex pair that gives the largest decrease in cutsize is exchanged;
– these vertices are then locked;
– if no improvement is possible and some vertices are still unlocked, the pair that gives the smallest increase is exchanged.
B. W. Kernighan and S. Lin, Bell System Technical Journal, 1970.
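The pass described above can be sketched in Python. This is a minimal sketch. It assumes an undirected weighted graph stored as a dict of dicts, where `adj[u][v]` is the weight of edge (u, v) and every vertex appears as a key. It uses the standard swap gain g = D(a) + D(b) − 2c(a,b), with D(v) = external cost − internal cost. The names `kl_pass`, `d_value`, and `cut_size` are illustrative and do not come from the source.

```python
from itertools import product


def cut_size(adj, part_a):
    """Total weight of edges crossing the bisection (each edge counted once)."""
    return sum(w for u in adj for v, w in adj[u].items()
               if u < v and (u in part_a) != (v in part_a))


def d_value(adj, v, own):
    """D(v) = external cost - internal cost of v, where `own` is v's current block."""
    external = sum(w for u, w in adj[v].items() if u not in own)
    internal = sum(w for u, w in adj[v].items() if u in own)
    return external - internal


def kl_pass(adj, part_a, part_b):
    """One Kernighan-Lin pass: tentative swaps with locking, then keep the best prefix."""
    a_set, b_set = set(part_a), set(part_b)
    locked, swaps, gains = set(), [], []
    for _ in range(min(len(a_set), len(b_set))):
        best = None
        # sorted() makes tie-breaking deterministic
        for a, b in product(sorted(a_set - locked), sorted(b_set - locked)):
            g = d_value(adj, a, a_set) + d_value(adj, b, b_set) - 2 * adj[a].get(b, 0)
            if best is None or g > best[0]:
                best = (g, a, b)
        g, a, b = best
        # tentative exchange, then lock both vertices
        a_set, b_set = (a_set - {a}) | {b}, (b_set - {b}) | {a}
        locked |= {a, b}
        swaps.append((a, b))
        gains.append(g)
    # actual exchange: apply the first k swaps that maximize the cumulative gain
    best_k, best_total, total = 0, 0, 0
    for k, g in enumerate(gains, 1):
        total += g
        if total > best_total:
            best_k, best_total = k, total
    new_a, new_b = set(part_a), set(part_b)
    for a, b in swaps[:best_k]:
        new_a, new_b = (new_a - {a}) | {b}, (new_b - {b}) | {a}
    return new_a, new_b
```

Consider a four-vertex graph with edges 1–3 and 2–4. The split {1, 2} | {3, 4} has cutsize 2, and one pass moves it to {2, 4} | {1, 3} with cutsize 0.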
Kernighan-Lin Algorithm
Algorithm KL
begin
  INITIALIZE();
  while (IMPROVE(table) = TRUE) do
    (* if an improvement has been made during the last iteration,
       the process is carried out again. *)
    while (UNLOCK(A) = TRUE) do
      (* if there exists any unlocked vertex in A,
         more tentative exchanges are carried out. *)
      Dmax = −∞;
      for (each a ∈ A) do
        if (a = unlocked) then
          for (each b ∈ B) do
            if (b = unlocked) then
              (* gain of swapping a and b; c(a,b) is the weight of edge (a,b) *)
              if (Dmax < D(a) + D(b) − 2c(a,b)) then
                Dmax = D(a) + D(b) − 2c(a,b);
                amax = a;
                bmax = b;
      TENT-EXCHGE(amax, bmax);
      LOCK(amax, bmax);
      LOG(table);
    ACTUAL-EXCHGE(table);
end.
Kernighan-Lin Algorithm
Perform a single KL pass on the following circuit:
KL needs an undirected graph (multi-pin nets use clique-based weighting)
First Swap
Second Swap
Third Swap
Fourth Swap
Last swap does not require gain computation
Summary
Cutsize reduced from 5 to 3
Two best solutions found (solutions are always area-balanced)
Partitioning
Fiduccia-Mattheyses Algorithm
(Figure: FM bucket data structure. For each partition (I and II), an array of gain buckets indexed from −pmax to +pmax points to lists of free vertices with that gain. A vertex array indexed 1 … n gives direct access to each vertex.)
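The bucket structure in the figure can be sketched as a small Python class. The sketch assumes integer gains in [−pmax, +pmax], where pmax is typically the maximum vertex degree. It uses Python lists in place of the doubly linked lists, so removal costs O(bucket size) instead of O(1). `GainBuckets` and its methods are hypothetical names.

```python
class GainBuckets:
    """Gain buckets for one partition: index gain + pmax holds free vertices with that gain."""

    def __init__(self, pmax):
        self.pmax = pmax
        self.buckets = [[] for _ in range(2 * pmax + 1)]
        self.gain_of = {}          # vertex -> current gain (plays the role of the vertex array)
        self.max_gain = -pmax      # at or above the highest non-empty bucket

    def insert(self, v, gain):
        self.buckets[gain + self.pmax].append(v)
        self.gain_of[v] = gain
        self.max_gain = max(self.max_gain, gain)

    def remove(self, v):
        gain = self.gain_of.pop(v)
        self.buckets[gain + self.pmax].remove(v)

    def update(self, v, delta):
        """Change the gain of free vertex v by delta (e.g., after a neighbor moves)."""
        gain = self.gain_of[v]
        self.remove(v)
        self.insert(v, gain + delta)

    def pop_max(self):
        """Remove and return (vertex, gain) with the largest gain, or None if empty."""
        while self.max_gain >= -self.pmax and not self.buckets[self.max_gain + self.pmax]:
            self.max_gain -= 1     # lazily move the max pointer down past empty buckets
        if self.max_gain < -self.pmax:
            return None
        v = self.buckets[self.max_gain + self.pmax].pop()
        del self.gain_of[v]
        return v, self.max_gain
```

In a full FM pass, the caller would pop the best vertex, check the area constraint, lock the vertex, and then call `update` on its free neighbors.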
Fiduccia-Mattheyses Algorithm
Perform the FM algorithm on the following circuit:
Area constraint = [3,5]
Break ties in alphabetical order.
Initial Partitioning
Random initial partitioning is given.
First Move
Second Move
Third Move
Fourth Move
Fifth Move
Sixth Move
Seventh Move
Last Move
Summary
Found three best solutions.
Cutsize reduced from 6 to 3.
Solutions after moves 2 and 4 are better balanced.
Floorplanning
(Figure: an optimal floorplan, in terms of area, and a non-optimal floorplan of the same seven modules.)
Floorplan Design
• Modules: x × y rectangles
– Area: A = xy
– Aspect ratio: r ≤ y/x ≤ s
– Rotation
• Module connectivity
(Figure: modules a–g placed in a floorplan, and the weighted connectivity graph among modules a–f.)
Floorplanning: Terminology
• Rectangular dissection: subdivision of a given rectangle by a finite number of horizontal and vertical line segments into a finite number of non-overlapping rectangles.
• Slicing structure: a rectangular dissection that can be obtained by repetitively subdividing rectangles horizontally or vertically.
• Slicing tree: a binary tree where each internal node represents a vertical or horizontal cut line, and each leaf a basic rectangle.
• Skewed slicing tree: one in which no node and its right child are the same (they never have the same cut type).
(Figure: a non-slicing floorplan; a slicing floorplan of modules 1–7; a skewed slicing tree for it; and another, non-skewed, slicing tree for the same floorplan.)
Solution Representation
• An expression $E = e_1 e_2 \ldots e_{2n-1}$, where $e_i \in \{1, 2, \ldots, n, H, V\}$, $1 \le i \le 2n-1$, is a Polish expression of length $2n-1$ iff
1. every operand $j$, $1 \le j \le n$, appears exactly once in $E$;
2. (the balloting property) for every subexpression $E_i = e_1 \ldots e_i$, $1 \le i \le 2n-1$, #operands > #operators.
Example: 1 6 H 3 5 V 2 H V 7 4 H
– after the prefix 1 6 H 3 5 V: # of operands = 4, # of operators = 2
– after the whole string: # of operands = 7, # of operators = 5
• Polish expression ←→ postorder traversal of the slicing tree.
• ijH: rectangle i is below rectangle j; ijV: rectangle i is to the left of rectangle j.
(Figure: a slicing floorplan of modules 1–7 and its slicing tree.)
E = 16H2V75VH34HV
Writing H as + and V as *: E = 16+2*75*+34+*
Postorder traversal of a tree!
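(Figure: two slicing trees for the same floorplan of modules 1–4. E = 123H4VV is non-skewed; E = 123HV4V is skewed.)
Non-skewed cases: an H node whose right child is also H, or a V node whose right child is also V.
(Figure: the floorplan and slicing tree for E = 16H2V75VH34HV.)
A normalized Polish expression: E = 16H2V75VH34HV (no two consecutive operators are the same).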
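Both conditions, and the normalization property (no two consecutive operators are the same), can be checked in one scan. This is a minimal sketch that assumes single-character operands, so n ≤ 9. The function names are illustrative.

```python
def is_polish(expr, n):
    """Check conditions 1 and 2 for operands '1'..str(n) and operators 'H', 'V'."""
    operands = sorted(e for e in expr if e not in "HV")
    if operands != sorted(str(j) for j in range(1, n + 1)):
        return False                 # condition 1: each operand exactly once
    balance = 0                      # #operands - #operators in the prefix e1..ei
    for e in expr:
        balance += -1 if e in "HV" else 1
        if balance <= 0:
            return False             # condition 2: balloting property violated
    return len(expr) == 2 * n - 1


def is_normalized(expr):
    """True if no two consecutive operators are identical (no HH or VV)."""
    return not any(x == y and x in "HV" for x, y in zip(expr, expr[1:]))
```

With these checks, 16H2V75VH34HV is a normalized Polish expression. By contrast, 123H4VV satisfies both conditions but is not normalized.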
Area Computation
(Figure: shape lists attached to the nodes of two example slicing trees, e.g., {(1,3) (3,1)}, {(2,3) (3,2)}, {(2,2)}, and {(5,5) (9,4)}.)
• Combining two children of sizes (v, u1) and (w, u2), given as (width, height):
– V cut: width v + w, height max{u1, u2}
– H cut: width max{v, w}, height u1 + u2
• Wiring cost?
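The combination rules can be applied bottom-up with a stack, which amounts to a postorder traversal of the slicing tree. This sketch assumes each shape is a (width, height) pair. It prunes any shape for which another shape is no wider and no taller. `prune`, `combine`, and `min_area` are hypothetical names.

```python
def prune(shapes):
    """Keep only non-dominated (width, height) shapes, sorted by increasing width."""
    kept = []
    for w, h in sorted(set(shapes)):
        if not kept or h < kept[-1][1]:   # must be strictly lower than every narrower shape
            kept.append((w, h))
    return kept


def combine(left, right, cut):
    """Shape list of a node whose two children are combined by a V or H cut."""
    shapes = []
    for v, u1 in left:
        for w, u2 in right:
            if cut == "V":                 # side by side: widths add, taller child sets height
                shapes.append((v + w, max(u1, u2)))
            else:                          # "H", stacked: heights add, wider child sets width
                shapes.append((max(v, w), u1 + u2))
    return prune(shapes)


def min_area(expr, leaf_shapes):
    """Evaluate a Polish expression with a stack and return the smallest achievable area."""
    stack = []
    for e in expr:
        if e in "HV":
            right = stack.pop()
            left = stack.pop()
            stack.append(combine(left, right, e))
        else:
            stack.append(prune(leaf_shapes[e]))
    return min(w * h for w, h in stack.pop())
```

For example, combining {(1,3), (3,1)} with {(2,3), (3,2)} under a V cut keeps only (3,3) and (6,2). The other candidates are dominated.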
• Related work
– Wong & Liu, “Floorplan design for rectangular and L-shaped modules,” ICCAD’87.
– Wong, Leong, Liu, Simulated Annealing for VLSI Design, pp. 31–71, Kluwer Academic Publishers, 1988.
• Ingredients: solution space, neighborhood structure, cost function, annealing schedule?
Annealing (Wikipedia)
• Annealing (metallurgy)
– a heat treatment that alters the microstructure of a material, causing
changes in properties such as strength, hardness, and ductility
• Simulated annealing
– a numerical optimization technique for searching for a solution in a space
otherwise too large for ordinary search methods to yield results
(Figure: annealing example with 30 blocks.)
Neighborhood Structure
• Chain: a maximal run of operators, alternating as HVHVH… or VHVHV…
– In 1 6 H 3 5 V 2 H V 7 4 H V, the chains are H, V, HV, and HV.
Effects of Perturbation
(Figure: floorplans for the sequence of moves 12V4H3V →M1→ 12V3H4V →M2→ 12H3H4V →M3→ 12H34HV.)
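The three moves can be written as string operations on a normalized Polish expression. This sketch assumes single-character operands and 0-based positions. `m3_swap_operand_operator` rejects any result that breaks the balloting property or normalization, as M3 requires. The function names are illustrative.

```python
OPERATORS = "HV"


def m1_swap_operands(expr, i, j):
    """M1: swap the operands at positions i < j, which must have only operators between them."""
    s = list(expr)
    if s[i] in OPERATORS or s[j] in OPERATORS or any(c not in OPERATORS for c in s[i + 1:j]):
        raise ValueError("positions i and j are not adjacent operands")
    s[i], s[j] = s[j], s[i]
    return "".join(s)


def m2_complement_chain(expr, start):
    """M2: complement (H <-> V) the chain of operators beginning at position start."""
    s = list(expr)
    k = start
    while k < len(s) and s[k] in OPERATORS:
        s[k] = "V" if s[k] == "H" else "H"
        k += 1
    return "".join(s)


def m3_swap_operand_operator(expr, i):
    """M3: swap the operand/operator pair at positions i, i+1; None if the result is invalid."""
    s = list(expr)
    if (s[i] in OPERATORS) == (s[i + 1] in OPERATORS):
        return None                       # not an operand/operator pair
    s[i], s[i + 1] = s[i + 1], s[i]
    new = "".join(s)
    balance = 0
    for c in new:                         # balloting property must still hold
        balance += -1 if c in OPERATORS else 1
        if balance <= 0:
            return None
    if any(x == y and x in OPERATORS for x, y in zip(new, new[1:])):
        return None                       # result must stay normalized
    return new
```

Three calls reproduce the sequence in the figure: `m1_swap_operands("12V4H3V", 3, 5)`, then `m2_complement_chain` at position 2, then `m3_swap_operand_operator` at position 4.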
Cost Function
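• Φ = A + λW.
– A: area of the smallest rectangle enclosing all modules
– λ: user-specified parameter
(Figure: area A for the floorplan 12H34HV.)
• $W = \sum_{ij} c_{ij} d_{ij}$, where $c_{ij}$ is the number of connections between modules i and j and $d_{ij}$ is the center-to-center distance between them.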
(Figures: one example of each move type, starting from E = 12H34V56VHV)
– M1 (swap two adjacent operands): 12H34V56VHV → 12H35V46VHV
– M2 (complement a chain): 12H34V56VHV → 12H34V56HVH
– M3 (swap an adjacent operand and operator): 12H34V56VHV → 123H4V56VHV
Annealing Schedule
• $T_i = r^i T_0$, i = 1, 2, 3, …; r = 0.85.
• At each temperature, try kn moves (k = 5–10).
• Terminate the annealing process if
– # of accepted moves < 5%,
– temperature is low enough, or
– run out of time.
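Here is a generic sketch of this schedule. It assumes the caller supplies `cost` and `neighbor`; for floorplanning, `neighbor` would be a random M1/M2/M3 move. The defaults follow the slide: r = 0.85, k·n moves per temperature, and a stop when fewer than 5% of moves are accepted. The values of `t0`, `t_min`, and `max_seconds` are illustrative assumptions. The function returns the best solution seen.

```python
import math
import random
import time


def anneal(initial, cost, neighbor, n, t0=1000.0, r=0.85, k=5,
           t_min=1e-3, min_accept=0.05, max_seconds=10.0, seed=0):
    """Simulated annealing with T_i = r**i * T0 and k*n moves per temperature."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temp, start = t0, time.monotonic()
    while True:
        accepted = 0
        for _ in range(k * n):
            cand = neighbor(current, rng)
            delta = cost(cand) - current_cost
            # downhill moves are always taken; uphill ones with probability exp(-delta / T)
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = cand, current_cost + delta
                accepted += 1
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temp *= r
        # stop on a low acceptance ratio, a low temperature, or running out of time
        if (accepted < min_accept * k * n or temp < t_min
                or time.monotonic() - start > max_seconds):
            return best, best_cost
```

As a toy usage, minimizing |x − 7| over the integers with ±1 moves from x = 0 returns 7. Keeping the best-so-far solution guards against ending on a worse state accepted late in the run.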
M1 Move
Swap module 3 and 7 in P1 = 25V1H374VH6V8VH
We get: P2 = 25V1H734VH6V8VH
Area changed from 11 × 15 to 13 × 14
Change on Floorplan
M2 Move
Complement last chain in P2 = 25V1H734VH6V8VH
We get: P3 = 25V1H734VH6V8HV
Area changed from 13 × 14 to 15 × 11
Change on Floorplan
M3 Move
Swap 6 and V in P3 = 25V1H734VH6V8HV
We get: P4 = 25V1H734VHV68HV
Area changed from 15 × 11 to 15 × 7
Change on Floorplan
(Figure: a packing of modules a–f, the positive and negative loci of module b, and the loci of all modules. Positive loci: abdecf; negative loci: cbfade.)
Geometrical Information
• No pair of positive (negative) loci cross each other, i.e., loci are linearly
ordered.
• Sequence Pair (Γ+, Γ−): Γ+ is a module-name sequence representing the order of positive loci, and Γ− the order of negative loci (e.g., (Γ+, Γ−) = (abdecf, cbfade)).
• x′ is after (before) x in both Γ+ and Γ− ⇒ x′ is right (left) of x.
• x′ is after (before) x in Γ+ and before (after) x in Γ− ⇒ x′ is below (above) x.
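The two rules translate directly into code. This sketch assumes each sequence is a string of single-character module names. `sp_relation` is a hypothetical helper.

```python
def sp_relation(gamma_plus, gamma_minus, x, x2):
    """Position of module x2 relative to module x under the sequence pair."""
    after_plus = gamma_plus.index(x2) > gamma_plus.index(x)
    after_minus = gamma_minus.index(x2) > gamma_minus.index(x)
    if after_plus and after_minus:
        return "right"
    if not after_plus and not after_minus:
        return "left"
    return "below" if after_plus else "above"
```

With (Γ+, Γ−) = (abdecf, cbfade), module b is below a and module e is right of a.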
(Γ+, Γ−)-Packing
• For every sequence pair (Γ+ , Γ− ), there is a (Γ+ , Γ− ) packing.
• Horizontal constraint graph GH (V, E) (similarly for GV (V, E)):
– V : source s, sink t, m vertices for modules.
– E: (s, x) and (x, t) for each module x, and (x, x′) iff x must be left of x′.
– Vertex weight: 0 for s and t, width of module x for the other
vertices.
(Figure: the horizontal constraint graph of modules a–f with source s and sink t, before and after transitive reduction, i.e., after removing edges implied by longer paths.)
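Because the constraint graphs are DAGs, the packing coordinates are longest-path lengths from s. The sketch below computes them in O(n²) without building the graphs explicitly. It relies on the relations above: a is left of b iff a precedes b in both sequences, and a is below b iff a follows b in Γ+ but precedes it in Γ−. `sp_pack` is an illustrative name.

```python
def sp_pack(gamma_plus, gamma_minus, width, height):
    """Lower-left module coordinates and chip size from a sequence pair."""
    pos_minus = {m: i for i, m in enumerate(gamma_minus)}
    x, y = {}, {}
    for b in gamma_plus:
        # x holds modules before b in Gamma+; those also before b in Gamma- are left of b
        x[b] = max((x[a] + width[a] for a in x if pos_minus[a] < pos_minus[b]), default=0)
    for b in reversed(gamma_plus):
        # y holds modules after b in Gamma+; those before b in Gamma- are below b
        y[b] = max((y[a] + height[a] for a in y if pos_minus[a] < pos_minus[b]), default=0)
    chip_w = max(x[m] + width[m] for m in gamma_plus)
    chip_h = max(y[m] + height[m] for m in gamma_plus)
    return x, y, (chip_w, chip_h)
```

With Γ+ = Γ−, every module lands in one row, which matches the "initial solution" used in the annealing slide that follows.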
Sequence Pair
• Final chip area?
• Solution space size?
– Without rotation vs with rotation
• Optimization: Simulated Annealing
– Initial solution: Γ+ = Γ−
– Swap two modules in Γ+
– Swap two modules both in Γ+ and Γ−
– Rotate
• Results: produces highly packed non-slicing floorplans
48
m8
m11
m0 m2 m3
m4 m9 m4
4174 m3 m2
m11 m8 m13
m9 m13 2696 m6
m1 m12 m7 2580
m9
m5 m13
m6 m1 m8
m7 m6 m3 m2 m1
m10 m5
3111 3549 2814
(a) temperature: 2000 (b) temperature: 1000 (c) temperature: 20
area: 12985314 -26.3% area: 9568104 area: 7260120
-44.1%
Sequence Pair
• Floorplan (a)
– S1: 11 0 7 10 8 4 1 5 12 2 9 13 6 3
– S2: 7 10 6 1 11 5 4 0 13 12 9 2 3 8
• Floorplan (b)
– S1: 8 6 13 3 9 5 2 4 10 0 7 12 1 11
– S2: 5 6 2 1 8 9 13 4 12 10 11 0 3 7
• Floorplan (c)
– S1: 3 11 6 9 5 4 7 0 10 12 13 1 2 8
– S2: 1 6 8 12 3 7 5 10 0 9 11 13 4 2
Constraint Graphs
Horizontal constraint graph (HCG)
Before and after removing transitive edges
Final Floorplan
Dimension: 11 × 15
Move I
Swap 1 and 3 in positive sequence of SP1
SP1 = (17452638, 84725361)
SP2 = (37452618, 84725361)
Constraint Graphs
Constructing Floorplan
Dimension: 13 × 14
Move II
Swap 4 and 6 in both sequences of SP2
SP2 = (37452618, 84725361)
SP3 = (37652418, 86725341)
Constraint Graphs
Constructing Floorplan
Dimension: 13 × 12
Summary
Impact of the moves:
Floorplan dimension changes from 11 × 15 to 13 × 14 to 13 × 12
Placement
• The process of arranging the circuit components on a layout surface.
• Inputs: A set of fixed modules, a netlist.
• Goal: Find the best position for each module on the chip according to
appropriate cost functions.
– Considerations: routability/channel density, wirelength, cut size,
performance, thermal issues, I/O pads.
(Figure: two placements of cells A–H. The first needs only 2 tracks (density = 2) but has wirelength 12. The second has a shorter wirelength of 10 but requires 3 tracks.)
Placement Objectives
• Total wirelength • Number of cut nets • Wire congestion • Signal delay
(Figure: two placements of cells a–l compared.)
Wirelength Estimation
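• Half-perimeter wirelength: $L_{HPWL} = w + h$, the width plus the height of the bounding box of the net's pins
(Figure: the pins of a net and their w × h bounding box.)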
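Here is a minimal sketch of half-perimeter wirelength. It assumes pins are given as (x, y) points and each net as a list of cell names. `hpwl` and `total_hpwl` are illustrative names.

```python
def hpwl(pins):
    """Half-perimeter of the bounding box of a net's pins: w + h."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets, pos):
    """Sum of HPWL over all nets; pos maps a cell name to its (x, y) location."""
    return sum(hpwl([pos[c] for c in net]) for net in nets)
```

For pins (0,0), (3,1), and (1,4), the bounding box is 3 wide and 4 tall, so the HPWL is 7.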
Placement Methods
• Constructive methods
– Cluster growth algorithm
– Force-directed method
– Algorithm by Goto
– Min-cut based method
• Iterative improvement methods
– Pairwise exchange
– Simulated annealing: Timberwolf
– Genetic algorithm
• Analytical methods
– Gordian, Gordian-L
Min-Cut Placement
• Breuer, “A class of min-cut placement algorithms,” DAC-77.
• Quadrature: suitable for circuits with high density in the center.
• Bisection: good for standard-cell placement.
• Slice/Bisection: good for cells with high interconnection on the periphery.
(Figure: order of cut lines and resulting block sizes for quadrature, bisection, and slice/bisection.)
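(Figure: net s has a cell in L1 and cells in R; R is then split into R1 and R2.)
Terminal Propagation
• We should use the fact that s is in L1!
(Figure: the cell of s in L1 is represented by a dummy cell p at the center of L1, which biases the partitioning of R into R1 and R2.)
• When not to use p to bias partitioning? Net s has cells in many groups?
(Figure: p is derived from the minimum rectilinear Steiner tree of the net's cells p1, p2, p3. If p falls within the middle third (h/3) of region R, don't use p to bias the solution in either direction; otherwise, use p!)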
(Figure: terminal propagation example with cells a–d, external cell C1, and propagated terminal p1 in regions L1/R1 and L2/R2.)
Creating Rows
• Terminal propagation reduces the overall area by ~30%
• Creating rows
– Choose α and β, preferably to balance row lengths (during rearrangement)
Creating Rows
• Example
– Partitioning of circuit into 32 groups
– Each group is either assigned to a single row or divided into 2 rows
(Figure: a four-row standard cell design; each group is labeled with the row(s) it occupies, e.g., 1, 1,2, 2,3.)
Experimental Results
• CMOS Chip with 453 nets and 412 cells
• Manual solution
– track density=147; feedthroughs=184
• Automated solution
– without terminal propagation: t.d.=313; f.t.=591
– (t.d. reduced to 235 by iterative interchanges)
– with terminal propagation: t.d.=186; f.t.=182
– (t.d. reduced to 152 by iterative interchanges)
– Iterative Interchange Refinement is helpful
• The program is in production use as part of an automatic
placement system in AT&T Bell Lab.
– Solutions within 10% of the best hand layout
Mincut Placement
Perform quadrature mincut onto 4 × 4 grid
Start with vertical cut first
Recursive Bisection
Start with vertical cut
Perform terminal propagation with middle third window
Cut 8 to 15
16 partitions generated by 15 cuts
HPBB wirelength = 23
Quadratic Programming
• Definition
– Process of solving optimization problems involving quadratic functions
– One seeks to optimize (minimize or maximize) a multivariate quadratic function subject to linear constraints on the variables
– n-dimensional vector c
– n × n-dimensional real symmetric matrix Q
– m × n-dimensional real matrix A
– m-dimensional real vector b
– Minimize $\frac{1}{2} x^T Q x + c^T x$ subject to $A x \le b$
Analytical Placement
• Gordian package:
– GORDIAN: Gordian: VLSI Placement by Quadratic
Programming and slicing Optimization: J. M. Kleinhans, G.Sigl,
F.M. Johannes, K.J. Antreich, IEEE TCAD, 1991
– GORDIAN-L: Analytical Placement: A Linear or a Quadratic
Objective Function?: G. Sigl, K. Doll, F.M. Johannes, DAC91
• Gordian: A Quadratic Placement Approach
– Global optimization: solves a sequence of quadratic programming
problems
– Partitioning: enforces the non-overlap constraints
Quadratic Placement
(Figure: quadratic placement snapshots at iterations i = 58 and i = 87.)
Adaptec1 Stats
• Circuit stats
– # cells/nets/pins 210,863/219,687/19,205
– chip size 6000um × 6000um
– bin size 50um × 50um
– # placement bins 120 × 120
– Average bin occupancy 210K/120² ≈ 14.6 gates/bin
• Wirelength result (HPBB)
– iteration 0 34,069,060
– iteration 29 46,352,680
– iteration 58 80,783,336
– iteration 87 98,111,904
Problem Definition
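(Figure: module u has center $(x_u, y_u)$ and pin vu at $(x_{uv}, y_{uv})$, offset $(a_{vu}, b_{vu})$ from the center of u. The pin is connected by a wire of length $l_{vu}$ to net node v at $(x_v, y_v)$, which also connects to other modules.)
Squared wire length of net v:
$L_v = \sum_{u \in M_v} \left[ (x_{uv} - x_v)^2 + (y_{uv} - y_v)^2 \right]$, with $x_{uv} = x_u + a_{vu}$, $y_{uv} = y_u + b_{vu}$, and $M_v$ the set of modules connected to net v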
Cost Function
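• Minimize the following:
$\Phi = \frac{1}{2} \sum_{v \in N} w_v L_v$
$\Phi(x, y) = X^T C X + d_x^T X + Y^T C Y + d_y^T Y$
$\Phi(x) = X^T C X + d^T X$ (the x and y terms are independent, so each coordinate is solved separately)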
Constraints
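• The center of gravity constraints
– At level l, the chip is divided into $q = 2^l$ regions
– For region p, the center coordinates: $(u_p, v_p)$
– $M_p$: set of modules in region p
– For region p: $\sum_{u \in M_p} F_u x_u / \sum_{u \in M_p} F_u = u_p$ ($F_u$: area of module u)
– Matrix form for all regions
– Lastly, we have
$U = A X$, where $a_{pu} = F_u / \sum_{u' \in M_p} F_{u'}$ if $u \in M_p$, and $a_{pu} = 0$ otherwise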
Problem Formulation
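(Figure: modules A–G split into two regions with centers $(u_\rho, v_\rho)$ and $(u_{\rho'}, v_{\rho'})$.)
$\min_X \; X^T C X + d^T X$ subject to $A^{(l)} X = U^{(l)}$, where each row of $A^{(l)}$ has nonzero entries (*) only in the columns of the modules in its region, e.g.
$A^{(l)} = \begin{bmatrix} * & * & * & 0 & 0 & 0 \\ 0 & 0 & 0 & * & * & * \end{bmatrix}$
• C (a Laplacian) is half the Hessian matrix of Φ, so it determines the convexity of the objective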
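• Our Gordian QP
Gordian Laplacian
• Our Laplacian C
– C is positive definite iff all its eigenvalues are positive (nonnegative eigenvalues only give positive semidefinite)
– Equivalently, C is positive definite iff $x^T C x > 0$ for every $x \neq 0$
– A diagonal C with positive entries is positive definite; more generally, so is a symmetric C with a positive diagonal that is strictly diagonally dominant
– So, C is positive definite, provided every group of connected movable cells is tied to at least one fixed I/O (this makes the otherwise singular Laplacian nonsingular)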
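The positive-definiteness argument can be checked numerically. A symmetric matrix is positive definite exactly when its Cholesky factorization succeeds with positive pivots. The pure-Python sketch below builds C and a linear term for one coordinate, tests C, and solves for the unconstrained minimum. It assumes weighted 2-pin connections and fixed I/O pins at known positions. With the objective written as xᵀCx − 2bᵀx (i.e., d = −2b), the minimum satisfies C x = b. All names are illustrative.

```python
import math


def build_system(n, edges, fixed):
    """C and b for one coordinate. edges: (i, j, w) between movable cells;
    fixed: (i, p, w) from movable cell i to a fixed pin at coordinate p."""
    C = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, j, w in edges:
        C[i][i] += w
        C[j][j] += w
        C[i][j] -= w
        C[j][i] -= w
    for i, p, w in fixed:
        C[i][i] += w   # a fixed pin adds only to the diagonal...
        b[i] += w * p  # ...and to the linear term
    return C, b


def cholesky(C, eps=1e-12):
    """Lower-triangular L with C = L L^T, or None if C is not positive definite."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= eps:
                    return None    # non-positive pivot: not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def solve_spd(C, b):
    """Solve C x = b for symmetric positive definite C by forward/back substitution."""
    L = cholesky(C)
    if L is None:
        raise ValueError("C is not positive definite")
    n = len(b)
    z = [0.0] * n
    for i in range(n):                     # L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):           # L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

Consider two cells chained between fixed pins at 0 and 3. Their C is positive definite, and they land at 1 and 2. Without the fixed pins, the Laplacian is singular and the factorization fails.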
Partitioning
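• Recursive partitioning is needed
– to resolve module overlap in global placement
– the global placement problem will be solved again with two additional center-of-gravity constraints
• Partition $M_p \rightarrow (M_{p'}, M_{p''})$ such that $x_{u'} \le x_{u''}$ for all $u' \in M_{p'}$ and $u'' \in M_{p''}$
• Area ratio: $\alpha = \sum_{u' \in M_{p'}} F_{u'} / \sum_{u \in M_p} F_u \approx 0.5$
• Cut value: $C_p(\alpha) = \sum_{v \in N_C} w_v$, where $N_C$ is the set of nets cut by the partition
(Figure: cut value $C_p(\alpha)$ as a function of α.)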
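One way to realize this cut is sketched below under stated assumptions. The region's modules are sorted by their x-coordinate from the current global placement. Every split point whose area ratio lies within a tolerance `tol` of α = 0.5 is considered, and the split with the smallest cut value is kept. Unit net weights are assumed, so the cut value counts cut nets. `gordian_split` and `tol` are hypothetical, and GORDIAN itself may choose the cut differently.

```python
def gordian_split(cells, x, area, nets, alpha=0.5, tol=0.1):
    """Split a region's cells by x-coordinate (left part = smallest x values)."""
    order = sorted(cells, key=lambda c: (x[c], c))
    total = sum(area[c] for c in order)
    best, left_area = None, 0.0
    for k in range(1, len(order)):
        left_area += area[order[k - 1]]
        if abs(left_area / total - alpha) > tol:
            continue                        # area ratio outside the allowed window
        left, right = set(order[:k]), set(order[k:])
        cut = sum(1 for net in nets
                  if any(c in left for c in net) and any(c in right for c in net))
        if best is None or cut < best[0]:
            best = (cut, left, right)
    if best is None:
        raise ValueError("no split point satisfies the area ratio")
    return best[1], best[2]
```

Widening `tol` trades area balance for a smaller cut. With `tol=0.1` and four unit-area cells, only the 2/2 split is allowed.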
Repartitioning
• Module exchange after each cut to improve cut size
– terminal propagation using global placement positions
• Repartitioning
– to 'undo' the mistakes made at the previous level:
Procedure repartition(l)
  if overlap exists
    for each r ∈ R(l−1)
      merge-regions(r, r′, r″);
      partition(r, r′, r″);
    setup-constraints(l);
    global-optimize(l);
  endif
Summary of Gordian
(Flowchart) Global optimization (minimization of wire length under position constraints) → module coordinates → partitioning of the module set and dissection of the placement region → new position constraints → global optimization again. This repeats until the regions contain ≤ k modules → final placement (adoption of style-dependent constraints)
Experimental Results
Comparison of Results for Standard Cell Blocks
GORDIAN Placement
Perform GORDIAN placement
Uniform area and net weight, area balance factor = 0.5
Undirected graph model: each edge in k-clique gets weight 2/k
IO Placement
Necessary for GORDIAN to work
Adjacency Matrix
Shows connections among movable nodes
Among nodes a to j
Degree Matrix
Based on both adjacency and pin connection matrices
Sum of entries in the same row (= node degree)
Laplacian Matrix
Degree matrix minus adjacency matrix
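The matrices in this example can be assembled mechanically. This sketch assumes nets are tuples of node names and fixed I/O pins are listed separately. Each k-pin net becomes a k-clique whose k(k−1)/2 edges get weight 2/k, so every net contributes a total weight of k − 1. Exact fractions avoid rounding. Names are illustrative.

```python
from fractions import Fraction


def clique_matrices(movable, nets, fixed):
    """Adjacency (movable-movable), pin weights (movable-fixed), degree, and Laplacian."""
    idx = {m: i for i, m in enumerate(movable)}
    n = len(movable)
    adj = [[Fraction(0)] * n for _ in range(n)]
    pin = [Fraction(0)] * n              # total weight from each movable node to fixed pins
    for net in nets:
        w = Fraction(2, len(net))        # clique edge weight 2/k
        for pos, a in enumerate(net):
            for b in net[pos + 1:]:
                if a in idx and b in idx:
                    adj[idx[a]][idx[b]] += w
                    adj[idx[b]][idx[a]] += w
                elif a in idx and b in fixed:
                    pin[idx[a]] += w
                elif b in idx and a in fixed:
                    pin[idx[b]] += w
    # degree = row sum of the adjacency matrix plus the pin connections
    degree = [sum(adj[i]) + pin[i] for i in range(n)]
    lap = [[(degree[i] if i == j else 0) - adj[i][j] for j in range(n)] for i in range(n)]
    return adj, pin, degree, lap
```

Suppose a 3-pin net joins I/O p with a and b, and a 2-pin net also joins a and b. Then edge (a, b) gets 2/3 + 1 = 5/3, and each node gets 2/3 toward the I/O.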
Level 0 QP Formulation
No constraint necessary
Level 0 Placement
Cells with real dimension will overlap
Level 1 Partitioning
Perform level 1 partitioning
Obtain center locations for center-of-gravity constraints
Level 1 Constraint
Level 1 Placement
Verification
Verify that the constraints are satisfied in the left partition
Level 2 Partitioning
Add two more cut-lines
This results in p1={c,d}, p2={a,b,e}, p3={g,j}, p4={f,h,i}
Chip height is …; we split the cells into a … ratio
Level 2 Constraint
Level 2 Placement
Clique-based wiring is shown
Summary
Center-of-gravity constraint
Helps spread the cells evenly while monitoring wirelength
Removes overlaps among the cells (with real dimension)
Steiner Routing
Layer  Width (um)  Pitch (um)  Dir.  R (ohm/um)  C (fF/um)
M1     0.05        0.135       V     7.24        0.172
M2     0.05        0.100       H     9.05        0.175
M3     0.05        0.100       V     9.06        0.181
M4     0.05        0.100       H     9.05        0.177
M5     0.05        0.100       V     9.06        0.180
M6     0.05        0.100       H     9.05        0.177
Full-Chip Routing
(Figures: routed layouts of M1–M3 and M4–M6, then each layer separately.)
• M2 layer: yellow = signal; magenta = clock; red = power/ground
• M3 layer: yellow = signal; magenta = clock
• M4: yellow = signal; magenta = clock
• M5: yellow = signal; magenta = clock; red = power/ground
• M6: yellow = signal; cyan = power/ground
• M7 and M8: magenta = power/ground
Routing
• placement → global routing → detailed routing
Routing Constraints
• 100% routing completion + area minimization, under a set of constraints:
– Placement constraint: usually based on fixed placement
– Number of routing layers
– Geometrical constraints: must satisfy design rules
– Timing constraints (performance-driven routing): must satisfy delay
constraints
– Crosstalk?
– Process variations?
(Figure: wire width w, spacing s, and nets a–d.)
Global-Routing Problem
• Given a netlist $N = \{N_1, N_2, \ldots, N_n\}$ and a routing graph $G = (V, E)$, find a Steiner tree $T_i$ for each net $N_i$, $1 \le i \le n$, such that $U(e_j) \le c(e_j)$ for all $e_j \in E$ and $\sum_{i=1}^{n} L(T_i)$ is minimized, where
– $c(e_j)$: capacity of edge $e_j$;
– $x_{ij} = 1$ if $e_j$ is in $T_i$; $x_{ij} = 0$ otherwise;
– $U(e_j) = \sum_{i=1}^{n} x_{ij}$: # of wires that pass through the channel corresponding to edge $e_j$;
– $L(T_i)$: total wirelength of Steiner tree $T_i$.
• For high performance, the maximum wirelength $\max_{i=1}^{n} L(T_i)$ is minimized (or the longest path between two points in $T_i$ is minimized).
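A small checker makes the definitions concrete. This sketch assumes each $T_i$ is given as a set of routing-graph edge identifiers, and that capacity and length are known for every edge. `evaluate_routing` is a hypothetical helper. It returns feasibility, the total wirelength Σ L(T_i), and the maximum L(T_i).

```python
def evaluate_routing(trees, capacity, length):
    """Check U(e) <= c(e) for every edge and report total and maximum tree wirelength."""
    usage = {e: 0 for e in capacity}
    for tree in trees:
        for e in tree:
            usage[e] += 1                  # U(e_j) = sum over nets of x_ij
    feasible = all(usage[e] <= capacity[e] for e in capacity)
    lengths = [sum(length[e] for e in tree) for tree in trees]
    return feasible, sum(lengths), max(lengths)
```

The first objective minimizes the second value returned; the high-performance variant minimizes the third.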
Spanning Tree
Problem Formulation:
Given a graph G = (V, E), select a subset V′ ⊆ V such that V′ has property P.
Steiner Trees
1. Problem formulation:
Given an edge-weighted graph G = (V, E) and a subset D ⊆ V, select a subset V′ ⊆ V such that D ⊆ V′ and V′ induces a tree of minimum cost over all such trees.
(Figure: (a) an edge-weighted graph whose demand points must be connected; (b) the corresponding Steiner tree.)
Hanan's Thm: There exists an optimal RST with all Steiner points (set S) chosen from the intersection points of horizontal and vertical lines drawn from points of D.
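By Hanan's theorem, candidate Steiner points can be restricted to a finite grid. The sketch below enumerates that grid, assuming terminals are integer (x, y) points. The function name is illustrative.

```python
def hanan_points(terminals):
    """Hanan-grid points: crossings of horizontal and vertical lines through the terminals."""
    xs = sorted({x for x, _ in terminals})
    ys = sorted({y for _, y in terminals})
    occupied = set(terminals)
    return [(x, y) for x in xs for y in ys if (x, y) not in occupied]
```

For n terminals, there are at most n² grid points, so at most n² − n candidates.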
Hwang's Thm: The ratio of the cost of a rectilinear MST to that of an optimal RST is no greater than 3/2.
1-Steiner by Kahng/Robins
Iterative 1-Steiner Insertion Algorithm
Keep adding 1-Steiner point one-by-one until no more gain
(Figure: successive 1-Steiner point insertions; each panel shows the tree before an insertion.)
• Random variant
– Instead of choosing the best gain Steiner point in each iteration, just pick
the first one found.
– Time spent on each step is less, but more Steiner points need to be added.
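Here is a naive sketch of the iterated 1-Steiner idea. Candidates are Hanan-grid points, and a candidate's gain is the drop in rectilinear MST cost when it is added. The sketch assumes distinct integer terminals. It omits refinements of the published method, such as removing redundant Steiner points and faster gain updates. Function names are illustrative.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mst_cost(points):
    """Rectilinear MST cost with Prim's algorithm, O(n^2); points must be distinct."""
    if len(points) < 2:
        return 0
    dist = {p: manhattan(p, points[0]) for p in points[1:]}
    total = 0
    while dist:
        p = min(dist, key=lambda q: (dist[q], q))
        total += dist.pop(p)
        for q in dist:
            dist[q] = min(dist[q], manhattan(p, q))
    return total


def iterated_one_steiner(terminals):
    """Repeatedly add the Hanan-grid point with the largest MST-cost reduction."""
    xs = sorted({x for x, _ in terminals})
    ys = sorted({y for _, y in terminals})
    candidates = [(x, y) for x in xs for y in ys]
    steiner = []
    while True:
        points = list(terminals) + steiner
        base = mst_cost(points)
        best_gain, best_point = 0, None
        for c in candidates:
            if c in points:
                continue
            gain = base - mst_cost(points + [c])
            if gain > best_gain:
                best_gain, best_point = gain, c
        if best_point is None:             # no candidate reduces the cost any more
            return steiner, base
        steiner.append(best_point)
```

Take the four terminals (0,1), (2,1), (1,0), and (1,2). Their MST costs 6. Adding the center (1,1) reduces the cost to 4, after which no candidate gives a further gain.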
1-Steiner by Borah/Owens/Irwin
Interesting Observation
Gain Computation
Things to do
Overall Algorithm
Multi-pass Heuristic
Entire algorithm can be repeated
l(a,c) − l(p,a) = 4 − 2
l(b,d) − l(p,d) = 5 − 4
l(c,e) − l(p,e) = 4 − 3
l(b,c) − l(p,c) = 4 − 3
l(b,c) − l(p,e) = 4 − 3
l(b,c) − l(p,b) = 4 − 3
l(b,d) − l(p,d) = 5 − 4
l(e,f) − l(p,f) = 3 − 2
l(c,e) − l(p,c) = 4 − 3
Summary
Max-gain pair table
Sort based on gain value
Comparison
Kahng/Robins vs Borah/Owens/Irwin
Kahng/Robins tends to give better results
Borah/Owens/Irwin runs much faster: $O(n^2)$ vs. $O(n^4 \log n)$ for Kahng/Robins
Radius vs Wirelength
BPRIM under ε = …: radius bound = …
BPRIM with ε = ∞ is regular PRIM
Comparison
As the bound increases (12 → 18 → ∞),
the radius increases (12 → 17 → 22)
and the wirelength decreases (56 → 49 → 36)
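To compare trees as this slide does, both metrics can be computed from a rooted tree. This sketch assumes the tree is given as parent pointers toward the source and uses rectilinear (Manhattan) edge lengths. It only measures trees; it does not implement BPRIM itself. The function name is illustrative.

```python
def radius_and_wirelength(parent, pos, source):
    """Radius = longest source-to-node path length in the tree; wirelength = sum of edges."""
    def d(a, b):
        return abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])

    def path_len(v):
        total = 0
        while v != source:                 # walk up the parent pointers to the source
            total += d(v, parent[v])
            v = parent[v]
        return total

    wirelength = sum(d(v, p) for v, p in parent.items())
    radius = max((path_len(v) for v in parent), default=0)
    return radius, wirelength
```

Two trees with the same wirelength can have different radii. Here, a chain through c gives (14, 14), while branching at the source gives (12, 14).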
Multi-net Routing
(Figure: global routing vs. detailed routing; we assume two metal layers (H and V) in this case.)
Routing Models
• Grid-based model:
– A grid is superimposed on the routing region.
– Wires follow paths along the grid lines.
• Gridless model:
– Any model that does not follow this “gridded” approach.
(Figure: grid-based vs. gridless routing.)
– Two-layer: HV (horizontal–vertical), VH
– Three-layer: HVH, VHV
– Unreserved layer model
(Figure: tracks 1–3 in the HVH, VHV, and unreserved layer models.)
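(Figure: a channel with terminals on its upper and lower boundaries, routed with trunks, dogleg branches, and vias.)
– Upper-boundary terminals (net numbers, 0 = no terminal): 0 1 4 5 1 6 7 0 4 9 10
– Lower-boundary terminals: 2 3 5 3 5 2 6 8 9 8 7
– Local density per column: 1 3 5 5 4 3 3 3 4 3 2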
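Local density can be computed from the two terminal rows. Each net spans from its leftmost to its rightmost terminal, and a column's density is the number of spans covering it. This sketch assumes 0 marks an empty terminal position. The function name is illustrative.

```python
def local_density(top, bottom):
    """Number of net spans covering each column of a channel (net 0 = no terminal)."""
    span = {}
    for row in (top, bottom):
        for col, net in enumerate(row):
            if net == 0:
                continue
            lo, hi = span.get(net, (col, col))
            span[net] = (min(lo, col), max(hi, col))
    return [sum(1 for lo, hi in span.values() if lo <= col <= hi)
            for col in range(len(top))]
```

Applied to the channel above, `local_density(top, bottom)` returns [1, 3, 5, 5, 4, 3, 3, 3, 4, 3, 2]. Its maximum, 5, is the channel density and a lower bound on the number of tracks.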
(Figure: a second channel example with terminals 0 1 2 5 3 4 0 0 2 3 2.)
begin
  U ← {I1, I2, …, In};
  t ← 0;
  while (U ≠ ∅) do
    t ← t + 1;
    watermark ← 0;
    while (there is an Ij ∈ U s.t. sj > watermark) do
      Pick the interval Ij ∈ U with sj > watermark, nearest watermark;
      track[j] ← t;
      watermark ← ej;
      U ← U − {Ij};
    end
  end
end
(Figure: example channel with terminal row 0 2 1 3 0 0 5 0 6 0 5 0 and column densities 1 2 2 2 3 3 3 3 3 3 2 1.)
begin
  U ← {I1, I2, …, In};
  t ← 0;
  while (U ≠ ∅) do
    t ← t + 1;
    watermark ← 0;
    while (there is an unconstrained Ij ∈ U s.t. sj > watermark) do
      (* Ij is unconstrained if every net that must lie above it in the VCG is already assigned *)
      Pick the interval Ij ∈ U that is unconstrained, with sj > watermark, nearest watermark;
      track[j] ← t;
      watermark ← ej;
      U ← U − {Ij};
    end
  end
end
(Figure: nets assigned to tracks 1–4.)
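Both versions of the left-edge algorithm fit in one function. In this sketch, the optional `above` map lists, for each net, the nets that must be on a higher track (its VCG predecessors). Tracks are filled top-down. The watermark starts at −∞ rather than 0 so that intervals may start at column 0. Names are illustrative. A cyclic VCG makes progress impossible, which is where doglegs come in, so the sketch raises an error in that case.

```python
def left_edge(intervals, above=None):
    """Left-edge track assignment: intervals maps net -> (s, e); returns net -> track."""
    above = above or {}
    unassigned = dict(intervals)
    track_of, t = {}, 0
    while unassigned:
        t += 1
        watermark = float("-inf")          # right end of the last interval on track t
        placed = False
        while True:
            fits = [(s, net) for net, (s, e) in unassigned.items()
                    if s > watermark and all(p in track_of for p in above.get(net, ()))]
            if not fits:
                break
            s, net = min(fits)             # left edge nearest the watermark
            track_of[net] = t
            watermark = unassigned.pop(net)[1]
            placed = True
        if not placed:
            raise ValueError("cyclic vertical constraints: doglegs are needed")
    return track_of
```

Without constraints, the intervals a (1,3), b (2,6), c (4,8), and d (7,9) pack into two tracks: a and c on track 1, b and d on track 2.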
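(Figure: nets a–d with their vertical constraint graph (VCG); net c is split into c-1 and c-2 with a dogleg.)
Doglegs break cycles in VCG
(Figure: nets a and b form a cycle in the VCG; splitting b into b-1 and b-2 with a dogleg breaks the cycle.)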
Detailed Routing
Dogleg Router
Drawback of LEA: the entire net is on a single track.
Doglegs are used to place parts of a net on different tracks, thereby minimizing channel height.
(Figure: nets 1–3 routed (a) without and (b) with a dogleg.)
Detailed Routing
Dogleg Router
Each multi-terminal net is broken into a set of two-terminal nets.
Two parameters are used to control routing:
1. range: determines the number of consecutive two-terminal subnets of the same net that can be placed on the same track.
2. routing sequence: specifies the starting position and the direction of routing along the channel.
Modified LEA is applied to each subnet.
(Figure: channel with top terminals 0 1 2 2 4 3 0 0 and bottom terminals 1 2 0 3 3 0 4 4, routed in two ways, (a) and (b).)
Example of Dogleg Router
Deutsch, ICCAD 1985
(Source: Sherwani, Algorithms for VLSI Physical Design Automation, © 1992, 7.32)
Final solution