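PDA Lecture Materials

Introduction to Physical Design Automation Algorithms
Prof. Sung Kyu Lim
School of Electrical and Computer Engineering, Georgia Tech
limsk@ece.gatech.edu
IDEC Daejeon

Course Plan
– Dates: Monday through Friday
– Audience: undergraduate students, graduate students, general public
– Level: intermediate to advanced
– Format: online (Zoom)
– Lab tools: none
Sung Kyu Lim, Georgia Tech (2022)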

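Course Plan (continued)
Course objective: This course covers the design automation algorithms used in physical design, an essential step in digital circuit design. It is aimed at students who are meeting physical design algorithms for the first time, and it focuses on helping them understand difficult algorithm theory. To that end, simple examples are used throughout so that the overall flow of each algorithm and the progress of each step are easy to follow.

Course overview: In digital circuit design, physical design is the process of arranging (placement) and wiring (routing) a synthesized netlist. Because the number of cells and nets to handle can reach tens of millions, manual work is impossible, and the process must be carried out with automation algorithms.
This course covers the core physical design algorithms used in commercial tools such as Cadence Innovus and Synopsys IC Compiler. It targets the algorithms widely used in the five steps of physical design: partitioning, floorplanning, placement, global routing, and detailed routing.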

Course Plan (continued)
Topic: Partitioning
– Introduction to Physical Design
– Kernighan-Lin algorithm
– Fiduccia-Mattheyses algorithm
Topic: Floorplanning
– Polish Expression algorithm
– Sequence Pair algorithm
Topic: Placement
– Mincut Placement algorithm
– Gordian Placement algorithm
Topic: Global Routing
– Steiner Routing algorithm
– Bounded Radius Routing algorithm
Topic: Detailed Routing
– Channel Routing algorithm
– Switchbox Routing algorithm

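Notices
• Lecture materials
– PDF materials are shared
• Questions during the lecture
– Use the Zoom chat window
• Video recording
– Not permitted; thank you for your understanding
• Class format
– Each hour is split into a class period and a break
• If the instructor's Zoom connection has a problem
– Please stay in your seat and wait

About the Instructor
• Sung Kyu Lim
– UCLA
– Georgia Tech, Department of Electrical and Computer Engineering (to present)
– Research areas: AI, …-dimensional semiconductor circuit design, and design tool development
– IDEC lectures (to present)
– Recipient of the IDEC Outstanding Lecturer Award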

Physical Design Automation: under the hood

My Book

My Class
• Since …
– http://limsk.ece.gatech.edu/course/ece6133/
• Largest place & route course in the world
– Offered every year at graduate level
– Covers physical design only
– Spring semester enrollment
– Course evaluations

References
• Practical Problems in VLSI Physical Design Automation
– Sung Kyu Lim
• VLSI Physical Design: From Graph Partitioning to Timing Closure
– Andrew B. Kahng, Jens Lienig, Igor L. Markov, Jin Hu
Introduction
ECE6133
Physical Design Automation of VLSI Systems
Prof. Sung Kyu Lim

School of Electrical and Computer Engineering
Georgia Institute of Technology
VLSI Design Flow

System Specification → Architectural Design → Functional Design and Logic Design → Circuit Design → Physical Design → Physical Verification and Signoff → Fabrication → Packaging and Testing → Chip
– Physical Design steps: Partitioning, Chip Planning, Placement, Clock Tree Synthesis, Signal Routing, Timing Closure
– Physical Verification and Signoff checks: DRC, LVS, ERC
– Functional design example: ENTITY test is port a: in bit; end ENTITY test;
[Figure: schematic and layout of a two-input gate (IN1, IN2, OUT) showing the power (Vdd) rail, ground (GND) rail, p-type and n-type transistors, contacts, and the metal, poly, and diffusion layers]
Matrix Solver (20K)

• Cadence Encounter: placement (1 sec), routing (12 sec)
– Area = 72x72um (45nm library), used 6 metal layers
[Figure: Matrix Solver (20K), routing on metal layers M1 to M6]
Matrix Solver (20K)

• GDSII shots: manufacturing-ready
– Used Cadence Virtuoso, passed DRC
Matrix Solver (20K)

• GDSII shots: manufacturing-ready
– Specify all intra-cell details
MAC Unit (267K)

• Placement took 44 sec, routing took 289 sec
– Area = 320x320um, used 7 metal layers
[Figure: MAC Unit (267K), routing on metal layers M1 to M7]
32-bit Processor (2.7M)

• Placement took 739 sec, routing took 4740 sec
– Area = 1000x1000um, used 10 metal layers
[Figure: 32-bit Processor (2.7M), routing on metal layers M1 to M10]
Placement Comparison
• Runtime: 1 sec (20K) vs 44 sec (267K) vs 739 sec (2.7M)
Routing Comparison
• Runtime: 12 sec (20K) vs 289 sec (267K) vs 4740 sec (2.7M)

Partitioning
ECE6133
Prof. Sung Kyu Lim

Partitioning
System design
– Decomposition of a complex system into smaller subsystems.
– Each subsystem can be designed independently, speeding up the design process.
– The decomposition scheme has to minimize the interconnections between the subsystems.
– Decomposition is carried out hierarchically until each subsystem is of manageable size.
[Figure: Module 1, Module 2, Module 3, ..., Module n, with interface information]
Algorithms for VLSI Physical Design Automation, © Sherwani 92
Partitioning of a Circuit
[Figure: (a) input circuit of size 48; (b) the same circuit split into three blocks with Size 1 = 15, Size 2 = 16, Size 3 = 17 and Cut 1 = 4, Cut 2 = 4]
Partitioning at Different Levels
[Figure: system-level partitioning splits a system into boards, board-level partitioning splits each board into chips, and chip-level partitioning splits each chip further]
Partitioning Methods
• Top-down Partitioning (cutsize only)
– Iterative improvement [KL70, FM82, Kr84, San89]
– Spectral based [HK92, AZ95]
– Clustering method [SU72, NOP87, WC92, SS93, CS93, HK95]
– Network flow based [YW94, YW97]
– Analytical based [RDJ94, LLC95]
– Multi-level [CS93, HB95, AHK97, KA+97, KK99]
• Bottom-up Clustering (delay only)
– Unit delay model [LLT69, CD93]
– General delay model [MBV91, RW93, YW95]
– Sequential circuits with retiming [PKL98, CLW99, CL00]
Kernighan-Lin Algorithm
It is a bisectioning algorithm.
The input graph is partitioned into two subsets of equal size.
While the cutsize keeps improving:
– The vertex pair that gives the largest decrease in cutsize is exchanged.
– These vertices are then locked.
– If no improvement is possible and some vertices are still unlocked, the pair that gives the smallest increase is exchanged.
B. W. Kernighan and S. Lin, Bell System Technical Journal, 1970.
Algorithm KL
begin
  INITIALIZE();
  while (IMPROVE(table) = TRUE) do
    (* if an improvement has been made during the last iteration, the process is carried out again. *)
    while (UNLOCK(A) = TRUE) do
      (* if there exists any unlocked vertex in A, more tentative exchanges are carried out. *)
      for (each a ∈ A) do
        if (a = unlocked) then
          for (each b ∈ B) do
            if (b = unlocked) then
              if (Dmax < Da + Db) then
                Dmax = Da + Db;
                amax = a;
                bmax = b;
      TENT-EXCHGE(amax, bmax);
      LOCK(amax, bmax);
      LOG(table);
      Dmax = −∞;
    ACTUAL-EXCHGE(table);
end.
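The pass structure above can be sketched in Python. This is a minimal illustration, not the slide's exact procedure: the function name `kl_pass` and the edge-weight dictionary are assumptions, and the swap gain uses the standard Kernighan-Lin form D(a) + D(b) − 2c(a, b), whereas the pseudocode abbreviates it to Da + Db.

```python
def kl_pass(weights, part_a, part_b):
    """Run one Kernighan-Lin pass; return (new_a, new_b, total_gain).

    weights maps an undirected edge (u, v), listed once, to its weight.
    """

    def w(u, v):
        # Undirected edge weight; missing edges count as 0.
        return weights.get((u, v), 0) + weights.get((v, u), 0)

    cur_a, cur_b = set(part_a), set(part_b)
    locked, swaps, gains = set(), [], []

    def d_value(v):
        # D(v) = external cost - internal cost under the tentative partition.
        own, other = (cur_a, cur_b) if v in cur_a else (cur_b, cur_a)
        return sum(w(v, u) for u in other) - sum(w(v, u) for u in own if u != v)

    for _ in range(min(len(cur_a), len(cur_b))):
        best = None
        for a in sorted(cur_a - locked):
            for b in sorted(cur_b - locked):
                gain = d_value(a) + d_value(b) - 2 * w(a, b)
                if best is None or gain > best[0]:
                    best = (gain, a, b)
        gain, a, b = best
        # Tentative exchange, then lock both vertices.
        cur_a.remove(a)
        cur_a.add(b)
        cur_b.remove(b)
        cur_b.add(a)
        locked.update((a, b))
        swaps.append((a, b))
        gains.append(gain)

    # Keep only the prefix of swaps with the largest cumulative gain.
    best_k, best_sum, running = 0, 0, 0
    for k, gain in enumerate(gains, start=1):
        running += gain
        if running > best_sum:
            best_k, best_sum = k, running

    new_a, new_b = set(part_a), set(part_b)
    for a, b in swaps[:best_k]:
        new_a.remove(a)
        new_a.add(b)
        new_b.remove(b)
        new_b.add(a)
    return new_a, new_b, best_sum


# Hypothetical 4-vertex example: edges 1-2 and 3-4 are both cut initially.
a, b, gain = kl_pass({(1, 2): 1, (3, 4): 1}, {1, 3}, {2, 4})
```

In this toy case the first swap removes both cut edges and the second swap would undo the benefit, so only the first swap is kept.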

Perform single KL pass on the following circuit:
KL needs undirected graph (clique-based weighting)
Practical Problems in VLSI Physical Design KL Partitioning (1/6)

First Swap
Second Swap

Third Swap
Fourth Swap
Last swap does not require gain computation

Summary
Cutsize reduced from 5 to 3
Two best solutions found (solutions are always area-balanced)
Drawbacks of K-L Algorithm
– The K-L algorithm considers balanced partitions only.
– As vertices have unit weights, it is not possible to allocate a vertex to a partition.
– The K-L algorithm considers edges instead of hyperedges.
– High complexity: O(n^3).
Fiduccia-Mattheyses Algorithm
This algorithm is a modified version of the Kernighan-Lin algorithm.
– A single vertex is moved across the cut in a single move, which permits handling of unbalanced partitions.
– The concept of cutsize is extended to hypergraphs.
– Vertices to be moved are selected in a way that improves time complexity.
– A special data structure is used to do this.
– Overall time complexity of the algorithm is O(n^2).
C. M. Fiduccia and R. M. Mattheyses, 19th DAC, 1982.
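The hypergraph cutsize and the per-cell move gain that FM stores in its buckets can be sketched as follows. This is an illustrative sketch under assumptions: the names `fm_gains` and `cutsize` are hypothetical, every net has unit weight, and the partition is given as a dictionary from cell to side (0 or 1).

```python
def cutsize(nets, side):
    """Number of nets that have cells on both sides of the cut."""
    return sum(1 for net in nets if len({side[c] for c in net}) == 2)


def fm_gains(nets, side):
    """Gain of moving each cell alone to the other side (unit net weights)."""
    gains = {cell: 0 for cell in side}
    for net in nets:
        count = [0, 0]
        for cell in net:
            count[side[cell]] += 1
        for cell in net:
            here, there = side[cell], 1 - side[cell]
            if count[here] == 1:
                gains[cell] += 1  # cell is alone on its side: moving uncuts the net
            if count[there] == 0:
                gains[cell] -= 1  # net is entirely on this side: moving cuts it
    return gains


# Hypothetical example with three nets over cells a, b, c, d.
nets = [("a", "b"), ("b", "c"), ("a", "c", "d")]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
gains = fm_gains(nets, side)
```

Here cell c has gain 1: moving it to side 0 uncuts net (b, c) while net (a, c, d) stays cut because d remains on side 1.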

Data Structure Used in Fiduccia-Mattheyses Algorithm
[Figure: one bucket array per partition (1st and 2nd), indexed by gain from −pmax to +pmax; each bucket holds a list of free vertices, and a vertex array 1, 2, ..., n indexes the vertices]
Fiduccia-Mattheyses Algorithm
Perform FM algorithm on the following circuit:
Area constraint = [3,5]
Break ties in alphabetical order.
Practical Problems in VLSI Physical Design FM Partitioning (1/12)
Initial Partitioning
Random initial partitioning is given.

Gain Computation and Bucket Set Up
First Move

Second Move
Third Move

Fourth Move
Fifth Move

Sixth Move
Seventh Move

Last Move
Summary
Found three best solutions.
Cutsize reduced from 6 to 3.
Solutions after move 2 and 4 are better balanced.

Floorplanning
ECE6133
Prof. Sung Kyu Lim

Intel i7 Skylake Floorplan (14nm, 2015)

Floorplanning, Placement, and Pin Assignment

• Partitioning leads to
– Blocks with well-defined areas and shapes (fixed blocks).
– Blocks with approximated areas and no particular shapes (flexible
blocks).
– A netlist specifying connections between the blocks.
• Objectives
– Find locations for all blocks.
– Consider shapes of flexible block, pin locations of all the blocks.
Blocks w/ areas Block locations
(shapes)
netlist
netlist
Partitioning Floorplanning/Placement Routing

(/Pin assignment)
Floorplanning
• Inputs to the floorplanning problem:

– A set of blocks, fixed or flexible.
– Pin locations of fixed blocks.
– A netlist.
• Objectives: Minimize area, reduce wirelength for (critical) nets, max-
imize routability, determine shapes of flexible blocks
7 5 5
7 3
4
6 6 4
2
1 3 1 2
An optimal floorplan,
in terms of area A non−optimal floorplan
Floorplan Design
– Modules: width x, height y
– Area: A = xy
– Aspect ratio: r <= y/x <= s
– Rotation
– Module connectivity
[Figure: a floorplan of modules a to g and a connectivity graph with weighted edges among modules a to f]
Floorplanning: Terminology
• Rectangular dissection: Subdivision of a given rectangle by a finite #
of horizontal and vertical line segments into a finite # of non-overlapping
rectangles.
• Slicing structure: a rectangular dissection that can be obtained by
repetitively subdividing rectangles horizontally or vertically.
• Slicing tree: A binary tree, where each internal node represents a vertical
cut line or horizontal cut line, and each leaf a basic rectangle.
• Skewed slicing tree: One in which no node and its right child are the
same.
[Figure: a non-slicing floorplan; a slicing floorplan; a slicing tree for it (skewed); another slicing tree for it (non-skewed)]
Solution Representation
• An expression E = e_1 e_2 ... e_{2n−1}, where e_i ∈ {1, 2, ..., n, H, V}, 1 ≤ i ≤ 2n − 1, is a Polish expression of length 2n − 1 iff
1. every operand j, 1 ≤ j ≤ n, appears exactly once in E;
2. (the balloting property) for every subexpression E_i = e_1 ... e_i, 1 ≤ i ≤ 2n − 1, #operands > #operators.
Example: in 1 6 H 3 5 V 2 H V 7 4 H, the prefix 1 6 H 3 5 V has 4 operands and 2 operators, and the full string has 7 operands and 5 operators.
• Polish expression ←→ Postorder traversal.
• ijH: rectangle i on bottom of j; ijV: rectangle i on the left of j.
[Figure: a slicing floorplan of blocks 1 to 7 and its slicing tree]
E = 16H2V75VH34HV
E = 16+2*75*+34+* (writing H as + and V as *)
Postorder traversal of a tree!
Solution Representation (cont’d)

[Figure: one floorplan of blocks 1 to 4 with two slicing trees: E = 123H4VV (non-skewed!) and E = 123HV4V (skewed!)]
Non-skewed cases: ... HH ... or ... VV ...
• Question: How to eliminate ambiguous representation?

Normalized Polish Expression
• A Polish expression E = e_1 e_2 ... e_{2n−1} is called normalized iff E has no consecutive operators of the same type (H or V).
• Given a normalized Polish expression, we can construct a unique rectangular slicing structure.
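The two definitions (Polish expression with the balloting property, and normalized Polish expression) can be checked mechanically. The sketch below is illustrative: the function names are hypothetical, and it assumes single-character operands, so it only covers n ≤ 9.

```python
OPERATORS = {"H", "V"}


def is_polish(expr, n):
    """True if expr is a Polish expression over operands 1..n."""
    tokens = list(expr)
    operands = [t for t in tokens if t not in OPERATORS]
    # Length 2n - 1, and every operand 1..n appears exactly once.
    if len(tokens) != 2 * n - 1:
        return False
    if sorted(operands) != [str(j) for j in range(1, n + 1)]:
        return False
    # Balloting property: every prefix has more operands than operators.
    seen_operands = seen_operators = 0
    for t in tokens:
        if t in OPERATORS:
            seen_operators += 1
        else:
            seen_operands += 1
        if seen_operands <= seen_operators:
            return False
    return True


def is_normalized(expr, n):
    """True if expr is a Polish expression with no HH or VV."""
    no_repeat = all(
        not (x in OPERATORS and x == y) for x, y in zip(expr, expr[1:])
    )
    return is_polish(expr, n) and no_repeat
```

For instance, 123H4VV is a valid Polish expression but is not normalized because of the trailing VV, while 123HV4V describes the same floorplan in normalized form.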
[Figure: slicing floorplan and slicing tree for E = 16H2V75VH34HV, a normalized Polish expression]
Area Computation
[Figure: bottom-up area computation on a slicing tree of blocks 1 to 6. Each leaf carries its list of feasible shapes, e.g. { (2,3) (3,2) }, and each internal node combines the lists of its children, giving { (5,5) (9,4) } at the root]
– V cut, children of size u1 × v and u2 × w: combined size (u1 + u2) × max{v, w}
– H cut, children of size u1 × v and u2 × w: combined size max{u1, u2} × (v + w)
• Wiring cost?
Floorplan Design by Simulated Annealing
• Related work
– Wong & Liu, “A new algorithm for floorplan design,” DAC’86.
∗ Consider slicing floorplans.
– Wong & Liu, “Floorplan design for rectangular and L-shaped modules,” ICCAD’87.
∗ Also consider L-shaped modules.
– Wong, Leong, Liu, Simulated Annealing for VLSI Design, pp. 31–71, Kluwer Academic Publishers, 1988.
• Ingredients: solution space, neighborhood structure, cost function, annealing schedule?
Annealing (Wikipedia)
• Annealing (metallurgy)
– a heat treatment that alters the microstructure of a material, causing
changes in properties such as strength, hardness, and ductility
• Simulated annealing
– a numerical optimization technique for searching for a solution in a space
otherwise too large for ordinary search methods to yield results
Simulated Annealing Algorithm
• A random initial solution is available as the input

– A new solution is generated by making a RANDOM perturbation
– If the solution improves, the move is always accepted
– If not, the move is accepted with a probability that decreases with the
decrease in a parameter called “annealing temperature” T.
Kirkpatrick, Gelatt, Vecchi, "Optimization by Simulated Annealing". Science, 1983.
Slicing Floorplanning Examples
• Modi and Gupta (Spring 2013)
– 5 blocks: before (area = 65), after (area = 40)
– 30 blocks
– 100 blocks (took 0.5 min)
– 150 blocks (took 8 min)
Neighborhood Structure
• Chain: HVHVH... or VHVHV...
Example: 1 6 H 3 5 V 2 H V 7 4 H V, where each maximal run of operators (such as H V) is a chain.
• Adjacent: 1 and 6 are adjacent operands; 2 and 7 are adjacent operands; 5 and V are adjacent operand and operator.
• 3 types of moves:
– M1 (Operand Swap): Swap two adjacent operands.
– M2 (Chain Invert): Complement some chain (V becomes H, H becomes V).
– M3 (Operator/Operand Swap): Swap two adjacent operand and operator.
Effects of Perturbation
[Figure: floorplans for 12V4H3V → (M1) → 12V3H4V → (M2) → 12H3H4V → (M3) → 12H34HV]
• Question: Does the balloting property hold during the moves?
– M1 and M2 moves are OK.
– Check the M3 moves! Reject “illegal” M3 moves.
• Check M3 moves: Assume that the M3 move swaps the operand e_i with the operator e_{i+1}, 1 ≤ i ≤ k − 1. Then, the swap will not violate the balloting property iff 2N_{i+1} < i.
– N_k: # of operators in the Polish expression E = e_1 e_2 ... e_k, 1 ≤ k ≤ 2n − 1.
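The M3 legality test can be written directly from the condition 2N_{i+1} < i. The sketch below is illustrative: the function names are hypothetical, positions are 1-based to match the slide, and the extra check e_{i−1} ≠ e_{i+1} (which keeps the expression normalized) is taken from the annealing pseudocode later in the deck.

```python
OPERATORS = {"H", "V"}


def m3_swap_is_legal(tokens, i):
    """Can operand e_i be swapped with operator e_{i+1}? (i is 1-based)"""
    e_i, e_next = tokens[i - 1], tokens[i]
    if e_i in OPERATORS or e_next not in OPERATORS:
        raise ValueError("e_i must be an operand and e_{i+1} an operator")
    # N_{i+1}: number of operators among e_1 .. e_{i+1}.
    n_ops = sum(1 for t in tokens[: i + 1] if t in OPERATORS)
    keeps_balloting = 2 * n_ops < i
    # After the swap the operator sits next to e_{i-1}; they must differ.
    keeps_normalized = i == 1 or tokens[i - 2] != e_next
    return keeps_balloting and keeps_normalized


def m3_swap(tokens, i):
    """Return a new token list with e_i and e_{i+1} exchanged."""
    swapped = list(tokens)
    swapped[i - 1], swapped[i] = swapped[i], swapped[i - 1]
    return swapped


p3 = list("25V1H734VH6V8HV")
legal = m3_swap_is_legal(p3, 11)  # operand 6 at position 11, operator V at 12
p4 = "".join(m3_swap(p3, 11))
```

With the expression 25V1H734VH6V8HV, swapping 6 and V at positions 11 and 12 is legal (N_12 = 5, and 10 < 11), whereas swapping 3 and H in 12V3H would leave the prefix 12VH with as many operators as operands.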
Cost Function
• Φ = A + λW.
– A: area of the smallest rectangle
– W: overall wiring length
– λ: user-specified parameter
[Figure: area A of the floorplan for 12H34HV]
• W = Σ_{ij} c_ij d_ij.
– c_ij: # of connections between blocks i and j.
– d_ij: center-to-center distance between basic rectangles i and j.

Incremental Computation of Cost Function
• Each move leads to only a minor modification of the Polish expression.

• At most two paths of the slicing tree need to be updated for each move.
[Figure: an M1 move changes E = 12H34V56VHV into E = 12H35V46VHV; the slicing trees and floorplans before and after]
Incremental Computation of Cost Function (cont’d)
[Figure: an M2 move changes E = 12H34V56VHV into E = 12H34V56HVH; the slicing trees and floorplans before and after]
[Figure: an M3 move changes E = 12H34V56VHV into E = 123H4V56VHV; the slicing trees and floorplans before and after]
Annealing Schedule
• Initial solution: 12V3V...nV (blocks 1, 2, 3, ..., n placed side by side).
• T_i = r^i T_0, i = 1, 2, 3, ...; r = 0.85.
• At each temperature, try kn moves (k = 5–10).
• Terminate the annealing process if
– # of accepted moves < 5%,
– temperature is low enough, or
– run out of time.
Algorithm: Simulated_Annealing_Floorplanning(P, ε, r, k)

begin
  E ← 12V3V4V...nV; /* initial solution */
  Best ← E; T_0 ← −Δavg / ln(P); M ← MT ← uphill ← 0; N = kn;
  repeat
    MT ← uphill ← reject ← 0;
    repeat
      SelectMove(M);
      Case M of
        M1: Select two adjacent operands e_i and e_j; NE ← Swap(E, e_i, e_j);
        M2: Select a nonzero length chain C; NE ← Complement(E, C);
        M3: done ← FALSE;
            while not (done) do
              Select two adjacent operand e_i and operator e_{i+1};
              if (e_{i−1} ≠ e_{i+1}) and (2N_{i+1} < i) then done ← TRUE;
            NE ← Swap(E, e_i, e_{i+1});
      MT ← MT + 1; Δcost ← cost(NE) − cost(E);
      if (Δcost ≤ 0) or (Random < e^(−Δcost/T))
      then
        if (Δcost > 0) then uphill ← uphill + 1;
        E ← NE;
        if cost(E) < cost(Best) then Best ← E;
      else reject ← reject + 1;
    until (uphill > N) or (MT > 2N);
    T = rT; /* reduce temperature */
  until (reject/MT > 0.95) or (T < ε) or OutOfTime;
end
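The temperature handling in the algorithm can be isolated into three small helpers. This is a sketch under assumptions: the function names are hypothetical, the initial temperature is taken as T_0 = −Δavg / ln(P) so that an average uphill move is accepted with probability P at the start, and the cooling loop stops once T drops below ε.

```python
import math
import random


def initial_temperature(avg_uphill_cost, p_accept):
    """T_0 such that exp(-avg_uphill_cost / T_0) equals p_accept (0 < p < 1)."""
    return -avg_uphill_cost / math.log(p_accept)


def accept(delta_cost, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept uphill moves."""
    return delta_cost <= 0 or rng.random() < math.exp(-delta_cost / temperature)


def temperatures(t0, r=0.85, epsilon=1e-3):
    """Yield T_0, r*T_0, r^2*T_0, ... while the temperature is at least epsilon."""
    t = t0
    while t >= epsilon:
        yield t
        t *= r


t0 = initial_temperature(avg_uphill_cost=10.0, p_accept=0.5)
schedule = list(temperatures(t0))
```

At high temperature nearly every uphill move passes `accept`, while near the end of the schedule the exponential term is close to zero and only improving moves are kept.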
Normalized Polish Expression

Draw slicing floorplan based on:
Initial PE: P1 = 25V1H374VH6V8VH
Dimensions: (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
Practical Problems in VLSI Physical Design Polish Expression (1/8)
M1 Move
Swap module 3 and 7 in P1 = 25V1H374VH6V8VH
We get: P2 = 25V1H734VH6V8VH
Area changed from 11 × 15 to 13 × 14
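The areas quoted for P1 and P2 can be reproduced with a stack-based postorder evaluation. This is a minimal sketch with assumptions: the name `evaluate` is hypothetical, each dimension pair is read as (width, height) for modules 1 to 8 in order, and modules are not rotated.

```python
# (width, height) of modules 1..8, as listed in the example.
DIMS = {
    1: (2, 4), 2: (1, 3), 3: (3, 3), 4: (3, 5),
    5: (3, 2), 6: (5, 3), 7: (1, 2), 8: (2, 4),
}


def evaluate(expr, dims):
    """Return (width, height) of the slicing floorplan of a Polish expression."""
    stack = []
    for token in expr:
        if token in ("H", "V"):
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if token == "V":
                # ijV: i on the left of j, so widths add.
                stack.append((w1 + w2, max(h1, h2)))
            else:
                # ijH: i on bottom of j, so heights add.
                stack.append((max(w1, w2), h1 + h2))
        else:
            stack.append(dims[int(token)])
    (width, height), = stack
    return width, height


p1 = evaluate("25V1H374VH6V8VH", DIMS)
p2 = evaluate("25V1H734VH6V8VH", DIMS)
```

Under these assumptions P1 evaluates to 11 × 15 and P2 to 13 × 14, matching the numbers in the M1 move.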

Change on Floorplan
M2 Move
Complement last chain in P2 = 25V1H734VH6V8VH
We get: P3 = 25V1H734VH6V8HV

Change on Floorplan
M3 Move
Swaps 6 and V in P3 = 25V1H734VH6V8HV
We get: P4 = 25V1H734VHV68HV

Change on Floorplan
Sequence-Pair Based Floorplanning/Placement

• Murata, et al, ICCAD-95; Nakatake, et al, ICCAD-96; Murata, et al, ISPD-97; Murata and Kuh, ISPD-98; Xu, et al, ISPD-98; Kang and Dai, ISPD-98, ICCAD-98.
• Represent a packing by a pair of module-name sequences (e.g., (abdecf, cbfade)).
• Map all sequence pairs to a P-admissible solution space.
• Search in the P-admissible solution space (typically, by simulated annealing).
[Figure: a floorplan of modules a to f, and the loci of module b]

Relative Module Positions

• A floorplan is a partition of a chip into rooms, each containing at most
one block.
• Locus (right-up, left-down, up-left, down-right)
1. Take a non-empty room.
2. Start at the center of the room, walk in two alternating directions to
hit the sides of rooms.
3. Continue until reaching a corner of the chip.
• Positive locus: Union of right-up locus and left-down locus.
• Negative locus: Union of up-left locus and down-right locus.
[Figure: loci of module b; positive loci: abdecf; negative loci: cbfade]
Geometrical Information
• No pair of positive (negative) loci cross each other, i.e., loci are linearly ordered.
• Sequence pair (Γ+, Γ−): Γ+ is a module-name sequence representing the order of positive loci, and Γ− the order of negative loci. (Example: (Γ+, Γ−) = (abdecf, cbfade))
• x′ is after (before) x in both Γ+ and Γ− ⟹ x′ is right (left) of x.
• x′ is after (before) x in Γ+ and before (after) x in Γ− ⟹ x′ is below (above) x.
(Γ+, Γ−)-Packing
• For every sequence pair (Γ+, Γ−), there is a (Γ+, Γ−)-packing.
• Horizontal constraint graph G_H(V, E) (similarly for G_V(V, E)):
– V: source s, sink t, m vertices for modules.
– E: (s, x) and (x, t) for each module x, and (x, x′) iff x must be left of x′.
– Vertex weight: 0 for s and t, width of module x for the other vertices.
[Figure: packing for sequence pair (abdecf, cbfade), with its horizontal constraint graph and vertical constraint graph (transitive edges are not shown)]
Transitive Reduction
• Our HCG/VCG are DAGs
– Compute the longest path from the source in terms of # of hops
– Then remove the edges not on the longest paths
– This can be done in linear time using topological sorting
[Figure: a DAG on nodes a to e before and after removing transitive edges]
Optimal (Γ+, Γ−)-Packing

• Optimal (Γ+, Γ−)-packing can be obtained in O(m^2) time by applying a longest path algorithm on a vertex-weighted directed acyclic graph.
– G_H and G_V are independent.
– The X and Y coordinates of each module are minimized by assigning the longest path length between s and the vertex of the module in G_H and G_V, respectively.
• The set of all sequence pairs is a P-admissible solution space.
Sequence Pair
• Final chip area?
• Solution space size?
– Without rotation vs with rotation
• Optimization: Simulated Annealing
– Initial solution: Γ+ = Γ−
– Swap two modules in Γ+
– Swap two modules both in Γ+ and Γ−
– Rotate
• Results: produces highly packed non-slicing floorplans
Annealing Temperature vs. Floorplan Quality
[Figure: three floorplans of modules m0 to m13 obtained at different annealing temperatures]
(a) temperature: 2000, 4174 × 3111, area: 12985314
(b) temperature: 1000, 2696 × 3549, area: 9568104 (−26.3%)
(c) temperature: 20, 2580 × 2814, area: 7260120 (−44.1%)
Sequence Pair
• Floorplan (a)
– S1: 11 0 7 10 8 4 1 5 12 2 9 13 6 3
– S2: 7 10 6 1 11 5 4 0 13 12 9 2 3 8
• Floorplan (b)
– S1: 8 6 13 3 9 5 2 4 10 0 7 12 1 11
– S2: 5 6 2 1 8 9 13 4 12 10 11 0 3 7
• Floorplan (c)
– S1: 3 11 6 9 5 4 7 0 10 12 13 1 2 8
– S2: 1 6 8 12 3 7 5 10 0 9 11 13 4 2
Non Slicing Floorplan

• Sequence Pair + SA by Adam & Todd (class project)
Sequence Pair Representation

Initial SP: SP1 = (17452638, 84725361)
Dimensions: (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
Based on SP1 we build the following table:
Practical Problems in VLSI Physical Design Sequence Pair Method (1/13)

Constraint Graphs
Horizontal constraint graph (HCG)
Before and after removing transitive edges
Constraint Graphs (cont)

Vertical constraint graph (VCG)

Computing Chip Width and Height

Longest source-sink path length in:
HCG = chip width, VCG = chip height
Node weight = module width/height
Computing Module Location

Use longest source-module path length in HCG/VCG
Lower-left corner location = source to module input path length

Final Floorplan
Dimension: 11 × 15
Move I
Swap 1 and 3 in positive sequence of SP1
SP1 = (17452638, 84725361)
SP2 = (37452618, 84725361)

Constraint Graphs
Constructing Floorplan
Dimension: 13 × 14

Move II
Swap 4 and 6 in both sequences of SP2
SP2 = (37452618, 84725361)
SP3 = (37652418, 86725341)
Constraint Graphs

Constructing Floorplan
Dimension: 13 × 12
Summary
Impact of the moves:
Floorplan dimension changes from 11 × 15 to 13 × 14 to 13 × 12
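The three floorplan dimensions above can be reproduced with a direct O(m^2) longest-path evaluation of the sequence pair, without building the constraint graphs explicitly. This is a sketch under assumptions: the name `pack` is hypothetical, dimension pairs are read as (width, height) for modules 1 to 8, and modules are not rotated.

```python
# (width, height) of modules 1..8, keyed by the module's character.
DIMS = {
    "1": (2, 4), "2": (1, 3), "3": (3, 3), "4": (3, 5),
    "5": (3, 2), "6": (5, 3), "7": (1, 2), "8": (2, 4),
}


def pack(gamma_pos, gamma_neg, dims):
    """Return (chip_width, chip_height, x, y) for a sequence pair."""
    pos = {m: i for i, m in enumerate(gamma_pos)}
    neg = {m: i for i, m in enumerate(gamma_neg)}
    x, y = {}, {}
    # a is left of b iff a precedes b in both sequences.
    # Visiting modules in gamma_pos order guarantees x[a] is already known.
    for b in gamma_pos:
        x[b] = max(
            (x[a] + dims[a][0] for a in gamma_pos[: pos[b]] if neg[a] < neg[b]),
            default=0,
        )
    # a is below b iff a follows b in gamma_pos and precedes b in gamma_neg.
    # Visiting modules in gamma_neg order guarantees y[a] is already known.
    for b in gamma_neg:
        y[b] = max(
            (y[a] + dims[a][1] for a in gamma_neg[: neg[b]] if pos[a] > pos[b]),
            default=0,
        )
    width = max(x[m] + dims[m][0] for m in dims)
    height = max(y[m] + dims[m][1] for m in dims)
    return width, height, x, y


sp1 = pack("17452638", "84725361", DIMS)[:2]
sp2 = pack("37452618", "84725361", DIMS)[:2]
sp3 = pack("37652418", "86725341", DIMS)[:2]
```

Each coordinate is the longest weighted path from the source to the module, so the returned `x` and `y` are the lower-left corners of the modules.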

Placement
ECE6133
Prof. Sung Kyu Lim

Placement
• The process of arranging the circuit components on a layout surface.
• Inputs: A set of fixed modules, a netlist.
• Goal: Find the best position for each module on the chip according to
appropriate cost functions.
– Considerations: routability/channel density, wirelength, cut size,
performance, thermal issues, I/O pads.
[Figure: two placements of modules A to H in two rows. One has wirelength = 10 (shorter wirelength, 3 tracks required); the other has wirelength = 12 with density = 2 (2 tracks required)]
Placement Objectives
– Total wirelength
– Number of cut nets
– Wire congestion
– Signal delay
[Figure: two example placements of cells a to l compared under these objectives]
Wirelength Estimation
• Preferred method: Half-perimeter wirelength (HPWL)

– Fast (order of magnitude faster than RSMT)
– Equal to length of RSMT for 2- and 3-pin nets
– Margin of error for real circuits approx. 8% [Chu, ICCAD 04]
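Half-perimeter wirelength is simple to compute from pin coordinates. The sketch below is illustrative; the function names and the pin coordinates are made up for the example and do not come from the slide's figure.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net given its (x, y) pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    # Width plus height of the bounding box of the pins.
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of HPWL over all nets of a placement."""
    return sum(hpwl(pins) for pins in nets)


# Hypothetical nets: one 2-pin net and one 3-pin net.
nets = [[(0, 0), (3, 4)], [(0, 0), (2, 3), (5, 1)]]
length = total_hpwl(nets)
```

For nets with two or three pins this value equals the rectilinear Steiner minimum tree length, which is why it is a popular estimate during placement.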
[Figure: example net with RSMT length = 10 and HPWL = 9, where L_HPWL = w + h for a bounding box of width w and height h]

Placement Methods
• Constructive methods
– Cluster growth algorithm
– Force-directed method
– Algorithm by Goto
– Min-cut based method
• Iterative improvement methods
– Pairwise exchange
– Simulated annealing: Timberwolf
– Genetic algorithm
• Analytical methods
– Gordian, Gordian-L
Min-Cut Placement
• Breuer, “A class of min-cut placement algorithms,” DAC-77.
• Quadrature: suitable for circuits with high density in the center.
• Bisection: good for standard-cell placement.
• Slice/Bisection: good for cells with high interconnection on the periphery.
[Figure: cut-line sequences and resulting region sizes for quadrature, bisection, and slice/bisection placement]
Min-Cut Placement with Terminal Propagation
• Dunlop & Kernighan, “A procedure for placement of standard-cell VLSI

circuits,” IEEE TCAD, Jan. 1985.
• Drawback of the original min-cut placement: Does not consider the
positions of terminal pins that enter a region.
– What happens if we swap {1, 3, 6, 9} and {2, 4, 5, 7} in the previous
example?
[Figure: region split into L1, L2 on the left and R1, R2 on the right; cells connected to S in L1 are preferred in R1]
Terminal Propagation
• We should use the fact that s is in L1 !
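[Figure: a dummy cell p stands in for s at the center of L1; the assignment with the connected cells in R1 has lower cost than the one with them in R2]
p will stay in R1 for the rest of partitioning!
• When not to use p to bias partitioning? Net s has cells in many groups?
[Figure: minimum rectilinear Steiner tree of pins p1, p2, p3 next to a region of height h divided into thirds (h/3). If p falls in the middle third, don't use p to bias the solution in either direction; otherwise, use p]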
Terminal Propagation Example
• Partitioning must be done breadth-first, not depth-first.

[Figure: cells a, b, c, d with net S. After an unbiased partition of R, the partition of L is shown with terminal propagation (dummy cell p1) and without terminal propagation]
Creating Rows
• Terminal propagation reduces overall area by ~30%
• Creating rows
– Choose α and β, preferably to balance row length (during re-arrangement), with α + β = 1
[Figure: blocks C1, C2, C3 laid over rows 1 to 4; cells in C1 → row 1, cells in C3 → row 1, cells in C2 divided between Row 1 and Row 2 in the ratio α : β]
Creating Rows
• Example
– Partitioning of circuit into 32 groups
– Each group is either assigned to a single row or divided into 2 rows
[Figure: a four-row standard cell design; each group is labeled with the row it is assigned to (e.g. 1) or the two rows it is divided into (e.g. 1,2)]
Experimental Results
• CMOS Chip with 453 nets and 412 cells
• Manual solution
– track density=147; feedthroughs=184
• Automated solution
– without terminal propagation: t.d.=313; f.t.=591
– (t.d. reduced to 235 by iterative interchanges)
– with terminal propagation: t.d.=186; f.t.=182
– (t.d. reduced to 152 by iterative interchanges)
– Iterative Interchange Refinement is helpful
• The program is in production use as part of an automatic
placement system in AT&T Bell Lab.
– Solutions within 10% of the best hand layout
Mincut Placement
Perform quadrature mincut onto 4 × 4 grid
Start with vertical cut first
undirected graph model w/ k-clique weighting

thin edges = weight 0.5, thick edges = weight 1
Practical Problems in VLSI Physical Design Mincut Placement (1/12)
Recursive Bisection
Start with vertical cut
Perform terminal propagation with middle third window

Cut 3: Terminal Propagation

Two terminals are propagated and are “pulling” nodes
Node k and o connect to n and j: p1 propagated (outside window)
Node g connect to j, f and b: p2 propagated (outside window)
Terminal p1 pulls k/o/g to top partition, and p2 pulls g to bottom
Cut 4: Terminal Propagation

One terminal propagated
Node n and j connect to o/k/g: p1 propagated
Node i and j connect to e/f/a: no propagation (inside window)
Terminal p1 pulls n and j to right partition

Cut 8 to 15
16 partitions generated by 15 cuts
HPBB wirelength = 23
Quadratic Programming (QP)
• Definition
– Process of solving optimization problems involving quadratic functions
– One seeks to optimize (minimize or maximize) a multivariate quadratic
function subject to linear constraints on the variables
• QP with n variables and m constraints
– n-dimensional vector c
– n × n-dimensional real symmetric matrix Q
– m × n-dimensional real matrix A
– m-dimensional real vector b
Analytical Placement
• Gordian package:
– GORDIAN: Gordian: VLSI Placement by Quadratic
Programming and slicing Optimization: J. M. Kleinhans, G.Sigl,
F.M. Johannes, K.J. Antreich, IEEE TCAD, 1991
– GORDIAN-L: Analytical Placement: A Linear or a Quadratic
Objective Function?: G. Sigl, K. Doll, F.M. Johannes, DAC91
• Gordian: A Quadratic Placement Approach
– Global optimization: solves a sequence of quadratic programming
problems
– Partitioning: enforces the non-overlap constraints
Quadratic Placement
• Cells are spread out to remove overlaps

– IO cells pull the cells
[Figure: placement snapshots at iterations i = 0, 29, 58, 87]
Adaptec1 Stats
• Circuit stats
– # cells/nets/pins 210,863/219,687/19,205
– chip size 6000um × 6000um
– bin size 50um × 50um
– # placement bins 120 × 120
– Average bin occupancy 210K/120^2 = 14.6 gates/bin
• Wirelength result (HPBB)
– iteration 0 34,069,060
– iteration 29 46,352,680
– iteration 58 80,783,336
– iteration 87 98,111,904
Overview of Gordian Package

Procedure Gordian
  l := 1;
  global-optimize(l);
  while (there exists |M_l| > k)
    for each r ∈ R(l)
      partition(r, r’, r”);
    l++;
    setup-constraints(l);
    global-optimize(l);
    repartition(l);
  final-placement(l);
endprocedure
Problem Definition
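[Figure: module u with center (x_u, y_u); pin vu at (x_uv, y_uv), where (a_vu, b_vu) is the offset from the center of u; net node v at (x_v, y_v); l_vu joins pin vu to net node v; other pins of u connect to other modules]
Squared wire length of net v:
L_v = \sum_{u \in M_v} [ (x_{uv} - x_v)^2 + (y_{uv} - y_v)^2 ]
x_{uv} = x_u + a_{vu},  y_{uv} = y_u + b_{vu}
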
Cost Function
• Minimize the following:
\Phi = \frac{1}{2} \sum_{v \in N} L_v w_v
\Phi(x, y) = X^T C X + d_x^T X + Y^T C Y + d_y^T Y
\Phi(x) = X^T C X + d^T X
Constraints
• The center of gravity constraints
– At level l, chip is divided into q (\le 2^l) regions
– For region p, the center coordinates: (u_p, v_p)
– M_p: set of modules in region p
– For each region p: \sum_{m \in M_p} F_m \cdot x_m = u_p \times \sum_{m \in M_p} F_m
– Matrix form for all regions: A^{(l)} X = u^{(l)}
– Lastly, we have a_{pm} = F_m / \sum_{m \in M_p} F_m if m \in M_p, and a_{pm} = 0 otherwise
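The constraint matrix can be assembled row by row from the region membership. This is an illustrative sketch: the names `cog_matrix` and `cog_residual` are hypothetical, and the weights F_m are assumed here to be module areas.

```python
def cog_matrix(regions, weight):
    """Build the center-of-gravity matrix: one row per region, one column per module.

    regions is a list of sets of module names; weight maps module -> F_m.
    """
    modules = sorted(weight)
    rows = []
    for members in regions:
        total = sum(weight[m] for m in members)
        rows.append([weight[m] / total if m in members else 0.0 for m in modules])
    return modules, rows


def cog_residual(rows, modules, x, centers):
    """Return A X - u per region; all zeros when the constraints hold."""
    return [
        sum(a * x[m] for a, m in zip(row, modules)) - u
        for row, u in zip(rows, centers)
    ]


# Hypothetical example: region 0 holds a and b, region 1 holds c.
modules, rows = cog_matrix([{"a", "b"}, {"c"}], {"a": 1.0, "b": 3.0, "c": 2.0})
```

Every row sums to 1, so each constraint forces the weighted average position of the modules in a region to equal that region's center coordinate.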
Problem Formulation
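[Figure: region ρ with center (u_ρ, v_ρ) and region ρ’ with center (u_ρ’, v_ρ’), holding modules A to F]
A^{(l)} has one row per region and one column per module (A to G); each row is nonzero (*) only in the columns of the modules in its region:
row ρ:  [ * * * 0 0 0 ... ]
row ρ’: [ 0 0 0 * * * ... ]
Linearly constrained Quadratic Programming problem:
LQP: \min_{x \in R^m} { \Phi(x) = X^T C X + d^T X  such that  A^{(l)} X = u^{(l)} }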
Hessian Matrix
• Second order partial derivatives of f

– Determine the concavity of the graph of f
– Useful to find local optimal solutions
– Our WL function is quadratic
• Hessian will have constants only
– Laplacian is Hessian!
3 Types of Quadratic Programming
• Our Gordian QP
• 3 Types of QP: Depends on C

– Positive Definite Hessian Matrix (Bowl)
• All its eigenvalues are positive
• One optimal value: Convex
– Semi-definite Hessian Matrix (Trough)
• All its eigenvalues are non-negative
• Line of optimal values: Convex
– Indefinite Hessian Matrix (Saddle)
• Optimal is on the boundaries: Non-Convex
• NP Hard
Gordian Laplacian
• Our Laplacian C
– C is positive definite if all of C’s eigenvalues are positive
– C is positive definite if x^T C x is positive for every nonzero x
– C is positive definite if C is diagonal and the entries are positive
– So, C is positive definite
• So, Gordian QP:

Partitioning
• Recursive partitioning is needed
– to resolve module overlap in global placement
– global placement problem will be solved again with two
additional center_of_gravity constraints
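M_p \to (M_{p'}, M_{p''})
x_{u'} \le x_{u''} for u' \in M_{p'} and u'' \in M_{p''}
\alpha = \sum_{u \in M_{p'}} F_u / \sum_{u \in M_p} F_u \approx 0.5
cut value: C_p(\alpha) = \sum_{v \in N_C} w_v
[Figure: plot of the cut value C_p(\alpha), on an axis from 0 to 40, against \alpha from 0.0 to 1.0]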
Repartitioning
• Module exchange after each cut to improve cut size
– terminal propagation using global placement positions
• Repartitioning
– to ‘undo’ the mistake made at the previous level:
Procedure repartition(l)
  if overlap exists
    for each r ∈ R(l−1)
      merge-regions(r, r’, r’’);
      partition(r, r’, r’’);
    setup-constraints(l);
    global-optimize(l);
  endif
Summary of Gordian
[Figure: Gordian flow. Global optimization (minimization of wire length) passes module coordinates to partitioning of the module set and dissection of the placement region, which returns position constraints. Once regions hold ≤ k modules, the module coordinates go to final placement (adoption of style dependent constraints)]
Complexity: space = O(m), time = O(m^1.5 log^2 m)
Final placement: standard cell, macro-cell & SOG
Experimental Results
Comparison of Results for Standard Cell Blocks

Area after routing (mm^2)
Circuit   GORDIAN   Min-Cut   Annealing
scb1        2.7       3.1        2.6
scb2        5.8       5.3        5.0
scb3       15.7      25.6        9.1
scb4       14.0      16.9       13.2
scb5       10.6      11.3       10.9
scb6       11.3      12.7       12.8
scb7       16.4      20.2       19.8
scb8       51.7      89.2       59.5
scb9       54.0      98.6       80.0

CPU time
Circuit   GORDIAN   Min-Cut   Annealing
scb8       120s      366s      39851s
scb9       135s      440s      34709s
ratio        1         3         300
GORDIAN Placement
Perform GORDIAN placement
Uniform area and net weight, area balance factor = 0.5
Undirected graph model: each edge in k-clique gets weight 2/k
Practical Problems in VLSI Physical Design GORDIAN Placement (1/21)
IO Placement
Necessary for GORDIAN to work

Adjacency Matrix
Shows connections among movable nodes
Among nodes a to j
Pin Connection Matrix

Shows connections between movable nodes and IO
Rows = movable nodes, columns = IO (fixed)

Degree Matrix
Based on both adjacency and pin connection matrices
Sum of entries in the same row (= node degree)
Laplacian Matrix
Degree matrix minus adjacency matrix

Fixed Pin Vectors

Based on pin connection matrix and IO location
Y-direction is defined similarly
Fixed Pin Vectors (cont)

Fixed Pin Vectors (cont)
Level 0 QP Formulation
No constraint necessary
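The level 0 step (build the Laplacian and fixed pin vector, then solve the unconstrained QP) can be sketched on a tiny one-dimensional case. This is an illustration under assumptions: all function names are hypothetical, the objective is scaled as (1/2) X^T C X + d^T X so that the optimum satisfies C X + d = 0, and the example circuit (two movable cells in a chain between two fixed IO pins) is made up.

```python
def laplacian(adjacency, pin_conn):
    """Degree matrix minus adjacency matrix; degrees include IO connections."""
    n = len(adjacency)
    c = [[-adjacency[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        c[i][i] = sum(adjacency[i]) + sum(pin_conn[i])
    return c


def fixed_pin_vector(pin_conn, pin_coord):
    """d_i = -(sum over IO pins of connection weight times pin coordinate)."""
    return [-sum(w * p for w, p in zip(row, pin_coord)) for row in pin_conn]


def solve_level0(c, d):
    """Solve C X = -d by Gaussian elimination with partial pivoting."""
    n = len(c)
    m = [row[:] + [-d[i]] for i, row in enumerate(c)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Hypothetical chain: IO at x=0, cell a, cell b, IO at x=1 (unit weights).
adjacency = [[0, 1], [1, 0]]   # movable-to-movable connections
pin_conn = [[1, 0], [0, 1]]    # rows = movable cells, columns = IO pins
c = laplacian(adjacency, pin_conn)
d = fixed_pin_vector(pin_conn, [0.0, 1.0])
x = solve_level0(c, d)
```

The two cells land at 1/3 and 2/3, evenly spaced between the IO pins. Without any IO connection the Laplacian would be singular, which is why IO placement is necessary for GORDIAN to work.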

Level 0 Placement
Cells with real dimension will overlap
Level 1 Partitioning
Perform level 1 partitioning
Obtain center locations for center-of-gravity constraints

Level 1 Constraint
Level 1 LQP Formulation

Level 1 Placement
Verification
Verify that the constraints are satisfied in the left partition

Level 2 Partitioning
Add two more cut-lines
This results in p1={c,d}, p2={a,b,e}, p3={g,j}, p4={f,h,i}
Chip height is …; we split cells into … ratio
Level 2 Constraint

Level 2 LQP Formulation
Level 2 Placement
Clique-based wiring is shown

Summary
Center-of-gravity constraint
Helps spread the cells evenly while monitoring wirelength
Removes overlaps among the cells (with real dimension)
Steiner Routing
ECE6133
Prof. Sung Kyu Lim

ARM A53 Placement

1/11
TSMC 28nm BEOL Spec

2/11
Layer  Width (um)  Pitch (um)  Dir.  R (ohm/um)  C (fF/um)
M1     0.05        0.135       V     7.24        0.172
M2     0.05        0.100       H     9.05        0.175
M3     0.05        0.100       V     9.06        0.181
M4     0.05        0.100       H     9.05        0.177
M5     0.05        0.100       V     9.06        0.180
M6     0.05        0.100       H     9.05        0.177
Full-Chip Routing
3/11
M1 M2 M3
Full-Chip Routing
4/11
M4 M5 M6
M1 Layer (Mostly Intra-Cell Routing)

5/11
yellow: signal
M2 Layer
6/11
yellow: signal
magenta: clock, red: power/ground
M3 Layer
7/11
yellow: signal
magenta: clock
M4
8/11
yellow: signal
magenta: clock
M5
9/11
yellow: signal
magenta: clock, red: power/ground
M6
10/11
yellow: signal
cyan: power/ground
M7 and M8
11/11
magenta: power/ground
Routing
[Figure: design flow: placement → global routing → detailed routing → compaction]
Global routing: Generates a "loose" route for each net.
Assigns a list of routing regions to each net without specifying the actual layout of wires.
Detailed routing: Finds the actual geometric layout of each net within the assigned routing regions.
Routing Constraints
• 100% routing completion + area minimization, under a set of constraints:
– Placement constraint: usually based on fixed placement
– Number of routing layers
– Geometrical constraints: must satisfy design rules
– Timing constraints (performance-driven routing): must satisfy delay
constraints
– Crosstalk?
– Process variations?
[Figures: two-layer routing; geometrical constraint (labels w and s)]
Graph Models for Global Routing: Grid Graph
• Each cell is represented by a vertex.

• Two vertices are joined by an edge if the corresponding cells are adjacent
to each other.
• The occupied cells are represented as filled circles, whereas the others are shown as clear circles.
[Figure: cells a, b, c, d and the corresponding grid graph]
Global-Routing Problem
• Given a netlist $N = \{N_1, N_2, \ldots, N_n\}$ and a routing graph $G = (V, E)$, find a Steiner tree $T_i$ for each net $N_i$, $1 \le i \le n$, such that $U(e_j) \le c(e_j)$, $\forall e_j \in E$, and $\sum_{i=1}^{n} L(T_i)$ is minimized,
where
– $c(e_j)$: capacity of edge $e_j$;
– $x_{ij} = 1$ if $e_j$ is in $T_i$; $x_{ij} = 0$ otherwise;
– $U(e_j) = \sum_{i=1}^{n} x_{ij}$: # of wires that pass through the channel corresponding to edge $e_j$;
– $L(T_i)$: total wirelength of Steiner tree $T_i$.
• For high performance, the maximum wirelength ($\max_{i=1}^{n} L(T_i)$) is minimized (or the longest path between two points in $T_i$ is minimized).
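The quantities in this formulation can be evaluated for a candidate solution as sketched below. The 2x2 grid, unit edge lengths, capacities, and the two trees are hypothetical; edges are written as sorted vertex pairs.

```python
from collections import Counter

def edge_usage(trees):
    """U(e_j): number of trees T_i that contain edge e_j."""
    usage = Counter()
    for tree in trees:
        for e in set(tree):
            usage[e] += 1
    return usage

def evaluate(trees, capacity, length):
    usage = edge_usage(trees)
    # Edges where U(e_j) > c(e_j), with the amount of overflow.
    overflow = {e: u - capacity[e] for e, u in usage.items() if u > capacity[e]}
    per_tree = [sum(length[e] for e in set(tree)) for tree in trees]
    return usage, overflow, sum(per_tree), max(per_tree)

# Hypothetical 2x2 grid graph: vertices 0..3, four unit-length edges.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
capacity = {e: 1 for e in edges}
length = {e: 1 for e in edges}
trees = [[(0, 1), (1, 3)],  # T_1
         [(0, 1)]]          # T_2
usage, overflow, total_wl, max_wl = evaluate(trees, capacity, length)
```

Here edge (0, 1) is used by both trees, so the capacity constraint is violated and one of the nets would have to be rerouted.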
Classification of Global-Routing Algorithm

• Sequential approach: Assigns priority to nets; routes one net at a time based on its priority (net ordering?).
• Concurrent approach: All nets are considered at the same time (complexity?)
global-routing algorithm
  sequential approach
    two-terminal: line-search; maze (Lee, Hadlock, Soukup)
    multi-terminal: Steiner-tree based
  concurrent approach
    hierarchical
    integer programming

Data Structures and Basic Algorithms
Spanning Tree
Problem formulation:
Given a graph $G = (V, E)$, select a subset $V' \subseteq V$ such that $V'$ has property $P$.

Minimum Spanning Tree
Problem formulation:
Given an edge-weighted graph $G = (V, E)$, select a subset of edges $E' \subseteq E$ such that $E'$ induces a tree and the total cost of edges $\sum_{e_i \in E'} wt(e_i)$ is minimum over all such trees, where $wt(e_i)$ is the cost or weight of the edge $e_i$.
Used in routing applications.

© Sherwani 92
Data Structures and Basic Algorithms
Steiner Trees
1. Problem formulation:
Given an edge-weighted graph $G = (V, E)$ and a subset $D \subseteq V$, select a subset $V' \subseteq V$ such that $D \subseteq V'$ and $V'$ induces a tree of minimum cost over all such trees.
The set $D$ is referred to as the set of demand points and the set $V' - D$ is referred to as Steiner points.
Used in the global routing of multi-terminal nets.

[Figure: panels (a) and (b); edge-weighted example graph on nodes A to J with demand points marked]

© Sherwani 92
Min Spanning Trees vs. Steiner Trees
• Both problems try to “span” nodes in the given graph

– Goal is to minimize the total edge weight
– MST: span all nodes
– Steiner tree: span only a designated subset of nodes. We can use “extra”
nodes (= Steiner nodes) if they help.
Hanan's Thm

There exists an optimal RST with all Steiner points (set S) chosen from the intersection points of horizontal and vertical lines drawn from points of D.
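A short sketch of how the candidate Steiner locations implied by Hanan's theorem can be enumerated. The function name and the three demand points are illustrative only.

```python
def hanan_candidates(points):
    """Intersections of the horizontal and vertical lines through the
    demand points, excluding the demand points themselves."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    terminals = set(points)
    return [(x, y) for x in xs for y in ys if (x, y) not in terminals]

# Hypothetical demand points D.
pts = [(0, 0), (2, 3), (5, 1)]
cands = hanan_candidates(pts)
```

With n demand points there are at most n^2 grid intersections, so a search over Steiner locations only needs to examine O(n^2) candidates.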
Hwang's Thm

The ratio of the cost of a rectilinear MST to that of an optimal RST is no greater than 3/2.
Steiner Routing: 3D vs. 2D

routing problem instance
3D Steiner Routing 2D Steiner Routing + Layer Assignment

The 1-Steiner Problem

Definition
Routing Practical Problems in VLSI Physical CAD 1-Steiner Algorithms
Why 1-Steiner Insertion?

Can Reduce Wirelength

1-Steiner by Kahng/Robins
Iterative 1-Steiner Insertion Algorithm
Keep adding 1-Steiner point one-by-one until no more gain
Naïve implementation: O(n^2 × n log n × n) = O(n^4 log n)

Sophisticated implementation: O(n^3)
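The naïve iteration can be sketched as follows: in every round, try each Hanan grid point, recompute the MST cost, and keep the point with the largest gain. This is a simplified illustration, not the authors' implementation: it uses an O(n^2) Prim instead of an O(n log n) MST, it does not remove Steiner points that later become useless, and the four-terminal net is made up.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def mst_cost(points):
    """Prim's algorithm on the complete rectilinear-distance graph."""
    pts = list(points)
    if len(pts) < 2:
        return 0
    best = {p: manhattan(pts[0], p) for p in pts[1:]}
    cost = 0
    while best:
        p = min(best, key=best.get)
        cost += best.pop(p)
        for q in best:
            best[q] = min(best[q], manhattan(p, q))
    return cost

def iterated_one_steiner(terminals):
    """Add the max-gain 1-Steiner point one by one until no more gain."""
    points = list(terminals)  # assumed distinct
    steiner = []
    xs = sorted({x for x, _ in terminals})
    ys = sorted({y for _, y in terminals})
    cost = mst_cost(points)
    while True:
        best_gain, best_pt = 0, None
        for c in ((x, y) for x in xs for y in ys):
            if c in points:
                continue
            gain = cost - mst_cost(points + [c])
            if gain > best_gain:
                best_gain, best_pt = gain, c
        if best_pt is None:
            return steiner, cost
        points.append(best_pt)
        steiner.append(best_pt)
        cost -= best_gain

# Hypothetical cross-shaped net: the MST costs 6, one Steiner point saves 2.
steiner, cost = iterated_one_steiner([(0, 1), (2, 1), (1, 0), (1, 2)])
```

Ties are broken here by taking the first best candidate in scan order, whereas the worked example on the slides breaks ties randomly.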
1-Steiner Routing by Kahng/Robins

Perform 1-Steiner Routing by Kahng/Robins
Need an initial MST: wirelength is 20
16 locations for Steiner points
Practical Problems in VLSI Physical Design 1-Steiner Algorithm (1/17)

First 1-Steiner Point Insertion

There are six 1-Steiner points
Two best solutions: we choose (c) randomly
before
insertion
First 1-Steiner Point Insertion (cont)
before
insertion

Second 1-Steiner Point Insertion

Need to break tie again
Note that (a) and (b) do not contain any more 1-Steiner point: so
we choose (c)
before
insertion
Third 1-Steiner Point Insertion

Tree completed: all edges are rectilinearized
Overall wirelength reduction = 20 − 16 = 4
before
insertion

Sample Kahng/Robins Routing (1/3)
• 5 points in 10x10 grid

– 2 Steiner points used
MST (WL = ) final tree (WL = 1)

– 20 Steiner points used
MST (WL = 183) final tree (WL = 163)


– 22 Steiner points used, it took 15ms to route
Kahng/Robins Speedup Techniques
• Random variant
– Instead of choosing the best gain Steiner point in each iteration, just pick
the first one found.
– Time spent on each step is less, but more Steiner points need to be added.
• Prune out bad candidates

– After the first iteration, the Hanan grid points that gave no gain were
removed.
– This improved practical time complexity.
• Any other thoughts?

1-Steiner by Borah/Owens/Irwin
Interesting Observation
Gain Computation
Things to do
Thus, the gain is

Overall Algorithm
Multi-pass Heuristic
Entire algorithm can be repeated
1-Steiner Routing by Borah/Owens/Irwin

Perform a single pass of Borah/Owens/Irwin
Initial MST has 5 edges with wirelength of 20
Need to compute the max-gain (node, edge) pair for each edge in
this MST

Best Pair for (a,c)
Best Pair for (b,c)

Three nodes can pair up with (b,c)
l(a,c) − l(p,a) = 4 − 2

Best Pair for (b,c) (cont)

All three pairs have the same gain
Break ties randomly
l(b,d) − l(p,d) = 5 − 4
l(c,e) − l(p,e) = 4 − 3
Best Pair for (b,d)

Two nodes can pair up with (b,d)
both pairs have the same gain
l(b,c) − l(p,c) = 4 − 3
l(b,c) − l(p,e) = 4 − 3

Best Pair for (c,e)

Three nodes can pair up with (c,e)
l(b,c) − l(p,b) = 4 − 3
l(b,d) − l(p,d) = 5 − 4
Best Pair for (c,e) (cont)
l(e,f) − l(p,f) = 3 − 2

Best Pair for (e,f)

Can merge with c only
l(c,e) − l(p,c) = 4 − 3
Summary
Max-gain pair table
Sort based on gain value

First 1-Steiner Point Insertion

Choose {b, (a,c)} (max-gain pair)
Mark e1 = (a,c), e2 = (b,c)
Skip {a, (b,c)}, {c, (b,d)}, {b, (c,e)} since their e1/e2 are already
marked
Wirelength reduces from 20 to 18
Second 1-Steiner Point Insertion

Choose {c, (e,f)} (last one remaining)
Wirelength reduces from 18 to 17

Sample Borah Routing

– 22 Steiner points used, it took 59ms to route
Comparison
Kahng/Robins vs Borah/Owens/Irwin
Kahng/Robins tends to give better results
Borah/Owens/Irwin runs much faster: O(n^2) vs O(n^4 log n) for naïve Kahng/Robins

Bounded Radius Routing

Why Radius?
Longest source-sink path length among all sinks
Smaller path resistance: better performance
Both Radius and Cost?
Cost = wirelength
Radius (= R) and wirelength (= C) are both important for RC-delay reduction
Bounded PRIM vs Bounded Radius/Cost
J. Cong, A. B. Kahng, G. Robins, M. Sarrafzadeh, and C. K.
Wong, "Provably good performance-driven global routing",
TCAD, 1992.
Routing Practical Problems in VLSI Physical CAD BRBC Algorithm
Radius vs Wirelength

BPRIM Under ε = ∞
Radius bound = ∞
= regular PRIM
Practical Problems in VLSI Physical Design Bounded Radius Routing (9/16)
BPRIM Under ε = ∞ (cont)

Bounded PRIM Algorithm

Variation of PRIM’s MST algorithm
Why Tighter Radius?
• BPRIM uses tighter radius bound during backtracing

– R instead of (1+ε)R
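The bounded-PRIM idea can be sketched as a Prim variant that only accepts an edge (u, v) if the resulting source-to-v path length stays within (1+ε)R. This is a simplified version under stated assumptions: R is taken as the largest source-to-sink Manhattan distance, the backtracing step with the tighter bound R is not implemented, and the net below is hypothetical.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def bprim(source, sinks, eps):
    """Simplified bounded PRIM; returns (parent map, radius, wirelength)."""
    R = max(manhattan(source, s) for s in sinks)
    bound = (1 + eps) * R
    path = {source: 0}  # source-to-node path length inside the tree
    parent = {}
    remaining = set(sinks)
    while remaining:
        best = None
        for u in list(path):
            for v in remaining:
                d = manhattan(u, v)
                # Shortest edge whose new path length respects the bound.
                if path[u] + d <= bound and (best is None or d < best[0]):
                    best = (d, u, v)
        # A direct source-sink edge always qualifies, so best is never None.
        d, u, v = best
        parent[v] = u
        path[v] = path[u] + d
        remaining.remove(v)
    wirelength = sum(manhattan(v, u) for v, u in parent.items())
    return parent, max(path.values()), wirelength

# Hypothetical net: source at the origin, two sinks, R = 6.
net = ((0, 0), [(0, 5), (2, 4)])
_, radius_0, wl_0 = bprim(*net, eps=0)      # bound 6
_, radius_05, wl_05 = bprim(*net, eps=0.5)  # bound 9
```

On this small net the looser bound trades radius for wirelength (radius 6 to 8, wirelength 11 to 8), which is the same trend as in the comparison on the slides.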
Bounded PRIM Algorithm

Comparison (ε = 0, 0.5, ∞)
Radius bound/value increase
Wirelength decreases
Bounded Radius Routing

Perform bounded PRIM algorithm
Under ε = 0, ε = 0.5, and ε = ∞
Compare radius and wirelength
Radius = 12 for this net

BPRIM Under ε = 0 (cont)
BPRIM Under ε = 0 (cont)

BPRIM Under ε = 0.5 (cont)
BPRIM Under ε = 0.5 (cont)

Comparison
As the bound increases (12 → 18 → ∞)
Radius value increases (12 → 17 → 22)
Wirelength decreases (56 → 49 → 36)
Multi-net Routing
ECE6133
Prof. Sung Kyu Lim

Global Routing
• Global routing is planning

– Divide the routing into tiles
– Build Steiner tree for each net
– Routing is done in terms of tiles
Routing Problem Global Routing Tiles Global Routing Result
Cross Point Assignment
• Key step before detailed routing

– CPA decides pin locations along tile boundaries
– Key objectives are routing completion, via usage, and wirelength
Global Routing Result Routing Problem Cross Point Assignment

(repeat) (repeat)
Detailed Routing
• Detailed routing decides exact topology

– We use CPA results
– We use the actual routing tracks and vias in each tile
Routing Problem Cross Point Assignment Detailed Routing

(repeat)
Type 1: Switchbox Routing
• Key problem for detailed routing

– CPA gives pin locations on all 4 sides
– We assume two metal layers (H and V) in this case
Type 2: Channel Routing
• Key problem for detailed routing

– CPA gives pin locations on 2 sides
two metal layers (H and V) again
Routing Models
• Grid-based model:
– A grid is super-imposed on the routing region.
– Wires follow paths along the grid lines.
• Gridless model:
– Any model that does not follow this “gridded” approach.
[Figure: grid-based vs. gridless routing]
Models for Multi-Layer Routing

• Unreserved layer model: Any net segment is allowed to be placed in any layer.
• Reserved layer model: Certain types of segments are restricted to particular layer(s).
– Two-layer: HV (horizontal-vertical), VH
– Three-layer: HVH, VHV
[Figure: 3 types of 3-layer models (unreserved layer model, HVH model, VHV model), with tracks labeled]
Terminology for Channel Routing Problems

upper boundary terminals: 0 1 4 5 1 6 7 0 4 9 10
lower boundary terminals: 2 3 5 3 5 2 6 8 9 8 7
local density:            1 3 5 5 4 3 3 3 4 3 2
[Figure labels: terminals, upper boundary, lower boundary, trunks, branches, dogleg, vias]
• Local density at column i: total # of nets that cross column i.

• Channel density: maximum local density; # of horizontal tracks required ≥ channel
density.
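As a small check of these two definitions, the sketch below recomputes the local density from the terminal rows of the channel shown on this slide. Function names are illustrative; 0 denotes an empty terminal position, and a net with a single terminal is counted at its own column.

```python
def net_spans(top, bottom):
    """Leftmost and rightmost column (1-based) of every net."""
    spans = {}
    for col, pins in enumerate(zip(top, bottom), start=1):
        for net in pins:
            if net != 0:
                lo, hi = spans.get(net, (col, col))
                spans[net] = (min(lo, col), max(hi, col))
    return spans

def local_density(top, bottom):
    """Number of nets whose horizontal span crosses each column."""
    spans = net_spans(top, bottom)
    return [sum(1 for lo, hi in spans.values() if lo <= col <= hi)
            for col in range(1, len(top) + 1)]

top = [0, 1, 4, 5, 1, 6, 7, 0, 4, 9, 10]
bottom = [2, 3, 5, 3, 5, 2, 6, 8, 9, 8, 7]
density = local_density(top, bottom)  # [1, 3, 5, 5, 4, 3, 3, 3, 4, 3, 2]
channel_density = max(density)        # 5
```

The channel density of 5 is a lower bound on the number of horizontal tracks this channel needs.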
Channel Routing Problem

• Assignment of horizontal segments of nets to tracks.
• Assignment of vertical segments to connect
– horizontal segments of the same net in different tracks, and
– the terminals of the net to horizontal segments of the net.
• Horizontal and vertical constraints must not be violated.
– Horizontal constraint between two nets: the horizontal spans of the two nets overlap each other.
– Vertical constraint between two nets: there exists a column such that the terminal on top of the column belongs to one net and the terminal on bottom of the column belongs to the other net.
• Objective: Channel height is minimized (i.e., channel area is minimized).
Horizontal Constraint Graph (HCG)

• HCG G = (V, E) is an undirected graph where
– V = {vi|vi represents a net ni}
– E = {(vi, vj )| a horizontal constraint exists between ni and nj }.
• For graph G: vertices ⇔ nets; edge (i, j) ⇔ net i overlaps net j.

[Figure: A routing problem and its HCG.]
Vertical Constraint Graph (VCG)

• VCG G = (V, E) is a directed graph where
– V = {vi|vi represents a net ni}
– E = {(vi, vj )| a vertical constraint exists between ni and nj }.
• For graph G: vertices ⇔ nets; edge i → j ⇔ net i must be above net j.
[Figure: A routing problem and its VCG.]
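Both constraint graphs can be derived directly from the two terminal rows, as sketched below. The rows are those of the Constrained Left-Edge example in this deck (0 = no terminal); the function names are illustrative.

```python
def net_spans(top, bottom):
    """Leftmost and rightmost column (1-based) of every net."""
    spans = {}
    for col, pins in enumerate(zip(top, bottom), start=1):
        for net in pins:
            if net != 0:
                lo, hi = spans.get(net, (col, col))
                spans[net] = (min(lo, col), max(hi, col))
    return spans

def build_hcg(top, bottom):
    """Undirected edge (i, j), i < j, if the spans of nets i and j overlap."""
    s = net_spans(top, bottom)
    nets = sorted(s)
    return {(a, b) for i, a in enumerate(nets) for b in nets[i + 1:]
            if s[a][0] <= s[b][1] and s[b][0] <= s[a][1]}

def build_vcg(top, bottom):
    """Directed edge (i, j): net i has the top terminal and net j the
    bottom terminal of some column, so net i must be above net j."""
    return {(t, b) for t, b in zip(top, bottom) if t and b and t != b}

top = [1, 1, 1, 2, 2, 5, 6, 3, 0, 4, 0]
bottom = [2, 5, 0, 5, 5, 3, 3, 0, 6, 0, 4]
hcg = build_hcg(top, bottom)
vcg = build_vcg(top, bottom)
```

For these rows the VCG has no cycle, so a constrained left-edge router can handle the channel without doglegs.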
2-L Channel Routing: Basic Left-Edge Algorithm
• Hashimoto & Stevens, “Wire routing by optimizing channel assignment
within large apertures,” DAC-71.
• No vertical constraint.
• HV-layer model is used.
• Doglegs are not allowed.
• Treat each net as an interval.
• Intervals are sorted according to their left-end x-coordinates.
• Intervals (nets) are routed one-by-one according to the order.
• For a net, tracks are scanned from top to bottom, and the first track
that can accommodate the net is assigned to the net.
• Optimality: produces a routing solution with the minimum # of tracks
(if no vertical constraint).
Basic Left-Edge Algorithm
Algorithm: Basic Left-Edge(U, track[j])

U: set of unassigned intervals (nets) I1, ..., In;
Ij = [sj, ej]: interval j with left-end x-coordinate sj and right-end ej;
track[j]: track to which net j is assigned.
begin
  U ← {I1, I2, ..., In};
  t ← 0;
  while (U ≠ ∅) do
    t ← t + 1;
    watermark ← 0;
    while (there is an Ij ∈ U s.t. sj > watermark) do
      Pick the interval Ij ∈ U with sj > watermark, nearest watermark;
      track[j] ← t;
      watermark ← ej;
      U ← U − {Ij};
end
Basic Left-Edge Example

• U = {I1, I2, ..., I6}; I1 = [1, 3], I2 = [2, 6], I3 = [4, 8], I4 = [5, 10], I5 = [7, 11], I6 = [9, 12].
• t = 1:
– Route I1: watermark = 3;
– Route I3: watermark = 8;
– Route I6: watermark = 12.
• t = 2:
– Route I2: watermark = 6;
– Route I5: watermark = 11.
• t = 3: Route I4.
column:         1 2 3 4 5 6 7 8 9 10 11 12
upper boundary: 1 0 0 0 4 2 0 3 0 4 0 6
lower boundary: 0 2 1 3 0 0 5 0 6 0 5 0
density:        1 2 2 2 3 3 3 3 3 3 2 1
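A runnable sketch of the basic left-edge algorithm, applied to the six intervals of this example. It assumes all x-coordinates are positive (the watermark starts at 0) and that there are no vertical constraints; the function name is illustrative.

```python
def basic_left_edge(intervals):
    """intervals: {net: (s, e)}. Returns {net: track}, tracks numbered from 1."""
    unassigned = dict(intervals)
    track = {}
    t = 0
    while unassigned:
        t += 1
        watermark = 0
        while True:
            # Nets that start strictly to the right of the watermark.
            fits = [n for n, (s, _) in unassigned.items() if s > watermark]
            if not fits:
                break
            n = min(fits, key=lambda k: unassigned[k][0])  # nearest watermark
            track[n] = t
            watermark = unassigned[n][1]
            del unassigned[n]
    return track

intervals = {1: (1, 3), 2: (2, 6), 3: (4, 8), 4: (5, 10), 5: (7, 11), 6: (9, 12)}
track = basic_left_edge(intervals)
# {1: 1, 3: 1, 6: 1, 2: 2, 5: 2, 4: 3}
```

Three tracks are used, which equals the channel density of 3 in the density row above.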
Constrained Left-Edge Algorithm
Algorithm: Constrained Left-Edge(U, track[j])

U: set of unassigned intervals (nets) I1, ..., In;
Ij = [sj, ej]: interval j with left-end x-coordinate sj and right-end ej;
track[j]: track to which net j is assigned.
begin
  U ← {I1, I2, ..., In};
  t ← 0;
  while (U ≠ ∅) do
    t ← t + 1;
    watermark ← 0;
    while (there is an unconstrained Ij ∈ U s.t. sj > watermark) do
      Pick the interval Ij ∈ U that is unconstrained, with sj > watermark, nearest watermark;
      track[j] ← t;
      watermark ← ej;
      U ← U − {Ij};
end
Constrained Left-Edge Example

• I1 = [1, 3], I2 = [1, 5], I3 = [6, 8], I4 = [10, 11], I5 = [2, 6], I6 = [7, 9].
• Track 1: Route I1 (cannot route I3 ); Route I6 ; Route I4 .
• Track 2: Route I2 ; cannot route I3 .
• Track 3: Route I5 .
• Track 4: Route I3 .
upper boundary: 1 1 1 2 2 5 6 3 0 4 0
lower boundary: 2 5 0 5 5 3 3 0 6 0 4
[Figure: VCG of the remaining nets at track 1, track 2, track 3, and track 4]
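The constrained variant can be sketched as below. One assumption is made explicit: a net counts as unconstrained only if all of its VCG predecessors were assigned on earlier tracks, so the free set is computed once at the start of each track. The VCG edges are read off the terminal rows of this example.

```python
def constrained_left_edge(intervals, vcg):
    """intervals: {net: (s, e)}; vcg: set of (above, below) net pairs.
    Returns {net: track}, with track 1 at the top of the channel."""
    unassigned = dict(intervals)
    track = {}
    t = 0
    while unassigned:
        t += 1
        watermark = 0
        # Nets with no VCG predecessor still waiting for a track.
        free = {n for n in unassigned
                if not any(a in unassigned for a, b in vcg if b == n)}
        if not free:
            raise ValueError("cyclic vertical constraints; doglegs are needed")
        while True:
            fits = [n for n in free
                    if n in unassigned and unassigned[n][0] > watermark]
            if not fits:
                break
            n = min(fits, key=lambda k: unassigned[k][0])
            track[n] = t
            watermark = unassigned[n][1]
            del unassigned[n]
    return track

intervals = {1: (1, 3), 2: (1, 5), 3: (6, 8), 4: (10, 11), 5: (2, 6), 6: (7, 9)}
vcg = {(1, 2), (1, 5), (2, 5), (5, 3), (6, 3)}
track = constrained_left_edge(intervals, vcg)
# {1: 1, 6: 1, 4: 1, 2: 2, 5: 3, 3: 4}
```

The result reproduces the four tracks listed in the example: I1, I6, I4 on track 1, then I2, I5, and I3 on one track each.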
Doglegs in Channel Routing

• Doglegs may reduce the longest path in VCG
[Figure: channel with nets a, b, c, d; net c is split into c-1 and c-2]
• Doglegs break cycles in VCG
[Figure: channel with nets a and b; net b is split into b-1 and b-2]
Doglegs in Channel Routing (Cont'd)

• Restricted dogleg vs unrestricted dogleg
[Figure: restricted and unrestricted doglegs for net a]
Detailed Routing
Dogleg Router
Drawback of LEA: the entire net is on a single track.
• Doglegs are used to place parts of a net on different tracks, thereby minimizing channel height.
[Figure (a), (b): Using a dogleg to reduce channel height]

© Sherwani 92
Detailed Routing
Dogleg Router
Each multi-terminal net is broken into a set of two-terminal nets.
Two parameters are used to control routing:
1. range: Determines the number of consecutive two-terminal subnets of the same net that can be placed on the same track.
2. routing sequence: Specifies the starting position and the direction of routing along the channel.
Modified LEA is applied to each subnet.
[Figure (a), (b): Example of Dogleg Router; terminal rows 0 1 2 2 4 3 0 0 and 1 2 0 3 3 0 4 4]
Deutsch ICCAD 1985
Algorithms for VLSI Physical Design Automation 7.32
© Sherwani 92
Dogleg Router: Example

• Decompose multi-terminal nets into two-terminal nets
Final solution
Y(n): tracks occupied by net n