Lecture9 PhysicalSynthesis Placement

Lecture 9: Physical Synthesis -
Placement
1
Digital Circuit Design Automation Flow (I)
q Logic-level design
functional logic circuit extract &
simulation simulation analysis verification
Functional /
System RTL & Logic Circuit
Architecture
Specification Synthesis Design
Design
module fz(a,b,c,d,e,Z);
input a,b,c,d;
output Z;
assign Z =
~(a|(b&C)|(d&e));
endmodule
behavior HDL description gate-level

representation representation
switch-level
representation
2
Digital Circuit Design Automation Flow (II)
q Layout-level design (a master piece of work …)
extract &
verification
GDSII
files
Physical
Fabrication Packaging
Synthesis
3
Physical Design Details
System Spec. Partition
Graph
Architecture
Module(a, b) Floorplan
Input a;
Output b; Graph
Function, logic
Placement
Circuit design Analytical
CTS
Physical design Tree
Signoff Routing
DRC, LVS Graph
Manufacturing Timing
Disk, legacy C codes
Testing (Linux LSF cluster)
22nm 10B+
Final chip NFS transistors
4
About Layout Design …
q What you know...
q Computational Boolean algebra, representation, some
verification, some synthesis
q This is what happens in the “front end” of the ASIC
design process
Synthesis, Connected gates + wires
High-level design
verification in your technology:
description
called a netlist
q What you don’t know...

q The “back end” problem -- turning these into IC masks
to build real chips
q Called “layout” or “physical design”
q Where we start: the placement problem
5
Placement for ASICs
q Focusing on the most common tasks in layout
q Row-based standard cell layouts for ASICs and SOCs
Synthesis & mapping give a

netlist of gates + wires
Our job is to place them

optimally in rows in
regions of the chip,
and to route the wires
to connect everything
6
Start with a Standard Cell Library
q A library of things called “Standard cells”
q A standard cell is a basic logic gate or small logic element,
e.g., a flip flop (FF)
q This is the set of allowed logic elements to build your chip.
Tiny example:
NOT NAND2, 3 NOR2, 3 1bit ADDER D FF
Q Q’
+ D
q Why?
q Lots of complicated electrical stuff going on at the transistor
level
q Each cell hides these electrical details, presents a simple,
geometric abstraction
7
How Big is the Library?
q Often, very big
q For all logic functions, input/output variants, timing variants,
electrical drive strengths…
Logic Drive
functions Fanin & Timing,
flip flop,
strength
~1000
X fanout
variants X scan X (1X, 2X
4X, 8X) = cells
variants variants
q How to think about a standard cell

q Simple abstraction of a geometric “container” for the circuits
you need to make logic.
q Inside the cell: Complex device & mask & electrical issues
q Outside the cell: A box with pins
8
Some Real Examples
q In older technology (130nm CMOS)
q Geometric fact #1: All have same height (so we can
arrange them in rows)
q Geometric fact #2: Cells can have different widths,
depending on circuit complexity
NAND2 Edge-triggered D Flip Flop
From: Graham Petley’s www.vlsitechnology.org site, which has open source cells
http://www.vlsitechnology.org/html/cells/wsclib013/lib_gif_index.html
9
Real Example: Place one Block on SoC
q Big SOCs often designed hierarchically, composed
from blocks which represent parts of overall system
Cells
Our basic task:

Place gates in rows
Route (connect) all gates with wires
10
Size Distribution of Standard Cells
q In real designs, small cells dominate, but lots of wide
cells too
q Old example: 200K gate IBM ASIC
• [J. Vygen, “Algorithms for Large-Scale Flat Placement, Proc.
ACM/IEEE Design Automation Conference, 1997].
q About size of small block on a big SOC (eg, 100M gates)
30000"
Number of cells
25000"
20000"
15000"
10000"
5000"
0"
1"
2"
3"
4"
5"
6"
7"
8"
9"
"
"
"
21 "
31 "
51 0"
10 00"
3"
"
"
"
"
"
"
"
10
18
19
20
0
11
12
13
14
15
16
17
,3
,5
12
,1
Width of cells (=“how many I/O pins wide”)

0,
11
Placement Problem
q What does a placer do?
q Input: Netlist of gates & nets Netlist…
q Output: Exact location of each gate
q Goal: Able to route (connect) all
Placement
wires
Cells
q Is this hard? Yes!

q Bad placement à Much more wire
q More wire: bigger, slower chip Routing
q If placement is very bad, next tool in
the flow—the router—unable to Cells
connect all wires, or meet timing
12
A Simple Placer Model
q Very simple model of the chip

q A simple grid (chessboard)
q Cells (gates) go in grid slots.
q Pins (input + output)
Any gate
can go in
q Very simple model of gates any slot
q All gates are exactly the same
size
q (Unrealistic, but simplifies things)
q Each grid slot can hold 1 gate
13
What does a Placer do?
q Placer optimizes the ability of router to connect all the nets
q But routers are computationally expensive tools. We can’t
run one “inside” placer
q We need a simplifying approximation
q What every real placer does: Minimizes expected wirelength

q For each wire in the design, estimate the expected length of
the routed wire
q To be more specific, we minimize this objective: ∑wires Wi
EstimatedLength(Wi)
q Placer “solves” for gate locations to minimize this objective
14
The Net
q Common term for a wire in a layout is: Net
q So, we call them “nets.” And the whole set of
gates+wires is “the netlist”
q Net categorized by how many things it connects. Call
them “points” for short
A “2-point net” A “4-point net”
y=5 y=5
4 4
3 3
2 2
1 1
0 0
x=0 1 2 3 4 x= 0 1 2 3 4
15
Wirelength Estimation
q Many different estimators, adapted to different
placer methods
q Why is this hard?
q Multi-point nets can be routed in many different paths.
q In a dense layout, nets do not all get routed in a
“shortest path.” Hard to predict.
y=5 y=5
4 4
3 3
2 2
1 1
0 0
x= 0 1 2 3 4 x= 0 1 2 3 4 x= 0 1 2 3 4
16
Half-perimeter Wirelength (HPWL)
q Called Half-Perimeter Wirelength (HPWL), also
Bounding Box (BBOX)
q Put smallest “bounding” box around all gates.
q Assume gate lives in “center” of the grid slot.
q Add width (∆X) and height (∆Y) of the BBOX. That’s the
wirelength estimate.
A “2-point net” A “4-point net”
y=5 y=5
4 4
∆X=3-1=2; 3 3
∆X=4-1=3;
∆Y=4-1=3. 2 ∆Y=5-1=4. 2
HPWL=5 1 HPWL=7 1
0 0
x= 0 1 2 3 4 x= 0 1 2 3 4 17
About the HPWL
q Easy to calculate, even for a multi-point net
q General formula:
q HPWL = [max{X coordinates of all gates) – min{X
coordinates of all gates}]
+ [max{Y coordinates of all gates) – min{Y coordinates
of all gates}]
q Always a lower bound on the real wire length

q No matter how complex the final routed wire path is…
q …you need at least this much wire to connect everything.
q Aside: all wiring on big chips (and most boards) is strictly
horizontal & vertical – no “arbitrary angles” for
manufacturing reasons. Another reason HPWL is good.
18
Distribution of HPWL
q Real distribution of HPWL for 200K gate IBM ASIC
[Vygen DAC97]
q 181K nets. Note LOG scale. Most nets are short. But
there is a long tail to distribution.
Number of nets
1000000
100000
10000
1000
100
10
1
HPWL
5
9. .0
10 0.0
15 5.0
0
0.
1.
1.
2.
2.
3.
3.
4.
4.
5.
6.
7.
8.
0.
9
1
-1
-2
0-
5-
0-
5-
0-
5-
0-
5-
0-
5-
0-
0-
0-
0-
0-
0.
0.
1.
1.
2.
2.
3.
3.
4.
4.
5.
6.
7.
8.
.0
.0
19
A Very Simple Placer
q 2 big ideas
q Start with a random placement
q Just randomly assign each gate to a grid location (only 1 gate per
grid, please!)
q Yes – this is a terrible placement. Really big L = ∑nets Ni HPWL(Ni)
q Random iterative improvement
q Pick a random pair of gates in the grid. Exchange their locations
(called a “swap”)
q Evaluate the change in L = ∑ HPWL(Ni); compute ΔL= [new
nets Ni
HPWL – old HPWL]

q If ΔL<0, new L got smaller: Good! Improvement! Keep this swap
q If ΔL>0, new L got bigger: Bad! Worse! Undo this swap
q Keep doing this for many, many random swaps, until L stops
improving
20
Random Iterative Improvement
//random initial placement
foreach( gate Gi in netlist )
place Gi in random location (x,y) in grid not already occupied
//calculate initial HPWL wirelength for whole netlist

L=0
foreach( net Ni in the netlist )
L = L + HPWL(Ni);
//main improvement loop

while ( overall HPWL wirelength L is improving ) {
pick random gate Gi; pick random gate Gj;
swap gates Gi and Gj
evaluate ΔL = new HPWL - old HPWL;
if ( ΔL < 0 ) { // improved placement! Update HPWL
L = L + ΔL
else // ΔL > 0, this is a worse placement
undo swap of Gi and Gj
}
21
Incrementally Update HPWL
q Incremental wirelength update calculation
q Cannot afford to re-compute length of each net in
entire placement after each swap!
q Most nets did not change! Do this incrementally--just
look at nets that could change
We only keep track of the min and max coordinates of x and y of the bounding box!
y=5 Net i y=5 Net i

4 4 Net k
1 3
3 Swap 2
2 Net j Net k Gates 1,2 2 Net j
1 1
2 1
0 0
x=0 1 2 3 4 x= 0 1 2 3 4
22
How Does This Work?
q It works ok….
q Small benchmark
q ~2500 gates + ~500 pins
q Placed in a 50x50 grid
q Graph shows “progress”

q Y axis: L = ∑nets Ni
HPWL(Ni)
q X axis: # swaps
q Yes, this ran for 8M
swaps until progress on
L stopped
# random gate swapsà
23
Random Iterative Improvement
q Easy to understand and to implement
q Start with a random placement. Randomly improve
until it stops getting better
q Can optimize what we care about: estimated
wirelength ∑nets Ni HPWL(Ni)
q It will improve. Sometimes, a lot.
q But… final result is still not very good
q We can do better. A lot (really, a lot) better…
24
Analytical Placer
q Goal: Write an equation whose minimum is the placement(!)
q If you have a million gates, need a million (xi,yi) values as
result
q Formulate an appropriate cost function for all the gate-level
(xi, yi):
F( x1, x2, … x1M, y1, y2, … y1M)
q …then solve analytically for X*=(x1, x2, … x1M), Y*=(y1, y2, …
y1M) to minimize F( )
q The resulting set of values of X*, Y* give you the placement
of all 1M gates
q This sounds difficult (?) but it works great

q All modern placers for big ASICs and SOCs are “analytical”
q Big trick is write the wirelength in mathematically “friendly”
form we can optimize
25
Optimize Quadratic Wirelength
q For 2-point net: squared length of “distance” line
between points
q Quadratic length = (x1-x2)2 + (y1-y2)2
A “2-point net” Quadratic wirelength

=(3-1)2 + (4-1)2
y=5 =13
1
4
3
2
1 BUT… what happens if your net
2 has more than 2 points in it?
0
x= 0 1 2 3 4
26
What about k > 2
q For k-point net:
q Replace one “real” net with k(k-1)/2 2-point nets
q Add a new net between every pair of points. Called a
fully-connected clique model
q (We use this model for all our subsequent examples)
A “k=4-point net” Clique model

y=5 y=5
4 4 4
1 k(k-1)/2
3 3 =4(4-1)/2
3
2 2 =6 2-point nets
1 1
2
0 0
x= 0 1 2 3 4 x= 0 1 2 3 4
27
K>2 Note also: when k=2, this
weight is just (2-1)=1,
so no special treatment
q k-point net, continued for common 2-point nets
q And, each new 2-point net is weighted (multiplied) by

term: 1/(k-1)
q Why? 1 net became k(k-1)/2 nets. Need to compensate
so we don’t “overestimate”
Clique model
Quadratic estimate:
y=5 (1/3)[(4-1)2 + (5-4)2]
4 +(1/3)[(4-3)2 + (5-1)2]
3 +(1/3)[(3-1)2 + (4-1)2]
2
+(1/3)[(3-1)2 + (4-3)2]
+(1/3)[(4-3)2 + (5-3)2]
1
+(1/3)[(3-3)2 + (3-1)2]
0 =sum of 6 weighted
x= 0 1 2 3 4 2-point lengths
28
How do we Optimize this?
(1,1) 4(x2-1)2 + 4(y2-0.5)2

2 4
1 (1,0.5)
2
2(x2-x1)2 + 2(y2-y1)2
1
(0,0)
1(x1-0)2 + 1(y1-0)2
Add these together: this is the Quadratic Wirelength
So… how do we optimize this…?
29
Something you knew already
q Basic calculus! Differentiate, set derivative to 0,
then solve!
q But this is multiple variables? So, we do partial
derivatives, set each to 0, solve
Q(X): 4(x2-1)2 + 2(x2-x1)2 + 1(x1-0)2 Q(Y): 4(y2-0.5)2 + 2(y2-y1)2 + 1(y1-0)2
∂Q/∂x1 = 0 + 4(x2-x1)(-1) + 2(x1) ∂Q/∂y1 = 0 + 4(y2-y1)(-1) + 2(y1)

= 6 x1 – 4 x2 = 0 = 4 y1 – 4 y2 - 8 = 0
∂Q/∂x2 = 8(x2-1) + 4(x2-x1) + 0 ∂Q/∂y2 = 8(y2-0.5) + 4(y2-y1) + 0

= -4 x1 + 12 x2 - 8 = 0 = -4 y1 + 12 y2 - 4 = 0
30
Matrix Form
q Hey – these are linear equations! We know how to solve these!
Q(X): 4(x2-1)2 + 2(x2-x1)2 + 1(x1-0)2 Q(Y): 4(y2-0.5)2 + 2(y2-y1)2 + 1(y1-0)2

Minimize Minimize
6 -4 x1 0 6 -4 y1 0
x2 = =
-4 12 8 -4 12 y2 4
x1 = 0.571 x2 =0.857 y1 = 0.286 y2 =0.429
• Two matrix equations: Ax=bx and Ay=by. If you have N gates, matrix is NxN
• Same matrix for X,Y, different b vectors. X, Y, b vectors also have N elements
This is the heart of an analytical placement algorithm!

31
A Real Industrial Example
q This benchmark is from IBM

q 210,904 gates (blue)
q 543 fixed blocks (red) –like SRAMS
q Image shows where gates “want” to
go if we model blocks like “big pads”
q This is a quadratic placement of
gates
32
Partition-based Strategy
1st quadratic Partition chip into Repeat. Just keep going

place (QP) solve left / right. Partition each
Select which gates side top/bottom.
on each side. Solve 4 new
Solve 2 new, smaller smaller QP
QP tasks solves
33
State-of-the-art Placer example 1
q ePlace: Electrostatics Based Placement Using
Nesterov's Method
q By Jingwei Lu and Chung-Kuan Chen at UCSD
Cell layout animation of wirelength & density-driven placement progression of ISPD05

ADAPTEC1 with 221K nets, 63 fixed macros and 210K movable standard cells.
http://vlsi-cuda.ucsd.edu/~ljw/ePlace/ 34
State-of-the-art Placer example 2
q RePLAce: Advancing Solution Quality and Routability
Validation in Global Placement
q Ming Woo and AB K at UCSD
35
Summary
q Early placers based on iterative improvement
q Simulated annealing is a very good, famous example
q Annealing stuff used widely in VLSI CAD – but not for placers.
Too inefficient
q Modern placers are all analytical

q Many different mathematical formulations, but all similar
q Numerically optimize a mathematically friendly model of
wirelength
• Many modern placer can do the “legalization” at same time as
wirelength!
q Quadratic placement is a famous, important, practical
example
36

Lecture9 PhysicalSynthesis Placement

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lecture9 PhysicalSynthesis Placement

Uploaded by

Copyright:

Available Formats

Lecture 9: Physical Synthesis -

behavior HDL description gate-level

q What you don’t know...

Synthesis & mapping give a

Our job is to place them

q How to think about a standard cell

NAND2 Edge-triggered D Flip Flop

Our basic task:

Width of cells (=“how many I/O pins wide”)

q Is this hard? Yes!

connect all wires, or meet timing

q Very simple model of the chip

q What every real placer does: Minimizes expected wirelength

q Placer “solves” for gate locations to minimize this objective

q Always a lower bound on the real wire length

HPWL – old HPWL]

//calculate initial HPWL wirelength for whole netlist

//main improvement loop

y=5 Net i y=5 Net i

q Graph shows “progress”

q But… final result is still not very good

q We can do better. A lot (really, a lot) better…

q This sounds difficult (?) but it works great

A “2-point net” Quadratic wirelength

A “k=4-point net” Clique model

q And, each new 2-point net is weighted (multiplied) by

(1,1) 4(x2-1)2 + 4(y2-0.5)2

Add these together: this is the Quadratic Wirelength

So… how do we optimize this…?

Q(X): 4(x2-1)2 + 2(x2-x1)2 + 1(x1-0)2 Q(Y): 4(y2-0.5)2 + 2(y2-y1)2 + 1(y1-0)2

∂Q/∂x1 = 0 + 4(x2-x1)(-1) + 2(x1) ∂Q/∂y1 = 0 + 4(y2-y1)(-1) + 2(y1)

∂Q/∂x2 = 8(x2-1) + 4(x2-x1) + 0 ∂Q/∂y2 = 8(y2-0.5) + 4(y2-y1) + 0

q Hey – these are linear equations! We know how to solve these!

Q(X): 4(x2-1)2 + 2(x2-x1)2 + 1(x1-0)2 Q(Y): 4(y2-0.5)2 + 2(y2-y1)2 + 1(y1-0)2

x1 = 0.571 x2 =0.857 y1 = 0.286 y2 =0.429

This is the heart of an analytical placement algorithm!

q This benchmark is from IBM

1st quadratic Partition chip into Repeat. Just keep going

Cell layout animation of wirelength & density-driven placement progression of ISPD05

q Modern placers are all analytical

You might also like