You are on page 1of 36

Lecture 9: Physical Synthesis -

Placement

1
Digital Circuit Design Automation Flow (I)
q Logic-level design
functional logic circuit extract &
simulation simulation analysis verification

Functional /
System RTL & Logic Circuit
Architecture
Specification Synthesis Design
Design

module fz(a,b,c,d,e,Z);
input a,b,c,d;
output Z;
assign Z =
~(a|(b&C)|(d&e));
endmodule

behavior HDL description gate-level


representation representation
switch-level
representation
2
Digital Circuit Design Automation Flow (II)
q Layout-level design (a master piece of work …)
extract &
verification

GDSII
files
Physical
Fabrication Packaging
Synthesis

3
Physical Design Details
System Spec. Partition
Graph
Architecture
Module(a, b) Floorplan
Input a;
Output b; Graph
Function, logic
Placement
Circuit design Analytical

CTS
Physical design Tree

Signoff Routing
DRC, LVS Graph
Manufacturing Timing
Disk, legacy C codes
Testing (Linux LSF cluster)
22nm 10B+
Final chip NFS transistors
4
About Layout Design …
q What you know...
q Computational Boolean algebra, representation, some
verification, some synthesis
q This is what happens in the “front end” of the ASIC
design process
Synthesis, Connected gates + wires
High-level design
verification in your technology:
description
called a netlist

q What you don’t know...


q The “back end” problem -- turning these into IC masks
to build real chips
q Called “layout” or “physical design”
q Where we start: the placement problem
5
Placement for ASICs
q Focusing on the most common tasks in layout
q Row-based standard cell layouts for ASICs and SOCs

Synthesis & mapping give a


netlist of gates + wires

Our job is to place them


optimally in rows in
regions of the chip,
and to route the wires
to connect everything

6
Start with a Standard Cell Library
q A library of things called “Standard cells”
q A standard cell is a basic logic gate or small logic element,
e.g., a flip flop (FF)
q This is the set of allowed logic elements to build your chip.
Tiny example:
NOT NAND2, 3 NOR2, 3 1bit ADDER D FF

Q Q’
+ D

q Why?
q Lots of complicated electrical stuff going on at the transistor
level
q Each cell hides these electrical details, presents a simple,
geometric abstraction

7
How Big is the Library?
q Often, very big
q For all logic functions, input/output variants, timing variants,
electrical drive strengths…
Logic Drive
functions Fanin & Timing,
flip flop,
strength
~1000
X fanout
variants X scan X (1X, 2X
4X, 8X) = cells
variants variants

q How to think about a standard cell


q Simple abstraction of a geometric “container” for the circuits
you need to make logic.
q Inside the cell: Complex device & mask & electrical issues
q Outside the cell: A box with pins

8
Some Real Examples
q In older technology (130nm CMOS)
q Geometric fact #1: All have same height (so we can
arrange them in rows)
q Geometric fact #2: Cells can have different widths,
depending on circuit complexity

NAND2 Edge-triggered D Flip Flop

From: Graham Petley’s www.vlsitechnology.org site, which has open source cells
http://www.vlsitechnology.org/html/cells/wsclib013/lib_gif_index.html
9
Real Example: Place one Block on SoC
q Big SOCs often designed hierarchically, composed
from blocks which represent parts of overall system

Cells

Our basic task:


Place gates in rows
Route (connect) all gates with wires

10
Size Distribution of Standard Cells
q In real designs, small cells dominate, but lots of wide
cells too
q Old example: 200K gate IBM ASIC
• [J. Vygen, “Algorithms for Large-Scale Flat Placement, Proc.
ACM/IEEE Design Automation Conference, 1997].
q About size of small block on a big SOC (eg, 100M gates)

30000"
Number of cells
25000"

20000"

15000"

10000"

5000"

0"
1"
2"
3"
4"
5"
6"
7"
8"
9"

"

"
"

21 "
31 "
51 0"
10 00"

3"
"
"
"
"
"
"
"
10

18
19
20

0
11
12
13
14
15
16
17

,3
,5

12
,1

Width of cells (=“how many I/O pins wide”)


0,

11
Placement Problem
q What does a placer do?
q Input: Netlist of gates & nets Netlist…
q Output: Exact location of each gate
q Goal: Able to route (connect) all
Placement
wires
Cells

q Is this hard? Yes!


q Bad placement à Much more wire
q More wire: bigger, slower chip Routing
q If placement is very bad, next tool in
the flow—the router—unable to Cells

connect all wires, or meet timing

12
A Simple Placer Model

q Very simple model of the chip


q A simple grid (chessboard)
q Cells (gates) go in grid slots.
q Pins (input + output)
Any gate
can go in
q Very simple model of gates any slot
q All gates are exactly the same
size
q (Unrealistic, but simplifies things)
q Each grid slot can hold 1 gate

13
What does a Placer do?
q Placer optimizes the ability of router to connect all the nets
q But routers are computationally expensive tools. We can’t
run one “inside” placer
q We need a simplifying approximation

q What every real placer does: Minimizes expected wirelength


q For each wire in the design, estimate the expected length of
the routed wire
q To be more specific, we minimize this objective: ∑wires Wi
EstimatedLength(Wi)

q Placer “solves” for gate locations to minimize this objective

14
The Net
q Common term for a wire in a layout is: Net
q So, we call them “nets.” And the whole set of
gates+wires is “the netlist”
q Net categorized by how many things it connects. Call
them “points” for short
A “2-point net” A “4-point net”
y=5 y=5
4 4
3 3
2 2
1 1
0 0
x=0 1 2 3 4 x= 0 1 2 3 4

15
Wirelength Estimation
q Many different estimators, adapted to different
placer methods
q Why is this hard?
q Multi-point nets can be routed in many different paths.
q In a dense layout, nets do not all get routed in a
“shortest path.” Hard to predict.
y=5 y=5
4 4
3 3
2 2
1 1
0 0
x= 0 1 2 3 4 x= 0 1 2 3 4 x= 0 1 2 3 4
16
Half-perimeter Wirelength (HPWL)
q Called Half-Perimeter Wirelength (HPWL), also
Bounding Box (BBOX)
q Put smallest “bounding” box around all gates.
q Assume gate lives in “center” of the grid slot.
q Add width (∆X) and height (∆Y) of the BBOX. That’s the
wirelength estimate.
A “2-point net” A “4-point net”
y=5 y=5
4 4
∆X=3-1=2; 3 3
∆X=4-1=3;
∆Y=4-1=3. 2 ∆Y=5-1=4. 2
HPWL=5 1 HPWL=7 1
0 0
x= 0 1 2 3 4 x= 0 1 2 3 4 17
About the HPWL
q Easy to calculate, even for a multi-point net
q General formula:
q HPWL = [max{X coordinates of all gates) – min{X
coordinates of all gates}]
+ [max{Y coordinates of all gates) – min{Y coordinates
of all gates}]

q Always a lower bound on the real wire length


q No matter how complex the final routed wire path is…
q …you need at least this much wire to connect everything.
q Aside: all wiring on big chips (and most boards) is strictly
horizontal & vertical – no “arbitrary angles” for
manufacturing reasons. Another reason HPWL is good.

18
Distribution of HPWL
q Real distribution of HPWL for 200K gate IBM ASIC
[Vygen DAC97]
q 181K nets. Note LOG scale. Most nets are short. But
there is a long tail to distribution.

Number of nets
1000000
100000
10000
1000
100
10
1
HPWL
5

9. .0
10 0.0

15 5.0

0
0.

1.

1.

2.

2.

3.

3.

4.

4.

5.

6.

7.

8.

0.
9
1

-1

-2
0-

5-

0-

5-

0-

5-

0-

5-

0-

5-

0-

0-

0-

0-
0-
0.

0.

1.

1.

2.

2.

3.

3.

4.

4.

5.

6.

7.

8.

.0

.0
19
A Very Simple Placer
q 2 big ideas
q Start with a random placement
q Just randomly assign each gate to a grid location (only 1 gate per
grid, please!)
q Yes – this is a terrible placement. Really big L = ∑nets Ni HPWL(Ni)
q Random iterative improvement
q Pick a random pair of gates in the grid. Exchange their locations
(called a “swap”)
q Evaluate the change in L = ∑ HPWL(Ni); compute ΔL= [new
nets Ni

HPWL – old HPWL]


q If ΔL<0, new L got smaller: Good! Improvement! Keep this swap
q If ΔL>0, new L got bigger: Bad! Worse! Undo this swap
q Keep doing this for many, many random swaps, until L stops
improving

20
Random Iterative Improvement
//random initial placement
foreach( gate Gi in netlist )
place Gi in random location (x,y) in grid not already occupied

//calculate initial HPWL wirelength for whole netlist


L=0
foreach( net Ni in the netlist )
L = L + HPWL(Ni);

//main improvement loop


while ( overall HPWL wirelength L is improving ) {
pick random gate Gi; pick random gate Gj;
swap gates Gi and Gj
evaluate ΔL = new HPWL - old HPWL;
if ( ΔL < 0 ) { // improved placement! Update HPWL
L = L + ΔL
else // ΔL > 0, this is a worse placement
undo swap of Gi and Gj
}

21
Incrementally Update HPWL
q Incremental wirelength update calculation
q Cannot afford to re-compute length of each net in
entire placement after each swap!
q Most nets did not change! Do this incrementally--just
look at nets that could change
We only keep track of the min and max coordinates of x and y of the bounding box!

y=5 Net i y=5 Net i


4 4 Net k
1 3
3 Swap 2
2 Net j Net k Gates 1,2 2 Net j
1 1
2 1
0 0
x=0 1 2 3 4 x= 0 1 2 3 4
22
How Does This Work?
q It works ok….
q Small benchmark
q ~2500 gates + ~500 pins
q Placed in a 50x50 grid

q Graph shows “progress”


q Y axis: L = ∑nets Ni

HPWL(Ni)
q X axis: # swaps
q Yes, this ran for 8M
swaps until progress on
L stopped
# random gate swapsà

23
Random Iterative Improvement
q Easy to understand and to implement
q Start with a random placement. Randomly improve
until it stops getting better
q Can optimize what we care about: estimated
wirelength ∑nets Ni HPWL(Ni)
q It will improve. Sometimes, a lot.

q But… final result is still not very good

q We can do better. A lot (really, a lot) better…

24
Analytical Placer
q Goal: Write an equation whose minimum is the placement(!)
q If you have a million gates, need a million (xi,yi) values as
result
q Formulate an appropriate cost function for all the gate-level
(xi, yi):
F( x1, x2, … x1M, y1, y2, … y1M)
q …then solve analytically for X*=(x1, x2, … x1M), Y*=(y1, y2, …
y1M) to minimize F( )
q The resulting set of values of X*, Y* give you the placement
of all 1M gates

q This sounds difficult (?) but it works great


q All modern placers for big ASICs and SOCs are “analytical”
q Big trick is write the wirelength in mathematically “friendly”
form we can optimize

25
Optimize Quadratic Wirelength
q For 2-point net: squared length of “distance” line
between points
q Quadratic length = (x1-x2)2 + (y1-y2)2

A “2-point net” Quadratic wirelength


=(3-1)2 + (4-1)2
y=5 =13
1
4
3
2
1 BUT… what happens if your net
2 has more than 2 points in it?
0
x= 0 1 2 3 4

26
What about k > 2
q For k-point net:
q Replace one “real” net with k(k-1)/2 2-point nets
q Add a new net between every pair of points. Called a
fully-connected clique model
q (We use this model for all our subsequent examples)

A “k=4-point net” Clique model


y=5 y=5
4 4 4
1 k(k-1)/2
3 3 =4(4-1)/2
3
2 2 =6 2-point nets
1 1
2
0 0
x= 0 1 2 3 4 x= 0 1 2 3 4

27
K>2 Note also: when k=2, this
weight is just (2-1)=1,
so no special treatment
q k-point net, continued for common 2-point nets

q And, each new 2-point net is weighted (multiplied) by


term: 1/(k-1)
q Why? 1 net became k(k-1)/2 nets. Need to compensate
so we don’t “overestimate”
Clique model
Quadratic estimate:
y=5 (1/3)[(4-1)2 + (5-4)2]
4 +(1/3)[(4-3)2 + (5-1)2]
3 +(1/3)[(3-1)2 + (4-1)2]
2
+(1/3)[(3-1)2 + (4-3)2]
+(1/3)[(4-3)2 + (5-3)2]
1
+(1/3)[(3-3)2 + (3-1)2]
0 =sum of 6 weighted
x= 0 1 2 3 4 2-point lengths
28
How do we Optimize this?

(1,1) 4(x2-1)2 + 4(y2-0.5)2


2 4
1 (1,0.5)
2
2(x2-x1)2 + 2(y2-y1)2
1
(0,0)
1(x1-0)2 + 1(y1-0)2

Add these together: this is the Quadratic Wirelength

So… how do we optimize this…?

29
Something you knew already
q Basic calculus! Differentiate, set derivative to 0,
then solve!
q But this is multiple variables? So, we do partial
derivatives, set each to 0, solve

Q(X): 4(x2-1)2 + 2(x2-x1)2 + 1(x1-0)2 Q(Y): 4(y2-0.5)2 + 2(y2-y1)2 + 1(y1-0)2

∂Q/∂x1 = 0 + 4(x2-x1)(-1) + 2(x1) ∂Q/∂y1 = 0 + 4(y2-y1)(-1) + 2(y1)


= 6 x1 – 4 x2 = 0 = 4 y1 – 4 y2 - 8 = 0

∂Q/∂x2 = 8(x2-1) + 4(x2-x1) + 0 ∂Q/∂y2 = 8(y2-0.5) + 4(y2-y1) + 0


= -4 x1 + 12 x2 - 8 = 0 = -4 y1 + 12 y2 - 4 = 0

30
Matrix Form

q Hey – these are linear equations! We know how to solve these!

Q(X): 4(x2-1)2 + 2(x2-x1)2 + 1(x1-0)2 Q(Y): 4(y2-0.5)2 + 2(y2-y1)2 + 1(y1-0)2


Minimize Minimize

6 -4 x1 0 6 -4 y1 0
x2 = =
-4 12 8 -4 12 y2 4

x1 = 0.571 x2 =0.857 y1 = 0.286 y2 =0.429

• Two matrix equations: Ax=bx and Ay=by. If you have N gates, matrix is NxN
• Same matrix for X,Y, different b vectors. X, Y, b vectors also have N elements

This is the heart of an analytical placement algorithm!


31
A Real Industrial Example

q This benchmark is from IBM


q 210,904 gates (blue)
q 543 fixed blocks (red) –like SRAMS
q Image shows where gates “want” to
go if we model blocks like “big pads”
q This is a quadratic placement of
gates

32
Partition-based Strategy

1st quadratic Partition chip into Repeat. Just keep going


place (QP) solve left / right. Partition each
Select which gates side top/bottom.
on each side. Solve 4 new
Solve 2 new, smaller smaller QP
QP tasks solves

33
State-of-the-art Placer example 1
q ePlace: Electrostatics Based Placement Using
Nesterov's Method
q By Jingwei Lu and Chung-Kuan Chen at UCSD

Cell layout animation of wirelength & density-driven placement progression of ISPD05


ADAPTEC1 with 221K nets, 63 fixed macros and 210K movable standard cells.
http://vlsi-cuda.ucsd.edu/~ljw/ePlace/ 34
State-of-the-art Placer example 2
q RePLAce: Advancing Solution Quality and Routability
Validation in Global Placement
q Ming Woo and AB K at UCSD

35
Summary
q Early placers based on iterative improvement
q Simulated annealing is a very good, famous example
q Annealing stuff used widely in VLSI CAD – but not for placers.
Too inefficient

q Modern placers are all analytical


q Many different mathematical formulations, but all similar
q Numerically optimize a mathematically friendly model of
wirelength
• Many modern placer can do the “legalization” at same time as
wirelength!
q Quadratic placement is a famous, important, practical
example

36

You might also like