
A Linear-Time Heuristic for Improving Network Partitions

C.M. Fiduccia and R.M. Mattheyses


General Electric
Research and Development Center
Schenectady, NY 12301

19th Design Automation Conference

Abstract

An iterative mincut heuristic for partitioning networks is presented
whose worst case computation time, per pass, grows linearly with the
size of the network. In practice, only a very small number of passes
are typically needed, leading to a fast approximation algorithm for
mincut partitioning. To deal with cells of various sizes, the algorithm
progresses by moving one cell at a time between the blocks of the
partition while maintaining a desired balance based on the size of the
blocks rather than the number of cells per block. Efficient data
structures are used to avoid unnecessary searching for the best cell to
move and to minimize unnecessary updating of cells affected by each
move.

Introduction

Given a network consisting of a set of cells (modules) connected by a
set of nets (signals), the mincut partitioning problem consists of
finding a partition of the set of cells into two blocks A and B such
that the number of nets which have cells in both blocks is minimal. In
general, this process is subject to a balancing condition which admits
only those partitions whose blocks satisfy a user specified criterion
based on size or cardinality constraints.

An exact solution to this problem is currently intractable in the sense
that no polynomial-time algorithm for it is known to exist. Since in
practice the network may be very large, a practical algorithm must of
necessity employ heuristics which exhibit nearly linear running times.
This problem has been treated by a number of researchers [1-5] over the
last decade. We present an iterative algorithm whose worst case running
time, per pass, grows linearly with the size of the network, and which
in practice typically converges in several passes. This linear-time
behavior is achieved by a process of moving one cell at a time, from
one block of the partition to the other, in an attempt to reduce the
number of nets which have cells in both blocks. This idea has been
independently applied by Shiraishi and Hirose [5]. A technique due to
Kernighan and Lin [3] is used to reduce the chance that the
minimization process becomes trapped at local minima. Our main
contribution consists of an analysis of the effects a cell move has on
its neighboring cells and a subsequent efficient implementation of
these results.

After specifying the network partitioning problem, we discuss the
Kernighan and Lin [3] heuristic and introduce the basic concept of cell
gain, which is used to select the cell to be moved from one block of
the partition to the other. The properties of gain are then exploited
to construct a data structure that allows efficient management of
changing cell gains. We then address the problem of achieving a desired
balance between the sizes of the two blocks of the partition in an
environment which allows for differing cell sizes. The problem of
determining which cells have their gains affected by each move is then
addressed. In both cases, the total amount of work required, per pass,
is shown to grow linearly with the size of the network. We close with a
discussion of the behavior of a VAX-based implementation of the
algorithm by giving the results and the execution times encountered
when the program was run on several examples.

The Problem

Following Schweikert and Kernighan [4], we view a network as a set of C
cells (modules) cell(1),...,cell(C) connected by a set of N nets
(signals) net(1),...,net(N). As far as partitioning is concerned, we
may without loss of generality make the assumptions listed below about
what comprises a network. We assume that a net is defined as a set
consisting of at least two cells, and that each cell is contained in at
least one net. The number of cells in net(i) will be denoted by n(i).
Any two cells which share at least one net are said to be neighbors.
Each cell is assumed to have a size s(i) and a number of pins p(i),
indicating that it belongs to exactly that many nets. These assumptions
are easily established by the input routine. For input, we assume that
the nets are presented one at a time, in any order, each net being
completely given before another net is started. Since each pin is on
one and only one net, the total number of pins p(1) + ... + p(C), call
it P, may be taken as the "length" of the input and, hence, as the
"size" of the network. It is clear that neither C nor N will serve this
purpose, since neither the number of pins per cell p(i) nor the number
of cells per net n(i) is bounded. In any event, both C and N are O(P).

The following input routine will deal with real networks, whose nets
are often given as lists of (cell, pin) pairs, which violate some of
the above assumptions concerning what constitutes a net. Nets are
sequentially numbered 1,2,...,N as they are encountered in the input
stream. Cells are assumed to be identified by integers in the range
1,2,...,C. The principal function performed by the routine is to
construct two data structures from the sequence of nets given as input.
The first structure is a CELL array, which for each cell contains a
linked list of the nets that contain the cell. The second structure is
a NET array, which for each net contains a linked list of the cells on
the net. In both cases, each linked list created is regarded as a set,
with no duplicates and no implicit order. Each record in each of the
arrays also contains several additional fields which the algorithm uses
to perform its function.

    /* net-list input routine */
    FOR each net n = 1 ... N DO
        FOR each (cell, pin) pair (i,j) on net n DO
            /* maintain set property */
            IF net n is not at the front of the net-list for cell i
            THEN insert cell i into the cell-list of net n and
                 insert net n into the net-list of cell i
        END FOR
    END FOR

One should also delete nets with only one cell and any cells that may
no longer be on any of the resulting nets. It is clear that O(P) time
will suffice to do all of the above work, provided that the number of
(cell, pin) pairs in the input stream is O(P).

Given any partition of the cells into two blocks A and B, a net is said
to be cut if it has at least one cell in each block and uncut
otherwise. Call this the cutstate of the net. This state may be deduced
from the net's distribution, this being the number of cells it has in
blocks A and B respectively. Define the cutset of the partition to be
the set of all nets which are cut. Finally, define the size |X| of a
block of cells X to be the sum of the sizes s(i) of its constituent
cells.

Given a fraction (ratio) 0 < r < 1, we wish to partition the network
into two blocks A and B such that |A|/(|A|+|B|) ≈ r, and such that the
size (cardinality) of the resulting cutset is minimized. The ratio r is
only intended to capture the balance criterion of the final partition
produced by the algorithm. This should not be taken to mean that each
move must maintain balance (although this is certainly not ruled out)
nor that, in particular, the initial partition need be balanced. We
will discuss this point in more detail later. In addition to specifying
the ratio r and an initial partition (with one of A or B possibly
empty), the user is allowed to designate certain cells as being "fixed"
in either block A or block B of the partition. This allows the
algorithm to be used to further refine blocks created by previous
partitions.

The Basic Idea

Given a partition (A,B) of the cells, the main idea of the algorithm is
to move a cell at a time from one block of the partition to the other
in an attempt to minimize the cutset of the final partition. The cell
to be moved, call it the base cell, is chosen both on the basis of the
balance criterion and its effect on the size of the current cutset.
Define the gain g(i) of cell(i) as the number of nets by which the
cutset would decrease were cell(i) to be moved from its current block
to its complementary block. Note that a cell's gain may be negative.
Indeed, g(i) must be an integer in the range -p(i) to +p(i). It is also
clear that during each move we must keep in mind the balance criterion
to prevent all cells from migrating to one block of the partition, for
surely that would be the best partition were balance to be ignored.
Thus the balance criterion is used to select the block from which a
cell of highest gain is to be moved. It will often be the case that
this cell has a non-positive gain. In that case, we still move the cell
with the expectation that the move will allow the algorithm to "climb
out of local minima". After all moves have been made, the best
partition encountered during the pass is taken as the output of the
pass. This minimization technique is due to Kernighan and Lin [3].
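The input routine and the definitions of cutset and gain given above can be sketched in Python as follows. This is only an illustrative sketch: the names build_network, cut_size and gain are assumptions, Python lists stand in for the linked lists, nets are numbered from 0, and the gain is computed directly from its definition (by recounting the cutset) rather than by any efficient method.

```python
def build_network(nets):
    """Build CELL and NET structures from nets given as lists of (cell, pin) pairs.

    cell_nets[c] lists the nets containing cell c (the CELL array);
    net_cells[n] lists the cells on net n (the NET array).
    """
    cell_nets = {}
    net_cells = []
    for n, pairs in enumerate(nets):
        net_cells.append([])
        for cell, _pin in pairs:
            lst = cell_nets.setdefault(cell, [])
            # Set property: nets arrive one at a time, so cell is already on
            # net n exactly when n is the most recent entry of its net-list.
            if not lst or lst[-1] != n:
                lst.append(n)
                net_cells[n].append(cell)
    return cell_nets, net_cells


def cut_size(net_cells, block):
    """Number of nets with at least one cell in each block."""
    return sum(1 for cells in net_cells if len({block[c] for c in cells}) == 2)


def gain(cell, net_cells, block):
    """Decrease in cutset size if `cell` moved to the other block (by definition)."""
    moved = dict(block)
    moved[cell] = 'B' if block[cell] == 'A' else 'A'
    return cut_size(net_cells, block) - cut_size(net_cells, moved)


# Hypothetical example: cell 1 appears twice on the first net (two pins).
nets = [[(1, 1), (2, 1), (1, 2)], [(2, 2), (3, 1), (4, 1)], [(1, 3), (4, 2)]]
cell_nets, net_cells = build_network(nets)
block = {1: 'A', 2: 'A', 3: 'B', 4: 'B'}
gains = {c: gain(c, net_cells, block) for c in sorted(block)}
```

In this example two nets are cut, and only cell 4 has a positive gain: moving it to block A uncuts the net it shares with cell 1 without cutting any other net.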

To prevent the cell-moving process from "thrashing" or going into an
infinite loop, each base cell is immediately "locked" in its new block
for the remainder of the pass. Thus only "free" cells are actually
allowed to make one move during a pass, until either all cells become
locked or the balancing criterion prevents further moves. The best
partition encountered during the pass is then returned. Additional
passes may then be performed until no further improvements are
obtained. In practice this typically occurs quickly, in several passes,
resulting in a nearly linear algorithm; however, we make no claims
about the number of passes required in the worst case, except to point
out the obvious fact that only O(N) passes are possible since the
cutset is bounded by the number of nets.

The bulk of the work needed to make a move consists of selecting the
base cell, moving it, and then adjusting the gains of its free
neighbors. Unless this is carefully done, each cell will have its gain
recomputed each time one of its neighbors moves. This is definitely not
necessary. The naive approach will lead to an algorithm which performs
(n(1))^2 + ... + (n(N))^2 = O(P^2) gain computations per pass. This
stems from the fact that the neighborhood relation induced by a net
containing n cells is a complete graph with O(n^2) edges. Since a
single gain computation for a cell with p(i) pins takes O(p(i)) work,
this approach to maintaining cell gains will require more than O(P^2)
work. This is particularly expensive even when one large net exists.

We solve the first problem, that of selecting a base cell having the
largest gain in its block, by the use of a data structure which quickly
returns a cell of highest gain and allows recomputed cell gains to be
reentered into the structure in constant time. We consider the solution
to this problem in the next section where we discuss the notion of cell
gain. The second problem, that of updating the gains of the neighbors
of the base cell, is much more interesting. The naive algorithm
consists of recomputing the gain of every free cell on every net of the
base cell. We avoid these time consuming pitfalls by showing that a
net(i) never accounts for more than 2n(i) gain recomputations during
one entire pass. Moreover, we show that each gain recomputation can be
replaced by an appropriate sequence of simple gain
increments/decrements which can be done in constant time. These
solutions to the two problems reduce the total work required to perform
one pass to O(P) in the worst case.

For any partition (A,B) we have defined the gain g(i) of cell(i) as the
number of nets by which the cutset would decrease, were cell(i) to be
moved from its current block to its complementary block.

Figure 1. Example of cell gains

Clearly, g(i) is an integer in the range -p(i) to +p(i), so that each
cell has its gain in the range -pmax to +pmax, where
pmax = max{p(i) | cell(i) is initially free}. In view of the restricted
set of values which cell gains may take on, we can use "bucket" sorting
to maintain a sorted list of cell gains. This is done using an array
BUCKET[-pmax ... pmax], whose kth entry contains a doubly-linked list
of free cells with gains currently equal to k. Two such arrays are
needed, one for block A and one for block B. Each array is maintained
by quickly moving a cell to the appropriate bucket whenever its gain
changes due to the movement of one of its neighbors. Direct access to
each cell, from a separate field in the CELL array, allows us to yank a
cell from its current list and move it to the head of its new bucket
list in constant time. Because only free cells are allowed to move,
only they need to have their gains updated. Whenever a base cell is
moved, it is "locked", removed from its bucket list, and placed on a
"FREE CELL LIST" which is later used to reinitialize the BUCKET array
for the next pass. This "FREE CELL LIST" saves a great deal of work
when a large number of cells have permanent block assignments and are
thus not free to move.

Figure 2. Bucket list structure (a BUCKET array indexed from -pmax to
+pmax, and a CELL array indexed 1, 2, ..., C)
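A bucket array of this kind can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name GainBuckets and its methods are hypothetical, dictionaries of previous/next links stand in for the link fields stored with each cell, and the max_gain field is simply one way to remember where the highest non-empty bucket might be.

```python
class GainBuckets:
    """Bucket array for one block: head[k + pmax] is the first free cell with gain k."""

    def __init__(self, pmax):
        self.pmax = pmax
        self.head = [None] * (2 * pmax + 1)
        self.prev, self.next, self.gain = {}, {}, {}
        self.max_gain = -pmax  # upper bound on the highest non-empty bucket

    def insert(self, cell, g):
        """Place cell at the head of bucket g in constant time."""
        i = g + self.pmax
        old = self.head[i]
        self.prev[cell], self.next[cell] = None, old
        if old is not None:
            self.prev[old] = cell
        self.head[i] = cell
        self.gain[cell] = g
        if g > self.max_gain:
            self.max_gain = g

    def remove(self, cell):
        """Unlink cell from its bucket list in constant time."""
        p, n = self.prev.pop(cell), self.next.pop(cell)
        g = self.gain.pop(cell)
        if p is None:
            self.head[g + self.pmax] = n
        else:
            self.next[p] = n
        if n is not None:
            self.prev[n] = p

    def adjust(self, cell, delta):
        """Move cell to the head of the bucket for its new gain."""
        g = self.gain[cell]
        self.remove(cell)
        self.insert(cell, g + delta)

    def best(self):
        """Return a cell of highest gain, or None if every bucket is empty."""
        while self.max_gain > -self.pmax and self.head[self.max_gain + self.pmax] is None:
            self.max_gain -= 1
        return self.head[self.max_gain + self.pmax]


# Hypothetical usage with pmax = 2.
b = GainBuckets(2)
b.insert('x', 0)
b.insert('y', 1)
b.insert('z', 1)
first = b.best()      # 'z': most recently inserted cell in the top bucket
b.adjust('z', -2)     # z now has gain -1
second = b.best()     # 'y'
b.remove('y')
third = b.best()      # 'x', found after stepping down from bucket 1
```

Every operation except best touches a constant number of links; best only ever steps downward past empty buckets, which is the cost the linearity argument has to account for.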

For each BUCKET array, a MAXGAIN index is maintained which is used to
keep track of the bucket having a cell of highest gain. This index is
updated by decrementing it whenever its bucket is found to be empty and
resetting it to a higher bucket whenever a cell moves to a bucket above
MAXGAIN. Experience with integrated circuit networks shows that gains
tend to cluster sharply around the origin and that MAXGAIN moves very
little indeed, making the above implementation exceptionally fast and
simple. We now establish that, despite its simplicity, this scheme
actually does only linear work per pass.

Proposition 1. The total amount of work required to maintain each
BUCKET array is O(P) per pass.

Proof. Let f = O(P) be the number of cells in the network which are
initially free. Initialization requires O(pmax) + O(f) = O(P) time. If
g is the total number of gain adjustments performed during one pass,
then O(g) work is sufficient to move all free cells to their
appropriate bucket lists, since each cell can be moved in constant
time. In the section on maintaining cell gains, we establish that
g = O(P). We must finally account for the work required to return a
cell of highest gain when one is requested. Let R be the sum of all the
amounts by which MAXGAIN is reset by all the various reset actions.
Although we cannot in general search and return a cell in constant
time, the total time, per pass, used to search down for a non-empty
bucket and to return and remove a cell of highest gain is
O(R + pmax) + O(f) = O(R) + O(P). In the next section we show that
R = O(g), so that O(P) total work, per pass, is sufficient to
initialize and maintain the bucket lists. QED

The concept of mincut partitioning is meaningless unless a restriction
is placed on the sizes of the two blocks; otherwise, we could achieve
an empty cutset by moving all of the cells to one block of the
partition. The approach we have taken is to specify a fraction (ratio),
0 < r < 1, to suggest that only final partitions satisfying
|A|/(|A| + |B|) ≈ r are acceptable. Since in general equality cannot be
achieved, some notion of an acceptable tolerance must be incorporated
into the balancing scheme. We have considered several approaches,
including the use of cost functions based on the size of the cutset and
the amount by which the partition deviates from the desired ratio r. We
are currently using a scheme which is both fast and seems to work well
when the variance in cell sizes is not too large.

Call a partition (A,B) balanced provided that

    rW - smax ≤ |A| ≤ rW + smax

where W = |A| + |B| is the sum of the s(i), and smax = max{s(i)} is the
size of the largest cell which is initially free. A special initial
pass is used to establish the balance by moving cells to or from block
A depending on the sizes of blocks A and B and the desired ratio r.
During this pass, as in all other passes, the base cell is selected
according to the highest gain criterion. Once balance is achieved, it
is possible to maintain it with every move because the tolerance always
allows at least one free cell from either A or B to be moved. If
desired, a tolerance of ±k·smax may be used, where k = k(s) ≥ 1 is some
slowly growing function of the number of free cells in the network.

Having established balance, the basic idea of repeatedly choosing a
base cell to be moved is described as follows:

1. Consider the first cell (if any) of highest gain from each BUCKET
   array, rejecting it if moving it would cause imbalance. If neither
   block has a qualifying cell, no more moves will be attempted.

2. Among those cells returned in step one, choose a cell of highest
   gain, breaking ties by choosing the one which gives the best
   balance. Break remaining ties as desired.

3. Return this as the base cell; remove it from its bucket list; and
   place it on the FREE CELL LIST.

Having chosen a base cell, we now move it to its complementary block;
lock it; and determine the effects it produces on the distributions of
its nets and on the gains of its neighboring cells. Unless this is done
carefully, the resulting time, per pass, will be worse than O(P^2). We
next show how to do this in linear time per pass.

Computing and Maintaining Cell Gains

We have yet to describe how to compute and maintain cell gains. To do
this, we must introduce the notion of a critical net. Consider an
arbitrary net n. Given a partition (A,B), define the distribution of
net n, relative to this partition, as an ordered pair of integers
(A(n),B(n)) which represents the number of cells the net n has in
blocks A and B respectively. These are clearly computable in O(P) time
for all nets. Recalling the definition of the cutstate of a net, we say
that a net is critical if there exists a cell on it which if moved
would change the net's cutstate. It is easy to see that n is critical
iff either A(n) or B(n) is equal to 0 or 1.

Figure 3. Critical nets (cases A(n) = 1, A(n) = 0, B(n) = 1, B(n) = 0)

It is now clear that the gain of a cell, previously defined in terms of
its effect on the cutset, depends only on its critical nets. This means
that if the net is not critical, its cutstate cannot be affected by a
move. What is more important, a net which is not critical either before
or after a move cannot possibly influence the gains of any of its
cells. This observation, coupled with the fact that base cells are
"locked" after being moved, will form the basis of our linear-time
claim.

Let F ("From") be the current block of cell(i) and T ("To") be its
complementary block, so that F=A and T=B or vice-versa. The gain of
cell(i) is then given by

    g(i) = FS(i) - TE(i),

where FS(i) is the number of nets which have cell(i) as their only F
cell, and TE(i) is the number of nets which contain cell(i) and have an
empty T side. Thus a critical net on cell(i) contributes +1 or -1 to
g(i). The following algorithm computes the initial gains of all free
cells.

    /* compute cell gains */
    FOR each free cell i DO
        g(i) <- 0
        F <- the "from block" of cell(i)
        T <- the "to block" of cell(i)
        FOR each net n on cell i DO
            IF F(n) = 1 THEN increment g(i)
            IF T(n) = 0 THEN decrement g(i)
        END FOR
    END FOR

Proposition 2. Initialization of all cell gains requires O(P) work.

Proof. Making use of the FREE CELL LIST, the outer loop scans through
the free cells in the network. For each free cell, the inner loop scans
through each of the cell's nets and performs a simple increment or
decrement operation. Thus the total work involved is O(rp) = O(P),
where rp is the number of pins reachable from all the free cells. QED

Next we prove that a linear amount of time is sufficient to maintain
the gains of all free cells during a single pass of the algorithm.
Since a net is critical if and only if it contains a cell which if
moved would alter the cutstate of the net, we need look at only those
nets, connected to the base cell, that are critical before or after the
move. Only nets consisting of either two or three cells can be critical
both before and after a move. For such nets, two gain adjustment
actions might be required: two-cell nets will have one cell incremented
or decremented twice, whereas three-cell nets will have one cell
incremented and another cell decremented.

Figure 4. Nets requiring 2 adjustments

If a net is critical, either before or after a move, the contributions
it makes to the gains of its cells need to be adjusted. Of course, this
should only be done if the net's distribution is changed by the move;
that is, only for nets on the base cell. Using the "from-to"
terminology of the gain computation algorithm, we see that a net is
critical before the move iff

    F(n) = 1 or T(n) = 0 or T(n) = 1.

The case F(n) = 0 cannot occur because the base cell is on the F side
before the move. Similarly, a net is critical after a move iff

    T(n) = 1 or F(n) = 0 or F(n) = 1.

To simplify the situation, we further note that F(n) = 1 before the
move iff F(n) = 0 after the move, and that T(n) = 1 after the move iff
T(n) = 0 before the move. The following code checks for each of these
four cases to see if gain updates are required. A careful analysis of
the four cases, which are not independent, will assure the reader that
the correct updates are applied.

    /* move base cell and update neighbors' gains */
    F <- the "from block" of base cell
    T <- the "to block" of base cell
    Lock the base cell and complement its block
    FOR each net n on the base cell DO
        /* check critical nets before the move */
        IF T(n) = 0 THEN increment gains of all free cells on net(n)
        ELSE IF T(n) = 1 THEN decrement gain of the only T cell on
                              net(n), if it is free
        /* change the net distribution to reflect the move */
        decrement F(n)
        increment T(n)
        /* check critical nets after the move */
        IF F(n) = 0 THEN decrement gains of all free cells on net(n)
        ELSE IF F(n) = 1 THEN increment gain of the only F cell on
                              net(n), if it is free
    END FOR

The action of incrementing or decrementing the gains of a specific
subset of the cells, on a net consisting of n cells, requires at most
O(n) work because, in one scan of the net, each cell can be reached
from the net's cell list and can be moved from one bucket to another in
constant time. We shall refer to one scan of a net's cell list as an
update operation.

Proposition 3. No more than four update operations per net are
performed during one pass of the algorithm.

Proof. We first transform the inner loop of the gain update algorithm
to simplify the discussion. To do this, we need to distinguish between
the free and locked cells of net(n) in each block of the partition. Let
LF(n) and FF(n) respectively refer to the number of locked and free
cells net(n) has on the F side of the partition. A similar notation is
used for the T side. Concentrating on the first conditional in the loop
body, notice that T(n) = 0 requires that LT(n) = FT(n) = 0. The
condition T(n) = 1 requires that either LT(n) = 1 and FT(n) = 0, or
that LT(n) = 0 and FT(n) = 1; however, the update is performed only if
the cell on the T side is free; that is, only if LT(n) = 0. Using this
observation, and a similar observation for the conditional updates
after the distribution shift, the code for the inner loop of the gain
adjustment algorithm can be restated as:

    /* check for critical nets before the move */
    IF LT(n) = 0
        THEN IF FT(n) = 0 THEN "update gains"
        ELSE IF FT(n) = 1 THEN "update gains"
    /* change the net distribution to reflect the move */
    decrement FF(n)
    increment LT(n)
    /* check for critical nets after the move */
    IF LF(n) = 0
        THEN IF FF(n) = 0 THEN "update gains"
        ELSE IF FF(n) = 1 THEN "update gains"

Observe that once both blocks A and B have served in the capacity of
the T side for a given net n, no further update operations will occur
for that net. This is because the code which updates the net's
distribution will have incremented the locked cell count on both sides.
Once this occurs, the net is essentially "dead", meaning that its
cutstate can no longer change, thus ruling out the possibility of
future updates.

This observation allows us to concentrate on only that portion of the
move sequence, for an individual net n, which includes the first change
in direction of cell movement. We will consider a sequence of moves
(with respect to the net n) of cells from the A side (A-moves) followed
by a single move of a cell from the B side. During the first A-move
T=B, thus for all subsequent moves LB(n) will be positive. Therefore,
the B side, having only 0 or 1 cells, can only cause an update on the
first A-move of the sequence. During the sequence of A-moves, each move
causes the FA(n) component of the net distribution to be decremented by
one. Updates can occur only for values of FA(n) = 1 and FA(n) = 0, and
only once for each value with F=A. The final move with F=B could also
cause an update if the A side has 1 or 0 cells. Since no further
updates can be required, we get a total of at most four updates per
net. A more careful analysis reveals that three updates will be
sufficient for any net, and that three updates are necessary for
certain nets. During these three updates, the gain g(i) of a given
cell(i) is adjusted at most twice. QED

Using facts from the previous proof, we can now complete the proof of
Proposition 1. We see that g, the total number of gain adjustments per
pass, is O(f), where f is the number of initially free cells. Thus
g = O(f) = O(P) in Proposition 1. Each time a net is updated, the gain
of any cell on that net can be incremented at most twice, by
Proposition 3; thus, during one update, the value of MAXGAIN can be
reset to at most MAXGAIN + 2.

This shows that R, in Proposition 1, is O(N) = O(P). This establishes
that the bucket lists can be maintained with O(P) work per pass. QED

We are now in a position to establish the behavior of our algorithm for
maintaining cell gains.

Proposition 4. The total work required to initialize and maintain cell
gains is O(P) per pass.

Proof. The total amount of work required for gain maintenance during
one pass of the algorithm is the sum of the work required for each
individual net. Each update of net(i) uses O(n(i)) work. Proposition 3
shows that only a constant number of updates are required, per net per
pass. Since n(1) + ... + n(N) = O(P), the linear behavior is obtained.
QED

Combining Propositions 1 and 4, we may now state our main result.

Theorem. The minimization algorithm requires O(P) time to complete one
pass.

The algorithm has been implemented in the language C, and runs on a VAX
11/780. Its performance was evaluated by using it to partition several
random-logic polycell designs. Four samples are listed below. The
average chip has 267 cells, 245 nets, and 2650 pins. On these chips,
the algorithm typically makes about 900 moves per cpu-second. This will
of course depend on the average number of pins per cell and the sizes
of the nets. The factor by which the algorithm will outperform the
naive algorithm depends on network size and especially on the size of
the largest nets. The new algorithm is superior especially when the
network contains even one large net.

              CELLS   NETS   PINS   PASSES   TIME
    Chip 1     306    300    857      3      1.63
    Chip 2     296    238    672      2      0.98
    Chip 3     214    222    550      5      1.91
    Chip 4     255    221    571      5      2.09

As a cell placement tool, in a polycell environment, the algorithm is
being evaluated in two quite distinct ways. The first is a
straightforward application to partition the cells into channels. We
call this inter-channel placement. Its objective is to reduce the
number of inter-channel connections needed. The second application is
as an intra-channel placement tool. Here the objective is to reduce
channel density and wire length. This is done recursively to determine
first in which half of the channel the cell should be placed, then in
which quarter, and so on. We feel that this is a novel approach to
intra-channel placement.

The authors wish to thank Bob Darrow, who implemented the algorithms on
the VAX. Without the feedback one gets from such implementations, it is
difficult to evaluate a heuristic solution. Thanks are also due to Phil
Lewis and Ron Rivest for their suggestions.

[1] M.A. Breuer, "Min-Cut Placement," J. of Design Automation and
    Fault-Tolerant Computing, Vol. 1, number 4, Oct. 1977, pp. 343-362.

[2] M.A. Breuer, "A Class of Min-Cut Placement Algorithms," Proc. 14th
    Design Automation Conference, New Orleans, 1977, pp. 284-290.

[3] B.W. Kernighan and S. Lin, "An Efficient Heuristic Procedure for
    Partitioning Graphs," Bell System Technical Journal, Vol. 49, Feb.
    1970, pp. 291-307.

[4] D.G. Schweikert and B.W. Kernighan, "A Proper Model for the
    Partitioning of Electrical Circuits," Proc. 9th Design Automation
    Workshop, Dallas, June 1972, pp. 57-62.

[5] H. Shiraishi and F. Hirose, "Efficient Placement and Routing for
    Masterslice LSI," Proc. 17th Design Automation Conference,
    Minneapolis, June 1980, pp. 458-464.
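As a supplementary illustration, the gain initialization g(i) = FS(i) - TE(i) and the move-and-update step described in the discussion of cell gains can be sketched in Python. This is a hedged sketch rather than the authors' C implementation: the function names, the dictionary-based net distribution `count`, and the small four-cell network are all assumptions made for the example, and gains are kept in a plain dictionary instead of bucket lists.

```python
def other(side):
    return 'B' if side == 'A' else 'A'


def initial_gains(cell_nets, net_cells, block, free):
    """Compute the net distributions and g(i) = FS(i) - TE(i) for every free cell."""
    count = [{'A': 0, 'B': 0} for _ in net_cells]
    for n, cells in enumerate(net_cells):
        for c in cells:
            count[n][block[c]] += 1
    g = {}
    for i in free:
        F, T = block[i], other(block[i])
        g[i] = 0
        for n in cell_nets[i]:
            if count[n][F] == 1:   # cell i is the only F cell on net n
                g[i] += 1
            if count[n][T] == 0:   # net n has an empty T side
                g[i] -= 1
    return g, count


def move_and_update(base, cell_nets, net_cells, block, free, g, count):
    """Move the base cell, lock it, and adjust the gains of its free neighbors."""
    F, T = block[base], other(block[base])
    free.discard(base)             # lock the base cell
    block[base] = T                # complement its block
    for n in cell_nets[base]:
        cells = net_cells[n]
        # Critical before the move (count[n][T] does not yet include base).
        if count[n][T] == 0:
            for c in cells:
                if c in free:
                    g[c] += 1
        elif count[n][T] == 1:
            for c in cells:
                if c in free and block[c] == T:
                    g[c] -= 1
        # Change the net distribution to reflect the move.
        count[n][F] -= 1
        count[n][T] += 1
        # Critical after the move.
        if count[n][F] == 0:
            for c in cells:
                if c in free:
                    g[c] -= 1
        elif count[n][F] == 1:
            for c in cells:
                if c in free and block[c] == F:
                    g[c] += 1


# Hypothetical network: three nets over four cells, all cells free.
net_cells = [[1, 2], [2, 3, 4], [1, 4]]
cell_nets = {1: [0, 2], 2: [0, 1], 3: [1], 4: [1, 2]}
block = {1: 'A', 2: 'A', 3: 'B', 4: 'B'}
free = {1, 2, 3, 4}
g, count = initial_gains(cell_nets, net_cells, block, free)
start = dict(g)
move_and_update(4, cell_nets, net_cells, block, free, g, count)
after = {c: g[c] for c in sorted(free)}
```

In this example cell 4 starts with gain 1 and is chosen as the base cell. After the move the remaining free cells carry the gains that a recount of the cutset from scratch would give, although only the two nets on cell 4 were scanned.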

